Authors
Joannes Vermorel, Mehryar Mohri
Publication date
2005/10/3
Book
European Conference on Machine Learning
Pages
437-448
Publisher
Springer Berlin Heidelberg
Description
The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize the total reward over a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as solutions to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms.
This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price of Knowledge and Estimated Reward), whose performance compares favorably to that of existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
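As a minimal sketch of the ε-greedy strategy mentioned above, the following simulates it on a Bernoulli K-armed bandit. The arm success probabilities, the value of ε, and the horizon are illustrative assumptions, not values from the paper:

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=10000, seed=0):
    """Run the epsilon-greedy strategy on a K-armed Bernoulli bandit.

    arm_means: hypothetical true success probabilities of each arm
    (an assumption for this sketch, not data from the paper).
    Returns the total reward collected over `horizon` pulls.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a uniformly random arm
        else:
            # exploit: pick the arm with the highest estimated mean
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1 if rng.random() < arm_means[arm] else 0  # Bernoulli payoff
        counts[arm] += 1
        # incremental update of the empirical mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total
```

With a fixed ε, a constant fraction of pulls is spent exploring even after the best arm is well identified; the paper's point is that despite this simplicity, the strategy is competitive in practice.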
Total citations
[Yearly citation histogram, 2006–2024; per-year counts not reliably recoverable from the page extraction]