Authors
Aleksandrs Slivkins, Eli Upfal
Publication date
2008/7
Conference
COLT
Pages
343-354
Description
In the multi-armed bandit (MAB) problem there are k distributions associated with the rewards of playing each of k strategies (slot machine arms). The reward distributions are initially unknown to the player. The player iteratively plays one strategy per round, observes the associated reward, and decides on the strategy for the next round. The goal is to maximize the total reward by balancing exploitation (using acquired information) with exploration (learning new information).
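The sketch below illustrates the interaction loop just described with a simple epsilon-greedy rule; it is not the paper's algorithm, and the Bernoulli arm means are hypothetical.

```python
# Minimal sketch of the MAB loop: play one arm per round, observe a reward,
# update estimates, and trade off exploration against exploitation.
# Epsilon-greedy is used purely for illustration; `true_means` is hypothetical.
import random

def epsilon_greedy_bandit(true_means, rounds=10_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # how often each arm has been played
    estimates = [0.0] * k     # empirical mean reward of each arm
    total_reward = 0.0
    for _ in range(rounds):
        # Exploration: with probability epsilon, try a random arm.
        # Exploitation: otherwise, play the arm with the best estimate so far.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward / rounds

# Average reward should approach the best arm's mean, minus exploration cost.
print(epsilon_greedy_bandit([0.2, 0.5, 0.7]))
```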
We introduce and study a dynamic MAB problem in which the reward functions change stochastically and gradually over time. Specifically, the expected reward of each arm follows a Brownian motion, a discrete random walk, or a similar process. In this setting, a player has to keep exploring continuously in order to adapt to the changing environment. Our formulation is (roughly) a special case of the notoriously intractable restless MAB problem. Our goal here is to characterize the cost of learning and adapting to the changing environment in terms of the stochastic rate of change. We consider an infinite time horizon and strive to minimize the average cost per step, which we define with respect to a hypothetical algorithm that at every step plays the arm with the maximum expected reward at that step. A related line of work on the adversarial MAB problem used a significantly weaker benchmark, the best time-invariant policy.
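The following is a hedged sketch of the dynamic setting: expected rewards drift as independent Gaussian random walks (one instance of the Brownian motion / discrete random walk dynamics above), and cost is measured per step against the hypothetical benchmark that always plays the currently best arm. The sliding-window rule is only an illustration of why continuous exploration is needed, not the paper's algorithm; the volatility sigma, window size, and observation noise are assumptions.

```python
# Simulate a dynamic MAB whose expected rewards take a random-walk step each
# round, and report the average per-step cost (dynamic regret) of a simple
# sliding-window, epsilon-greedy player against the best-arm-each-step benchmark.
import random
from collections import deque

def dynamic_bandit_regret(k=3, rounds=50_000, sigma=0.01, window=200, eps=0.1, seed=1):
    rng = random.Random(seed)
    means = [rng.uniform(0.3, 0.7) for _ in range(k)]   # expected rewards mu_i(t)
    history = [deque(maxlen=window) for _ in range(k)]  # recent observed rewards only
    regret = 0.0
    for _ in range(rounds):
        # Forced exploration plus a sliding-window estimate, since stale
        # observations say little about a drifting mean.
        if rng.random() < eps or any(len(h) == 0 for h in history):
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: sum(history[a]) / len(history[a]))
        reward = means[arm] + rng.gauss(0.0, 0.1)        # noisy observation (assumed Gaussian)
        history[arm].append(reward)
        # Per-step cost relative to the dynamic benchmark max_i mu_i(t).
        regret += max(means) - means[arm]
        # The environment drifts: each expected reward takes a random-walk step.
        means = [m + rng.gauss(0.0, sigma) for m in means]
    return regret / rounds  # average cost per step

print(dynamic_bandit_regret())
```

Raising sigma (a faster-changing environment) increases the achievable average cost per step, which is the trade-off the paper quantifies.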