Taming the monster: A fast and simple algorithm for contextual bandits

Authors

Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

Publication date

2014

Conference

Thirty-First International Conference on Machine Learning

Description

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only $\tilde{O}(\sqrt{KT})$ oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We conduct a proof-of-concept experiment which demonstrates the excellent computational and statistical performance of (an online variant of) our algorithm relative to several strong baselines.
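The interaction protocol described above (observe a context, pick one of K actions, see the reward for that action only) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it swaps in a plain epsilon-greedy learner with tabular running means and makes no oracle calls. The function name `run_contextual_bandit`, the toy environment (integer contexts, reward 1 when the action equals the context), and all parameter values are assumptions made purely for illustration.

```python
import random


def run_contextual_bandit(T=2000, K=3, epsilon=0.1, seed=0):
    """Simulate the contextual bandit protocol with an epsilon-greedy learner.

    Hypothetical toy setup (not from the paper): contexts are integers in
    range(K), and action a earns reward 1.0 when a == context, else 0.0.
    The learner keeps a running mean reward per (context, action) pair.
    """
    rng = random.Random(seed)
    counts = [[0] * K for _ in range(K)]   # pulls per (context, action)
    means = [[0.0] * K for _ in range(K)]  # mean observed reward per pair
    total_reward = 0.0

    for _ in range(T):
        x = rng.randrange(K)  # the learner observes a context

        if rng.random() < epsilon:
            a = rng.randrange(K)  # explore: uniform random action
        else:
            # exploit: action with the highest estimated reward for x
            a = max(range(K), key=lambda i: means[x][i])

        # Bandit feedback: only the chosen action's reward is revealed.
        r = 1.0 if a == x else 0.0

        counts[x][a] += 1
        means[x][a] += (r - means[x][a]) / counts[x][a]  # incremental mean
        total_reward += r

    return total_reward / T


if __name__ == "__main__":
    print(run_contextual_bandit())
```

Because rewards for unchosen actions are never seen, the learner has to explore to find the good action for each context. A per-context table only works for this tiny toy problem. The setting in the abstract instead searches over a general policy class through a cost-sensitive classification oracle.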

Total citations

Cited by 570

Citations per year: 2014: 4, 2015: 18, 2016: 34, 2017: 35, 2018: 51, 2019: 65, 2020: 72, 2021: 68, 2022: 77, 2023: 77, 2024: 67

Scholar articles

Taming the monster: A fast and simple algorithm for contextual bandits

A Agarwal, D Hsu, S Kale, J Langford, L Li, R Schapire - International conference on machine learning, 2014