Authors
Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins
Publication date
2018/3/1
Journal
Journal of the ACM (JACM)
Volume
65
Issue
3
Pages
13
Publisher
ACM
Description
Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising. In many of these application domains, the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the time horizon. The literature lacks a general model encompassing these sorts of problems. We introduce such a model, called bandits with knapsacks, that combines bandit learning with aspects of stochastic integer programming. In particular, a bandit algorithm needs to solve a stochastic version of the well-known knapsack problem, which is concerned with packing items into a limited-size knapsack. A distinctive feature of our problem, in comparison to the existing regret-minimization literature, is that the optimal policy for a …
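The model described in the abstract, a bandit learner that must respect a supply or budget limit in addition to the time horizon, can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it is a simple UCB-style policy on reward-per-unit-cost with a single budget, and all names and parameters (`arms`, `budget`, `horizon`) are illustrative assumptions.

```python
import math
import random

def run_bandit_with_knapsack(arms, budget, horizon, seed=0):
    """Simulate a toy bandits-with-knapsacks instance.

    arms: list of (reward_prob, cost) pairs. Pulling arm i yields a
    Bernoulli(reward_prob) reward and consumes `cost` units of the
    single budget. This is an illustrative policy, not the algorithm
    from the paper: it picks the arm with the highest optimistic
    reward-per-cost index and stops when the budget or horizon runs out.
    """
    rng = random.Random(seed)
    n = len(arms)
    pulls = [0] * n
    reward_sum = [0.0] * n
    total_reward = 0.0
    remaining = budget

    for t in range(1, horizon + 1):
        def index(i):
            if pulls[i] == 0:
                return float("inf")  # force one initial pull of each arm
            mean = reward_sum[i] / pulls[i]
            bonus = math.sqrt(2 * math.log(t) / pulls[i])
            return (mean + bonus) / arms[i][1]  # optimistic reward per unit cost

        i = max(range(n), key=index)
        if arms[i][1] > remaining:
            break  # chosen arm is unaffordable; stop (a real policy would re-plan)
        reward = 1.0 if rng.random() < arms[i][0] else 0.0
        pulls[i] += 1
        reward_sum[i] += reward
        total_reward += reward
        remaining -= arms[i][1]

    return total_reward, remaining
```

The point of the sketch is the distinctive feature the abstract mentions: unlike classical regret minimization, play ends when a resource other than time is exhausted, so the policy must trade off reward against consumption.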
Total citations
[Citations-per-year chart, 2013–2024]
Scholar articles
A Badanidiyuru, R Kleinberg, A Slivkins - Journal of the ACM (JACM), 2018
A Badanidiyuru, R Kleinberg, A Slivkins - The 3rd Workshop on Social Computing and User …, 2013