Authors
Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Publication date
2019/4/4
Book
International Conference on Tools and Algorithms for the Construction and Analysis of Systems
Pages
395-412
Publisher
Springer International Publishing
Description
We provide the first solution for model-free reinforcement learning of ω-regular objectives for Markov decision processes (MDPs). We present a constructive reduction from the almost-sure satisfaction of ω-regular objectives to an almost-sure reachability problem, and extend this technique to learning how to control an unknown model so that the chance of satisfying the objective is maximized. We compile ω-regular properties into limit-deterministic Büchi automata instead of the traditional Rabin automata; this choice sidesteps difficulties that have marred previous proposals. Our approach allows us to apply model-free, off-the-shelf reinforcement learning algorithms to compute optimal strategies from the observations of the MDP. We present an experimental evaluation of our technique on benchmark learning problems.
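The core idea described above can be sketched in miniature: give a model-free learner (here, tabular Q-learning) reward exactly on the accepting transitions of a product between the MDP and a Büchi automaton, so that maximizing reward steers the policy toward satisfying the objective. This is a simplified illustration, not the authors' implementation: the toy MDP, the one-state automaton (accepting "visit state 1 infinitely often"), and all hyperparameters are invented, and the paper's full limit-deterministic product construction and reachability reduction are omitted.

```python
# Hedged sketch, NOT the paper's code: Q-learning with reward granted only on
# accepting transitions of a (trivial) MDP x Büchi-automaton product.
import random

# Toy deterministic MDP (assumption): states 0..2, actions 'a'/'b'.
# State 2 is an absorbing sink from which the objective can never be met.
MDP = {
    (0, 'a'): 1, (0, 'b'): 2,
    (1, 'a'): 0, (1, 'b'): 2,
    (2, 'a'): 2, (2, 'b'): 2,
}
ACTIONS = ['a', 'b']

def accepting(next_state):
    # One-state Büchi automaton for "visit state 1 infinitely often":
    # a product transition is accepting iff the MDP moves into state 1.
    return next_state == 1

def q_learn(episodes=2000, horizon=30, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # epsilon-greedy exploration
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s2 = MDP[(s, a)]
            r = 1.0 if accepting(s2) else 0.0  # reward only on accepting edges
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS)
                                  - Q[(s, a)])
            s = s2
    return Q

Q = q_learn()
# The learned greedy policy cycles 0 -> 1 -> 0 -> ..., avoiding the sink,
# which is exactly the behavior that satisfies the Büchi condition.
policy = {s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(3)}
print(policy[0], policy[1])
```

Because the sink yields no future accepting transitions, its Q-values stay at zero, and the greedy policy learns to keep cycling through the accepting state; this mirrors, in a toy setting, how the reduction turns an ω-regular objective into reward that off-the-shelf RL can optimize.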