Authors
Kefan Dong, Yuping Luo, Tianhe Yu, Chelsea Finn, Tengyu Ma
Publication date
2020/11/21
Conference
International Conference on Machine Learning
Pages
2627-2637
Publisher
PMLR
Description
We compare model-free reinforcement learning with model-based approaches through the lens of the expressive power of neural networks for policies, Q-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state spaces, there are many MDPs whose optimal Q-functions and policies are much more complex than their dynamics. For these MDPs, model-based planning is favorable because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, whereas model-free and model-based policy optimization rely on such a policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak Q-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at test time improves performance on benchmark tasks.
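Below is a minimal sketch of what a multi-step model-based bootstrapping planner in the spirit of BOOTS could look like. The abstract only states that a weak Q-function is bootstrapped by model-based planning at test time; the random-shooting optimizer, the horizon, and all names here (dynamics_model, reward_model, q_function, boots_action) are illustrative assumptions, not the paper's exact algorithm or API.

```python
import numpy as np

def dynamics_model(state, action):
    """Toy stand-in for a learned dynamics model s' = f(s, a)."""
    return state + 0.1 * action

def reward_model(state, action):
    """Toy stand-in for a (learned or known) reward function."""
    return -np.sum(state ** 2) - 0.01 * np.sum(action ** 2)

def q_function(state, action):
    """Toy stand-in for the weak Q-function being bootstrapped."""
    return -np.sum((state + action) ** 2)

def boots_action(state, horizon=4, num_candidates=256, action_dim=1, rng=None):
    """Choose an action by maximizing an h-step model rollout terminated with the Q-function:
    sum_{t<h} r(s_t, a_t) + Q(s_h, a_h), optimized here by simple random shooting."""
    rng = np.random.default_rng() if rng is None else rng
    best_value, best_first_action = -np.inf, None
    for _ in range(num_candidates):
        # Sample a candidate open-loop action sequence of length horizon + 1.
        actions = rng.uniform(-1.0, 1.0, size=(horizon + 1, action_dim))
        s, total = np.array(state, dtype=float), 0.0
        for t in range(horizon):
            total += reward_model(s, actions[t])
            s = dynamics_model(s, actions[t])
        # Bootstrap the tail of the return with the (weak) Q-function.
        total += q_function(s, actions[horizon])
        if total > best_value:
            best_value, best_first_action = total, actions[0]
    return best_first_action

if __name__ == "__main__":
    print("chosen action:", boots_action(np.array([0.5])))
```

In this sketch the planner is used only at action-selection time, so it can be layered on top of any policy-optimization algorithm that produces a Q-function, which is the mode of use the abstract describes.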
Total citations
Citations by year: 2020, 2021, 2022, 2023, 2024 (per-year histogram)
Scholar articles
K. Dong, Y. Luo, T. Yu, C. Finn, T. Ma - International Conference on Machine Learning, 2020