Authors
Long-Fei Li, Peng Zhao, Zhi-Hua Zhou
Publication date
2024/3/24
Journal
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
38
Issue
12
Pages
13572-13580
Description
We study reinforcement learning (RL) in episodic MDPs with adversarial full-information losses and an unknown transition. Instead of the classical static regret, we adopt \emph{dynamic regret} as the performance measure, which benchmarks the learner's performance against a sequence of \emph{changing} policies, making it more suitable for non-stationary environments. The primary challenge is to handle the uncertainties of the unknown transition and the unknown non-stationarity of the environment simultaneously. We propose a general framework to decouple the two sources of uncertainty and show that the dynamic regret bound naturally decomposes into two terms: one due to constructing confidence sets to handle the unknown transition, and the other due to choosing sub-optimal policies under the unknown non-stationarity. To this end, we first employ a two-layer online ensemble structure, which is model-agnostic, to handle the adaptation error caused by the unknown non-stationarity. Subsequently, we instantiate the framework for three fundamental MDP models, namely tabular MDPs, linear MDPs, and linear mixture MDPs, and present corresponding approaches to control the exploration error caused by the unknown transition. We provide dynamic regret guarantees for each model and show they are optimal in terms of the number of episodes and the non-stationarity by establishing matching lower bounds. To the best of our knowledge, this is the first work that achieves dynamic regret that is optimal with respect to both the number of episodes and the non-stationarity \emph{without} prior knowledge about the non-stationarity of the environment, for adversarial MDPs with an unknown transition.
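For concreteness, dynamic regret in this setting is typically defined against an arbitrary sequence of comparator policies; the following is a minimal sketch in the standard occupancy-measure notation (the symbols $q^{\pi}$, $\ell_k$, and $\pi^c_k$ are illustrative and not necessarily the paper's exact notation):
\[
\text{D-Regret}\big(\pi^c_{1:K}\big) \;=\; \sum_{k=1}^{K}\big\langle q^{\pi_k},\,\ell_k\big\rangle \;-\; \sum_{k=1}^{K}\big\langle q^{\pi^c_k},\,\ell_k\big\rangle,
\]
where $\pi_k$ is the learner's policy in episode $k$, $q^{\pi}$ is the occupancy measure induced by policy $\pi$ under the true transition, $\ell_k$ is the adversarial loss of episode $k$, and $\pi^c_{1:K}$ is an arbitrary comparator sequence; fixing $\pi^c_1=\cdots=\pi^c_K$ recovers the classical static regret.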