Authors
Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, Yongming Liu
Publication date
2024/1/7
Journal
Neurocomputing
Volume
564
Pages
126974
Publisher
Elsevier
Description
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study a graph-based Markov decision process (MDP) in which the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent’s task and expose the internal structure of its reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle computational complexity, we propose a decentralized learning algorithm, decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to …