Hierarchical reinforcement learning with the MAXQ value function decomposition

Authors

Thomas G Dietterich

Publication date

2000/11/1

Journal

Journal of artificial intelligence research

Volume

Pages

227-303

Description

This paper presents a new approach to hierarchical reinforcement learning based on decomposing the target Markov decision process (MDP) into a hierarchy of smaller MDPs and decomposing the value function of the target MDP into an additive combination of the value functions of the smaller MDPs. The decomposition, known as the MAXQ decomposition, has both a procedural semantics (as a subroutine hierarchy) and a declarative semantics (as a representation of the value function of a hierarchical policy). MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. It is based on the assumption that the programmer can identify useful subgoals and define subtasks that achieve these subgoals. By defining such subgoals, the programmer constrains the set of policies that need to be considered during reinforcement learning. The MAXQ value function decomposition can represent the value function of any policy that is consistent with the given hierarchy. The decomposition also creates opportunities to exploit state abstractions, so that individual MDPs within the hierarchy can ignore large parts of the state space. This is important for the practical application of the method. This paper defines the MAXQ hierarchy, proves formal results on its representational power, and establishes five conditions for the safe use of state abstractions. The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction …
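
The additive decomposition described above can be illustrated with a minimal sketch. It assumes the usual MAXQ form, in which the value of invoking a child subtask from a parent is the child's own value plus a completion term for finishing the parent afterwards, and a composite subtask takes the best such sum over its children (a greedy evaluation). The task names, the single state 0, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

State = int
Task = str

# Hypothetical toy hierarchy: Root -> {Navigate, pickup}, Navigate -> {north, south}.
children: Dict[Task, List[Task]] = {
    "Root": ["Navigate", "pickup"],
    "Navigate": ["north", "south"],
}

# Value of a primitive action: expected one-step reward (made-up numbers).
primitive_value: Dict[Tuple[Task, State], float] = {
    ("north", 0): -1.0,
    ("south", 0): -1.0,
    ("pickup", 0): -10.0,
}

# Completion term C(parent, state, child): expected reward for finishing
# the parent subtask after the child terminates (made-up numbers).
completion: Dict[Tuple[Task, State, Task], float] = {
    ("Navigate", 0, "north"): -2.0,
    ("Navigate", 0, "south"): -4.0,
    ("Root", 0, "Navigate"): 17.0,
    ("Root", 0, "pickup"): 0.0,
}


def value(task: Task, state: State) -> float:
    """Value of a subtask: stored for primitives, best child Q for composites."""
    if task not in children:
        return primitive_value[(task, state)]
    return max(q_value(task, state, child) for child in children[task])


def q_value(parent: Task, state: State, child: Task) -> float:
    """Q(parent, s, child) = V(child, s) + C(parent, s, child)."""
    return value(child, state) + completion[(parent, state, child)]


print(value("Navigate", 0))  # best of (-1 - 2) and (-1 - 4)
print(value("Root", 0))      # best of (V(Navigate) + 17) and (-10 + 0)
```

The root value is never stored directly: it is rebuilt recursively from the primitive values and the completion terms, which is the sense in which the decomposition is additive.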

Total citations

Cited by 2126

Citations per year: 1999: 12, 2000: 10, 2001: 17, 2002: 35, 2003: 43, 2004: 56, 2005: 81, 2006: 83, 2007: 93, 2008: 79, 2009: 65, 2010: 79, 2011: 79, 2012: 69, 2013: 63, 2014: 61, 2015: 70, 2016: 92, 2017: 88, 2018: 123, 2019: 138, 2020: 121, 2021: 154, 2022: 162, 2023: 141, 2024: 75
