Heuristic search value iteration for POMDPs

Authors

Trey Smith, Reid Simmons

Publication date

2012/7/11

Journal

arXiv preprint arXiv:1207.4166

Description

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of more than a factor of 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
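
As an illustration of the piecewise linear convex representation mentioned in the description, the following minimal sketch evaluates a value function stored as a set of alpha vectors: the value at a belief is the maximum dot product over the set. It is not the authors' implementation. The two-state problem, the alpha vectors, the corner values used for the upper bound, and all function names are hypothetical choices made for this example only, and the upper bound here is a simple linear interpolation of corner values rather than the representation used in the paper.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def pwlc_value(alpha_vectors, belief):
    """Value of a piecewise linear convex function at a belief:
    the maximum over alpha vectors of alpha . belief."""
    return max(dot(alpha, belief) for alpha in alpha_vectors)


def corner_upper_bound(corner_values, belief):
    """Assumed upper bound: linear interpolation of values at the
    corners of the belief simplex (an illustration only)."""
    return dot(corner_values, belief)


# Hypothetical two-state example.
alpha_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]  # lower-bound vectors
corner_values = [1.2, 1.1]                             # upper-bound corners

b0 = [0.5, 0.5]                                        # initial belief
lower = pwlc_value(alpha_vectors, b0)
upper = corner_upper_bound(corner_values, b0)
gap = upper - lower                                    # width of the bound interval at b0

print(f"lower={lower:.2f} upper={upper:.2f} gap={gap:.2f}")
```

In a bounded anytime scheme of this kind, the gap between the upper and lower bound at the initial belief is what shrinks as planning proceeds, which is how a bound on regret can be reported alongside the policy.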

Total citations

Cited by 684

Citations per year: 2005: 18, 2006: 21, 2007: 24, 2008: 37, 2009: 22, 2010: 40, 2011: 34, 2012: 42, 2013: 47, 2014: 37, 2015: 35, 2016: 35, 2017: 33, 2018: 39, 2019: 44, 2020: 39, 2021: 41, 2022: 41, 2023: 35, 2024: 18

Scholar articles

Heuristic search value iteration for POMDPs

T Smith, R Simmons - arXiv preprint arXiv:1207.4166, 2012