Authors
Kevin P Murphy
Publication date
2000/9/9
Description
One of the goals of AI is to design an agent which can interact with an environment so as to maximize some reward function. Control theory addresses the same problem, but uses slightly different language: agent = controller, environment = plant, maximizing reward = minimizing cost. Control theory is mainly concerned with tasks in continuous spaces, such as designing a guided missile to intercept an airplane in minimum expected time, whereas AI is mainly concerned with tasks in discrete spaces, such as designing a program to play bridge to maximize the chance of winning. Nevertheless, AI and control theory have much in common [DW91], and some problems, such as designing a mobile robot to perform household chores, will require techniques from both fields.

When designing agents that can act under uncertainty, it is convenient to model the environment as a POMDP (Partially Observable Markov Decision Process, pronounced “pom-dp”). At (discrete) time step $t$, the environment is assumed to be in some state $X_t$. The agent then performs an action (control) $A_t$, whereupon the environment (stochastically) changes to a new state $X_{t+1}$. The agent doesn’t see the environment state, but instead receives an observation $Y_t$, which is some (stochastic) function of $X_t$. (If $Y_t = X_t$, the POMDP reduces to a fully observed MDP.) In addition, the agent receives a special observation signal called the reward, $R_t$. The POMDP is characterized by the state transition function $P(X_{t+1} \mid X_t, A_t)$, the observation function $P(Y_t \mid X_t, A_{t-1})$, and the reward function …
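To make the POMDP description above concrete, here is a minimal Python sketch of a tabular POMDP and one simulated interaction loop. The array shapes, the toy probabilities, and the `step` helper are illustrative assumptions, not taken from the paper; the reward is written here as a deterministic function $R(X_t, A_t)$ for simplicity.

```python
import numpy as np

# Illustrative tabular POMDP (all numbers are toy values, not from the survey).
rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 2, 2, 2

# State transition function P(X_{t+1} | X_t, A_t), indexed as T[a, x, x'].
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.1, 0.9]]])

# Observation function P(Y_t | X_t, A_{t-1}), indexed as O[a, x, y].
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]],
              [[0.9, 0.1],
               [0.2, 0.8]]])

# Reward function R(X_t, A_t), indexed as R[x, a] (assumed deterministic here).
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

def step(x, a):
    """One POMDP time step: sample the next (hidden) state and an observation,
    and return them together with the reward for the state-action pair."""
    x_next = rng.choice(n_states, p=T[a, x])
    y = rng.choice(n_obs, p=O[a, x_next])
    return x_next, y, R[x, a]

x = 0  # true state, hidden from the agent
for t in range(5):
    a = rng.integers(n_actions)  # a random policy, purely for illustration
    x, y, r = step(x, a)
    print(f"t={t}: action={a}, observation={y}, reward={r}")
```

Note that the agent only ever sees the action it chose, the observation `y`, and the reward `r`; the state `x` stays hidden, which is what distinguishes a POMDP from a fully observed MDP.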
Total citations
[Per-year citation counts, 2001–2024]