PhD Thesis Defence

"Algorithms for Partially Observable Markov Decision Processes"

By

Mr. Weihong Zhang

Abstract

The Partially Observable Markov Decision Process (POMDP) is a general model for sequential decision making in which the effects of actions are nondeterministic and only partial information about the world state is available. Unfortunately, finding near-optimal solutions for POMDPs is computationally difficult. Value iteration is a standard algorithm for solving POMDPs. It conducts a sequence of dynamic programming (DP) updates to improve value functions. Value iteration is inefficient for two reasons. First, a DP update is expensive because it must account for all belief states in a continuous belief space. Second, value iteration must conduct a large number of DP updates before it converges.

This thesis investigates two ways to accelerate value iteration. The work presented centers on the idea of conducting DP updates, and therefore value iteration, over a belief subspace, i.e., a subset of the belief space.

The first use of a belief subspace is to reduce the number of DP updates value iteration needs to converge. We design a computationally cheap procedure that operates over a belief subspace consisting of a finite number of belief states, and use it as an additional step for improving value functions. Because of the extra improvements the procedure provides, value iteration conducts fewer DP updates and is therefore more efficient.

The second use of a belief subspace is to reduce the complexity of the DP updates themselves. We establish a framework for carrying out value iteration over a belief subspace determined by the POMDP model. Whether this subspace is smaller than the full belief space is model-dependent; when it is, value iteration over the subspace is expected to be more efficient. Based on this framework, we study three POMDP classes with special problem characteristics and propose different value iteration algorithms for them.

(1) An informative POMDP assumes that the agent always has a good idea of the world state. The subspace determined by the model is much smaller than the belief space, so value iteration over the belief subspace is more efficient for this class.

(2) A near-discernible POMDP assumes that the agent can get a good idea of the state from time to time by executing particular actions. For such a POMDP, the belief subspace determined by the model can be as large as the belief space itself. We propose an anytime algorithm that focuses computation on a small belief subspace and gradually expands it.

(3) A more general class than near-discernible POMDPs assumes that the agent can, with high likelihood, get a good idea of the state from time to time by executing particular actions. For such POMDPs, we adapt the anytime algorithm to conduct value iteration over a growing belief subspace.

Date: Tuesday, 21 August 2001
Time: 2:00 p.m. - 4:00 p.m.
Venue: Room 1505 (Lifts 25-26)

Chairman: Dr. Chu Zhang (FINA)

Committee Members:
Dr. Nevin Zhang (Supervisor)
Dr. Mordecai Golin
Dr. Dit-Yan Yeung
Dr. Shaohui Zheng (ISMT)
Prof. Judy Goldsmith (Comp. Sci., Univ. of Kentucky)

**** ALL are Welcome ****
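
To make the belief-subspace idea in the abstract concrete, here is a minimal sketch of a DP update restricted to a finite set of belief states, on a toy two-state POMDP. The model numbers, the tiger-style scenario, and the function name point_based_backup are all illustrative assumptions of this sketch, not material from the thesis.

```python
import numpy as np

# Toy tiger-style POMDP (illustrative only, not from the thesis).
n_s, n_a, n_o = 2, 2, 2   # states, actions, observations
gamma = 0.95

# T[a, s, s']: action 0 ("listen") leaves the state unchanged;
# action 1 ("open") resets the world to a uniform state.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [0.5, 0.5]]])
# O[a, s', o]: listening observes the true state 85% of the time;
# opening is uninformative.
O = np.array([[[0.85, 0.15], [0.15, 0.85]],
              [[0.5, 0.5], [0.5, 0.5]]])
# R[a, s]: listening costs 1; opening pays off or is punished.
R = np.array([[-1.0, -1.0],
              [10.0, -100.0]])

def point_based_backup(alphas, beliefs):
    """One DP update restricted to a finite set of belief states.

    The value function is a set of alpha-vectors.  Rather than
    improving it over the whole continuous belief simplex, we compute
    the best backed-up vector at each given belief point only.
    """
    new_alphas = []
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(n_a):
            g = R[a].astype(float).copy()   # immediate reward for a
            for o in range(n_o):
                # Back-project each alpha-vector through (a, o):
                # g_ao(s) = sum_{s'} T[a,s,s'] O[a,s',o] alpha(s')
                g_ao = np.array([T[a] @ (O[a][:, o] * al) for al in alphas])
                g += gamma * g_ao[np.argmax(g_ao @ b)]  # best at b
            if g @ b > best_val:
                best_val, best_vec = g @ b, g
        new_alphas.append(best_vec)
    return new_alphas

# Improve the value function at three belief points only.
beliefs = [np.array([0.5, 0.5]), np.array([0.9, 0.1]), np.array([0.1, 0.9])]
alphas = [np.zeros(n_s)]
for _ in range(30):
    alphas = point_based_backup(alphas, beliefs)
print([float(a @ b) for a, b in zip(alphas, beliefs)])
```

Each pass costs time proportional to the number of belief points rather than requiring a full DP update over the continuous belief space, which is the source of the savings the abstract describes.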
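
The anytime algorithm for near-discernible POMDPs can be schematized in the same spirit: run DP updates on the current small belief subspace, then grow the subspace with one-step reachable beliefs, until a time budget is exhausted. The sketch below reuses the toy model and point_based_backup from the previous example; it is an assumed reading of the abstract's description, not the thesis's actual algorithm.

```python
import time

def belief_update(b, a, o):
    """Bayes update: b'(s') is proportional to O[a,s',o] * sum_s b(s) T[a,s,s']."""
    nb = O[a][:, o] * (b @ T[a])
    z = nb.sum()
    return nb / z if z > 1e-12 else None   # o impossible under (b, a)

def anytime_value_iteration(b0, budget_sec=1.0, sweeps=5):
    """Schematic anytime loop over a gradually expanding belief subspace."""
    beliefs, alphas = [b0], [np.zeros(n_s)]
    deadline = time.time() + budget_sec
    while time.time() < deadline:
        for _ in range(sweeps):               # DP on the current subspace
            alphas = point_based_backup(alphas, beliefs)
        frontier = []
        for b in beliefs:                     # expand with one-step
            for a in range(n_a):              # reachable beliefs
                for o in range(n_o):
                    nb = belief_update(b, a, o)
                    if nb is not None and all(
                            np.linalg.norm(nb - x) > 1e-2
                            for x in beliefs + frontier):
                        frontier.append(nb)
        if not frontier:      # subspace is closed under reachability
            break
        beliefs += frontier
    return alphas, beliefs

alphas, beliefs = anytime_value_iteration(np.array([0.5, 0.5]), budget_sec=0.5)
print(len(beliefs), "belief points in the final subspace")
```

Because each sweep only touches the beliefs collected so far, the algorithm can be stopped at any time and still return a value function that is accurate on the subspace it has explored, which is what makes the approach "anytime".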