MPhil Thesis Defence

"Bootstrapping Reinforcement Learning with Supervised Learning for Intelligent Agents in RoboCup"

By

Mr. Andy On-Tik Fung

Abstract

Over the past decade or so, reinforcement learning (RL) as a learning paradigm has drawn increasing attention from researchers in many subareas of artificial intelligence (AI), including robotics. RL can be viewed as a tractable approximate solution to dynamic programming, which is the most common method for solving Markov decision problems. Through interaction with the environment, RL explores the solution space to learn the desired policy by maximizing the expected accumulated reward. A number of toy and real-world problems have demonstrated that RL can find much better solutions than methods based on supervised learning (SL), especially when domain knowledge is insufficient to provide good training targets for SL. However, a major limitation of RL is the long learning time often needed for an agent to explore both the state and action spaces. This is inevitable because, unlike SL, RL does not assume the existence of any prior knowledge.

In an attempt to get the best of both worlds, a new learning approach, called supervised-to-reinforcement learning (S2RL), is proposed and studied in this thesis. In essence, it is a hybrid scheme that integrates the two learning paradigms. The key idea of S2RL is to use SL first to acquire a certain level of prior knowledge, which then bootstraps a second phase of learning based on RL. This hybrid scheme is superior to SL alone in that better solutions can be found through RL exploration, and superior to RL alone in that the learning time can be significantly shortened with the help of the prior knowledge acquired in the SL phase. Among other issues, the thesis addresses how to determine the optimal point for switching from SL to RL.

The S2RL approach has been studied in a challenging multi-agent testbed called RoboCup, a simulated soccer game. In particular, an agent (i.e., a soccer player) learns to intercept a moving ball. Experimental results have confirmed our conjecture that S2RL outperforms both SL and RL in terms of accumulated reward and learning time.

Date: Wednesday, 22 August 2001
Time: 2:30 p.m. - 4:30 p.m.
Venue: Room 3006 (Lift 3)

Committee Members:
Dr. Dit-Yan Yeung (Supervisor)
Dr. Nevin Zhang (Chairman)
Dr. Brian Mak

**** ALL are Welcome ****
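
To make the two-phase scheme concrete, here is a minimal sketch of the S2RL idea in Python. It is not the thesis's implementation: the 1-D interception task, the hand-coded teacher policy that supplies SL targets, the tabular Q-function, and the fixed switching point (a set number of SL updates before RL begins) are all illustrative assumptions.

import random

N = 20  # length of a toy 1-D field (illustrative assumption)
ACTIONS = [-1, 0, 1]  # move left, stay, move right

def step(agent, ball, action):
    """One environment step: the agent moves, then the ball advances one cell."""
    agent = max(0, min(N - 1, agent + action))
    ball += 1
    if agent == ball:
        return agent, ball, 1.0, True    # ball intercepted
    if ball >= N:
        return agent, ball, -1.0, True   # ball escaped off the field
    return agent, ball, -0.01, False     # small per-step cost

def teacher(agent, ball):
    """Hand-coded prior knowledge: move toward the ball's next position."""
    target = ball + 1
    return 1 if target > agent else (-1 if target < agent else 0)

Q = {}  # tabular Q-function: (agent, ball) -> one value per action

def q(state):
    return Q.setdefault(state, [0.0, 0.0, 0.0])

# Phase 1 (SL): regress Q toward 1 for the teacher's action, 0 otherwise.
# The number of updates (2000) stands in for the SL-to-RL switching point.
for _ in range(2000):
    agent, ball = random.randrange(N), random.randrange(N - 1)
    label = teacher(agent, ball)
    for i, act in enumerate(ACTIONS):
        target = 1.0 if act == label else 0.0
        q((agent, ball))[i] += 0.5 * (target - q((agent, ball))[i])

# Phase 2 (RL): epsilon-greedy Q-learning refines the bootstrapped table.
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(5000):
    agent, ball, done = random.randrange(N), 0, False
    while not done:
        s = (agent, ball)
        if random.random() < eps:
            i = random.randrange(len(ACTIONS))
        else:
            i = max(range(len(ACTIONS)), key=lambda k: q(s)[k])
        agent, ball, r, done = step(agent, ball, ACTIONS[i])
        best_next = 0.0 if done else max(q((agent, ball)))
        q(s)[i] += alpha * (r + gamma * best_next - q(s)[i])

# Quick check of the learned greedy policy.
wins = 0
for _ in range(100):
    agent, ball, done = random.randrange(N), 0, False
    while not done:
        s = (agent, ball)
        i = max(range(len(ACTIONS)), key=lambda k: q(s)[k])
        agent, ball, r, done = step(agent, ball, ACTIONS[i])
    wins += r > 0
print("interception rate after S2RL:", wins, "%")

Because the SL phase has already shaped the Q-table toward the teacher's behaviour, the RL phase starts from informed rather than arbitrary value estimates; this head start is the source of the shortened learning time described in the abstract, while the exploration in phase 2 lets the agent improve beyond the teacher.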