MPhil Thesis Defence

"Bootstrapping Reinforcement Learning with Supervised Learning for Intelligent Agents in RoboCup"

By

Mr. Andy On-Tik Fung

Abstract

Over the past decade or so, reinforcement learning (RL) as a learning paradigm has drawn increasing attention from researchers in many subareas of artificial intelligence (AI), including robotics. RL can be viewed as a tractable approximate solution to dynamic programming, which is the most common method for solving Markov decision problems. Through interaction with the environment, RL explores the solution space to learn the desired policy by maximizing the expected accumulated reward. A number of toy and real-world problems have demonstrated that RL can find much better solutions than methods based on supervised learning (SL), especially when domain knowledge is insufficient to provide good training targets for SL. However, a major limitation of RL is the long learning time often needed for an agent to explore both the state and action spaces. This is inevitable because, unlike SL, RL does not assume the existence of any prior knowledge.

In an attempt to get the best of both worlds, a new learning approach, called supervised-to-reinforcement learning (S2RL), is proposed and studied in this thesis. In essence, it is a hybrid scheme that integrates the two learning paradigms. The key idea of S2RL is to use SL first to acquire a certain level of prior knowledge, which then bootstraps a second phase of learning based on RL. This hybrid scheme is superior to SL alone in that better solutions can be found through RL exploration, and superior to RL alone in that the learning time can be significantly shortened with the help of the prior knowledge acquired in the SL phase. Among other issues, the thesis addresses how to determine the optimal point for switching from SL to RL.

The S2RL approach has been studied in a challenging multi-agent testbed called RoboCup, a simulated soccer game. In particular, an agent (i.e., a soccer player) learns to intercept a moving ball. Experimental results have confirmed our conjecture that S2RL outperforms both SL and RL in terms of accumulated reward and learning time.

Date: Wednesday, 22 August 2001
Time: 2:30 p.m. - 4:30 p.m.
Venue: Room 3006 (Lift 3)

Committee Members:
Dr. Dit-Yan Yeung (Supervisor)
Dr. Nevin Zhang (Chairman)
Dr. Brian Mak

**** ALL are Welcome ****
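
To make the two-phase scheme concrete, here is a minimal sketch of the S2RL idea in Python. It is not the thesis's implementation: the 1-D interception task, the hand-coded teacher policy that supplies SL targets, the tabular Q-function, and the fixed switching point (a set number of SL updates before RL begins) are all illustrative assumptions.

import random

N = 20  # length of a toy 1-D field (illustrative assumption)
ACTIONS = [-1, 0, 1]  # move left, stay, move right

def step(agent, ball, action):
    """One environment step: the agent moves, then the ball advances one cell."""
    agent = max(0, min(N - 1, agent + action))
    ball += 1
    if agent == ball:
        return agent, ball, 1.0, True    # ball intercepted
    if ball >= N:
        return agent, ball, -1.0, True   # ball escaped off the field
    return agent, ball, -0.01, False     # small per-step cost

def teacher(agent, ball):
    """Hand-coded prior knowledge: move toward the ball's next position."""
    target = ball + 1
    return 1 if target > agent else (-1 if target < agent else 0)

Q = {}  # tabular Q-function: (agent, ball) -> one value per action

def q(state):
    return Q.setdefault(state, [0.0, 0.0, 0.0])

# Phase 1 (SL): regress Q toward 1 for the teacher's action, 0 otherwise.
# The number of updates (2000) stands in for the SL-to-RL switching point.
for _ in range(2000):
    agent, ball = random.randrange(N), random.randrange(N - 1)
    label = teacher(agent, ball)
    for i, act in enumerate(ACTIONS):
        target = 1.0 if act == label else 0.0
        q((agent, ball))[i] += 0.5 * (target - q((agent, ball))[i])

# Phase 2 (RL): epsilon-greedy Q-learning refines the bootstrapped table.
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(5000):
    agent, ball, done = random.randrange(N), 0, False
    while not done:
        s = (agent, ball)
        if random.random() < eps:
            i = random.randrange(len(ACTIONS))
        else:
            i = max(range(len(ACTIONS)), key=lambda k: q(s)[k])
        agent, ball, r, done = step(agent, ball, ACTIONS[i])
        best_next = 0.0 if done else max(q((agent, ball)))
        q(s)[i] += alpha * (r + gamma * best_next - q(s)[i])

# Quick check of the learned greedy policy.
wins = 0
for _ in range(100):
    agent, ball, done = random.randrange(N), 0, False
    while not done:
        s = (agent, ball)
        i = max(range(len(ACTIONS)), key=lambda k: q(s)[k])
        agent, ball, r, done = step(agent, ball, ACTIONS[i])
    wins += r > 0
print("interception rate after S2RL:", wins, "%")

Because the SL phase has already shaped the Q-table toward the teacher's behaviour, the RL phase starts from informed rather than arbitrary value estimates; this head start is the source of the shortened learning time described in the abstract, while the exploration in phase 2 lets the agent improve beyond the teacher.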