Deep Reinforcement Learning for Continuous Control Environments with Multi-modal State-space and Sparse Rewards

MPhil Thesis Defence


Title: "Deep Reinforcement Learning for Continuous Control Environments 
with Multi-modal State-space and Sparse Rewards"

By

Mr. Cancheng ZENG


Abstract

One of the most important problems in artificial intelligence is learning 
to solve control problems without human supervision. Recent advances in 
deep reinforcement learning methods have achieved significant progress in 
this domain. Researchers have solved a number of problems, including a 
subset of Atari games, the game of Go, and several simple robot control 
environments. However, a general solution to more realistic problems is 
still missing. Most real-world robot control problems have a multi-modal 
state space, which usually consists of both a low-dimensional motion 
sensor input and a high-dimensional image sensor input. In addition, a 
smooth and informative reward signal is usually unavailable, and the agent 
is provided only with a reward signal that is sparse and discrete.

We study a set of continuous control problems with multi-modal state 
spaces. A subset of the problems also have sparse reward functions. We 
propose several techniques to improve the performance of flat 
reinforcement learning methods on the multi-modal state-space problems. 
The proposed techniques include the Wasserstein actor-critic trust-region 
policy optimization method (W-KTR), the exceptional advantage 
regularization method, and the robust concentric Gaussian mixture policy 
model. Experimental results show that the proposed techniques, especially 
the exceptional advantage regularization method, lead to considerable 
performance improvement. A hierarchical reinforcement learning method, 
namely the flexible-scheduling hierarchical method, is proposed for the 
sparse-reward multi-modal state-space problems. Experimental results show 
that the flexible-scheduling hierarchical method can solve the problems 
without domain-specific knowledge, given a set of pre-defined source tasks.


Date:			Tuesday, 21 August 2018

Time:			2:00pm - 4:00pm

Venue:			Room 3494
 			Lifts 25/26

Committee Members:	Prof. Dit-Yan Yeung (Supervisor)
 			Prof. Nevin Zhang (Chairperson)
 			Prof. Fangzhen Lin


**** ALL are Welcome ****