Building and Leveraging Implicit Models for Policy Gradient Methods

MPhil Thesis Defence


Title: "Building and Leveraging Implicit Models for Policy Gradient Methods"

By

Mr. Zachary William WELLMER


Abstract

In this thesis, we study Policy Prediction Network and Policy Tree Network, 
two deep reinforcement learning architectures that offer ways to improve 
sample complexity and performance on continuous control problems. Furthermore, 
Policy Tree Network offers the ability to trade extra computation at test time 
for improved performance via decision-time planning. Performance gains are 
still observed even when decision-time planning is not used (i.e., no extra 
computation cost relative to the model-free baseline). Our approach blends 
model-free and model-based reinforcement learning. Policy Prediction Network 
is the first to introduce an implicit model-based approach to Policy Gradient 
algorithms in continuous action spaces. Policy Tree Network is the first to 
leverage an implicit model for decision-time planning in continuous action 
spaces. Learning the implicit model is made possible by an empirically 
justified clipping scheme and depth-based objectives. Leveraging the implicit 
model for decision-time planning is feasible as a result of our tree expansion 
and backup algorithm. Our experiments focus on the MuJoCo environments so that 
our results can be compared with similar work in this area.


Date:			Tuesday, 30 July 2019

Time:			3:30pm - 5:30pm

Venue:			Room 3494
 			Lifts 25/26

Committee Members:	Prof. Tin-Yau Kwok (Supervisor)
 			Prof. Nevin Zhang (Chairperson)
 			Prof. Dit-Yan Yeung


**** ALL are Welcome ****