Building and Leveraging Implicit Models for Policy Gradient Methods
MPhil Thesis Defence

By Mr. Zachary William WELLMER

Abstract

In this thesis, we study Policy Prediction Network and Policy Tree Network, two deep reinforcement learning architectures that offer ways to improve sample complexity and performance on continuous control problems. Furthermore, Policy Tree Network offers the ability to trade extra computation at test time for improved performance via decision-time planning. Performance gains are still observed even without decision-time planning (i.e. with no extra computation cost relative to the model-free baseline).

Our approach combines model-free and model-based reinforcement learning. Policy Prediction Network is the first to introduce an implicit model-based approach to Policy Gradient algorithms in continuous action spaces. Policy Tree Network is the first to leverage an implicit model for decision-time planning in continuous action spaces. Learning the implicit model is made possible by an empirically justified clipping scheme and depth-based objectives. Leveraging the implicit model for decision-time planning is made feasible by our tree expansion and backup algorithm. Our experiments focus on the MuJoCo environments so that they can be compared with similar work in this area.

Date: Tuesday, 30 July 2019
Time: 3:30pm - 5:30pm
Venue: Room 3494, Lifts 25/26

Committee Members:
Prof. Tin-Yau Kwok (Supervisor)
Prof. Nevin Zhang (Chairperson)
Prof. Dit-Yan Yeung

**** ALL are Welcome ****