Deep Reinforcement Learning in Urban System Operation
PhD Thesis Proposal Defence

Title: "Deep Reinforcement Learning in Urban System Operation"

by

Yexin LI

Abstract:

Many urban systems have recently been deployed at scale in major cities, e.g. ride-sharing systems, bike-sharing systems, express (courier) systems, take-out food packing-delivering systems, and emergency medical service systems. While these systems have modernized and greatly facilitated citizens' daily lives, they face severe operational challenges: how to continually dispatch incoming customer orders to drivers in a ride-sharing system; how to redistribute bikes among the stations of a bike-sharing system in real time; how to dispatch couriers in an express system so that more parcels are delivered and more packages are picked up every day; and so on. Such operation problems have traditionally been tackled with methods from operations research, e.g. optimization, or with heuristic algorithms carefully designed around practical system settings. Deep Reinforcement Learning (DRL), which leverages the representation power of deep learning to substantially improve traditional reinforcement learning, has achieved state-of-the-art performance on many sequential decision problems, e.g. the game of Go, Atari games, and recommendation. Since operating an urban system means generating a sequence of actions in real time and optimizing them over a long horizon, e.g. half a day, one day, or even weeks, reinforcement learning is a natural fit for operating these systems better. Moreover, as urban systems are large and complex, deep learning methods are needed to capture their rich, representative features.

In this thesis, we investigate how DRL can effectively learn operation policies for several representative urban systems. Based on how a real-world system works, its operation process is modelled either as a Central-Agent Markov Decision Process (CAMDP) or as a Multi-Agent MDP (MAMDP). For a system whose operation can be formulated as a CAMDP, we focus on properly formulating the problem and designing each component of the MDP, i.e. the state, action, and immediate reward, so as to optimize the system's final objective. We take the take-out food packing-delivering system as an example of this setting and propose a Deep Reinforcement Order Packing (DROP) model to solve its operation problem. For a system whose operation can be formulated as a MAMDP, beyond designing each component of the MDP, we also need to ensure that all agents in the system work cooperatively. We take the express system, in which many couriers work, as an example of this setting. To ensure cooperation among couriers, we propose two models: a Deep Reinforcement Courier Dispatching (DRCD) model, which incorporates cooperation when designing the state, and a Cooperative Multi-Agent MDP (CMMDP) model, which guarantees cooperation by incorporating another MDP along the agent sequence. Real-world data from these two representative systems are used to build system simulators, on which we train and evaluate our models, i.e. DROP, DRCD, and CMMDP. Comparisons with baselines confirm the superiority of our models.

Date: Tuesday, 21 April 2020
Time: 2:00pm - 4:00pm
Zoom Meeting: https://hkust.zoom.com.cn/j/613559094

Committee Members:
Prof. Qiang Yang (Supervisor)
Dr. Kai Chen (Chairperson)
Dr. Xiaojuan Ma
Dr. Yangqiu Song

**** ALL are Welcome ****