Deep Reinforcement Learning in Urban System Operation

PhD Thesis Proposal Defence


Title: "Deep Reinforcement Learning in Urban System Operation"

by

Yexin LI


Abstract:

Recently, many urban systems have been widely deployed in major cities, e.g.
ride-sharing systems, bike-sharing systems, express systems, take-out food
packing-and-delivery systems, emergency medical service systems, etc. While
these systems have significantly modernized and facilitated citizens' daily
lives, they face severe operational challenges: for example, how to
continually dispatch incoming customer orders to drivers in a ride-sharing
system; how to redistribute bikes among stations in a bike-sharing system in
real time; and how to dispatch couriers in an express system so that more
parcels are delivered and more packages are picked up every day. Previously,
operation problems in urban systems were often tackled by methods from
operations research, e.g. optimization, or by well-designed heuristic
algorithms based on practical system settings.

Deep Reinforcement Learning (DRL), which leverages the strong representation
ability of deep learning to substantially improve traditional reinforcement
learning, has achieved state-of-the-art performance on many sequential
decision-making problems, e.g. the game of Go, Atari games, and recommendation.
In urban system operation, we need to generate a sequence of actions in real
time and optimize them over a long horizon, e.g. half a day, one day, or even
weeks, so reinforcement learning is a natural choice for operating these
systems better. Besides, as urban systems are often large and complex, deep
learning is needed to capture their rich, representative features. In this
thesis, we investigate how DRL can effectively learn operation policies for
some representative urban systems. Based on how a real-world system works,
either a Central-Agent Markov Decision Process (CAMDP) or a Multi-Agent MDP
(MAMDP) is chosen to describe its operation process. For a system whose
operation can be formulated as a CAMDP, we focus on how to properly formulate
the problem and design each component of the MDP, i.e. the state, action, and
immediate reward, so as to optimize the system's final objective. We take the
take-out food packing-and-delivery system as an example of this setting, and
propose a Deep Reinforcement Order Packing (DROP) model to solve its operation
problem. For a system whose operation can be formulated as a MAMDP, besides
designing each component of the MDP, we also need to guarantee that all agents
in the system work cooperatively. We take the express system, in which many
couriers work, as an example of this setting. To ensure cooperation among
couriers, two models are proposed: a Deep Reinforcement Courier Dispatching
(DRCD) model, which incorporates cooperation into the state design, and a
Cooperative Multi-Agent MDP (CMMDP) model, which aims to guarantee cooperation
by incorporating another MDP along the agent sequence.
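
To make the CAMDP formulation concrete, the sketch below shows one way the
state, action, and immediate reward of a central dispatching agent could be
laid out in Python; the field names and the reward definition are illustrative
assumptions only, not the exact design used in DROP or DRCD.

    from dataclasses import dataclass
    from typing import List

    # Hypothetical sketch of a central-agent dispatching MDP.
    # Field names and the reward definition are illustrative only.

    @dataclass
    class State:
        time_of_day: float           # normalized time within the operating horizon
        pending_orders: List[int]    # ids of orders waiting to be assigned
        worker_locations: List[int]  # grid-cell index of each courier/driver

    @dataclass
    class Action:
        order_id: int                # which pending order to dispatch
        worker_id: int               # which worker the order is assigned to

    def immediate_reward(prev: State, action: Action, nxt: State) -> float:
        # Example reward: +1 per order served this step, minus a small penalty
        # per order still waiting, so the long-run return reflects throughput.
        served = len(prev.pending_orders) - len(nxt.pending_orders)
        return served - 0.01 * len(nxt.pending_orders)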

Real-world data from these two representative systems are used to build system
simulators, on which we train and evaluate our models, i.e. DROP, DRCD, and
CMMDP. Comparisons with baselines confirm the superiority of our models.
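
As a rough illustration of how such simulator-based training and evaluation
typically proceeds, the sketch below assumes a hypothetical Simulator with
reset/step methods and a generic agent interface; it is not the actual
simulator or training code of this work.

    # Hypothetical episode loop over a data-driven simulator.
    # Simulator and agent are assumed interfaces, not the ones used here.

    def run_episode(simulator, agent, train=True):
        state = simulator.reset()        # start a new simulated day from logged data
        total_reward, done = 0.0, False
        while not done:
            action = agent.act(state)                         # choose a dispatching action
            next_state, reward, done = simulator.step(action)
            if train:
                agent.observe(state, action, reward, next_state, done)  # store / learn
            total_reward += reward
            state = next_state
        return total_reward              # episode return, compared against baselines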


Date:                   Tuesday, 21 April 2020

Time:                   2:00pm - 4:00pm

Zoom Meeting:           https://hkust.zoom.com.cn/j/613559094

Committee Members:      Prof. Qiang Yang (Supervisor)
                        Dr. Kai Chen (Chairperson)
                        Dr. Xiaojuan Ma
                        Dr. Yangqiu Song


**** ALL are Welcome ****