Transferable Bandit
PhD Thesis Proposal Defence
Title: "Transferable Bandit"
by
Mr. Bo LIU
Abstract:
The rapid development of Artificial Intelligence has fostered a large number of
online interactive services, including recommender systems (RecSys) and dialogue
systems. These services require intelligent algorithms to choose actions
sequentially and to maximize cumulative user feedback. To accomplish this goal,
the algorithms are expected to simultaneously exploit and explore user
interests based on partial and noisy feedback. The bandit formulation is
widely used to model the exploration-exploitation tradeoff in interactive
services. When observations are insufficient, bandit policies explore more
than needed, which can lead to worse short-term rewards.
In this proposal, we study a novel problem: the transferable bandit. A
transferable bandit adopts transfer learning to leverage prior knowledge from
source domains with sufficient observations to further maximize the cumulative
rewards in the target domain of interest. The transferable bandit harnesses the
collective and mutually reinforcing power of the bandit formulation and
transfer learning.
First, transfer learning improves the exploitation of a bandit policy and
accelerates its exploration in the target domain. Second, the bandit policy
explores and speeds up the knowledge transfer.
We propose to address two critical challenges of the transferable bandit.
First, we propose the Transfer Contextual Bandit (TCB) policy to bridge the
action and context heterogeneity. Second, we present the Lifelong Contextual
Bandit (LCB) policy, which sequentially transfers knowledge and maximizes the
overall cumulative rewards. In this proposal, all algorithms are based on a
general framework: 1) how the rewards are generated with respect to how the
domains are related; 2) how to estimate and exploit the reward parameters and
knowledge transfer; 3) how to measure and then explore the uncertainty of the
reward parameters and knowledge transfer. Both empirical studies on real-world
datasets and theoretical analysis validate this proposal.
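The estimate/exploit and measure/explore steps of such a framework can be illustrated with a small sketch. This is not the TCB or LCB policy from the proposal. It is a plain UCB-style multi-armed bandit with Bernoulli rewards, in which source-domain knowledge is assumed to arrive as per-arm prior means that count as pseudo-observations. The names `WarmStartUCB`, `prior_means`, `prior_weight`, and `run` are hypothetical and chosen only for illustration.

```python
import math
import random


class WarmStartUCB:
    """UCB-style policy over K arms with an optional source-domain prior.

    Illustrative only: NOT the TCB or LCB policy described in the abstract.
    """

    def __init__(self, n_arms, prior_means=None, prior_weight=0.0):
        # Assumption: transferred knowledge is a prior mean per arm, treated
        # as `prior_weight` pseudo-observations of that arm.
        prior_means = prior_means or [0.0] * n_arms
        self.counts = [float(prior_weight)] * n_arms
        self.sums = [m * prior_weight for m in prior_means]

    def select(self):
        # An arm with no (pseudo-)observations is played first.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)

        def score(arm):
            # Estimated reward parameter (exploitation term).
            mean = self.sums[arm] / self.counts[arm]
            # Confidence width (exploration term); shrinks as counts grow.
            bonus = math.sqrt(2.0 * math.log(1.0 + total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def run(policy, true_means, horizon, seed=0):
    """Simulate Bernoulli rewards and return the cumulative reward."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        arm = policy.select()
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        policy.update(arm, reward)
        total += reward
    return total


if __name__ == "__main__":
    target = [0.2, 0.8, 0.3]  # hypothetical target-domain reward rates
    cold = WarmStartUCB(3)
    warm = WarmStartUCB(3, prior_means=[0.25, 0.75, 0.3], prior_weight=20)
    print("cold start:", run(cold, target, horizon=200))
    print("warm start:", run(warm, target, horizon=200))
```

With a larger `prior_weight` the confidence widths start out narrower, so the policy spends fewer early rounds exploring. Whether that helps depends on how closely the source-domain prior matches the target domain, which is the question the transfer setting has to address.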
Date: Monday, 11 December 2017
Time: 4:30pm - 6:30pm
Venue: Room 5501 (lifts 25/26)
Committee Members: Prof. Qiang Yang (Supervisor)
                   Prof. Lei Chen (Chairperson)
                   Dr. Qiong Luo
                   Prof. Nevin Zhang
**** ALL are Welcome ****