Transferable Bandit

PhD Thesis Proposal Defence


Title: "Transferable Bandit"

by

Mr. Bo LIU


Abstract:

The rapid development of Artificial Intelligence has enabled a large number of
online interactive services, including recommender systems (RecSys) and
dialogue systems. These services require intelligent algorithms to choose
actions sequentially so as to maximize cumulative user feedback. To accomplish
this goal, the algorithms are expected to simultaneously exploit and explore
user interests based on partial and noisy feedback. The bandit formulation is
widely used to model this exploration-exploitation tradeoff in interactive
services. When observations are insufficient, however, bandit policies explore
more than needed, which can lead to worse short-term rewards.

In this proposal, we study a novel problem: the transferable bandit. A
transferable bandit uses transfer learning to leverage prior knowledge from
source domains with sufficient observations, in order to further maximize the
cumulative rewards in the target domain of interest. It harnesses the
collective and mutually reinforcing power of the bandit formulation and
transfer learning. First, transfer learning improves the exploitation of a
bandit policy and accelerates its exploration in the target domain. Second, the
exploration of the bandit policy in turn speeds up the knowledge transfer.

We address two critical challenges of the transferable bandit. First, we
propose the Transfer Contextual Bandit (TCB) policy to bridge action and
context heterogeneity. Second, we present the Lifelong Contextual Bandit (LCB)
policy, which sequentially transfers knowledge and maximizes the overall
cumulative rewards. All algorithms in this proposal are based on a general
framework that specifies: 1) how the rewards are generated, given how the
domains are related; 2) how to estimate and exploit the reward parameters and
the knowledge transfer; and 3) how to measure and then explore the uncertainty
of the reward parameters and the knowledge transfer. Both empirical studies on
real-world datasets and theoretical analysis validate the proposed approach.


Date:                   Monday, 11 December 2017

Time:                   4:30pm - 6:30pm

Venue:                  Room 5501
                        (lifts 25/26)

Committee Members:      Prof. Qiang Yang (Supervisor)
                        Prof. Lei Chen (Chairperson)
                        Dr. Qiong Luo
                        Prof. Nevin Zhang


**** ALL are Welcome ****