Transferable Bandit

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Transferable Bandit"

By

Mr. Bo LIU


Abstract

The booming development of Artificial Intelligence has promoted a large number
of online interactive services, including recommender systems (RecSys), online
advertising, and dialogue systems. These services require sophisticated
algorithms to decide actions sequentially and to maximize the cumulative user
feedback in the long run. To accomplish this goal, algorithms should
simultaneously exploit and explore user interests according to partial and
noisy user feedback. The bandit problem successfully formulates the
exploitation-exploration trade-off in these applications. Unfortunately, when
facing insufficient observations in a target domain of interest, bandit
policies may explore more than needed, which may lead to worse performance.
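As a minimal illustration of the exploitation-exploration trade-off (a generic
UCB1-style multi-armed bandit, not one of the policies studied in the thesis),
the following Python sketch picks the arm with the highest empirical mean plus
an exploration bonus; the number of arms, the horizon, and the Bernoulli
click-through simulator are illustrative assumptions.

import math
import random

def ucb1(n_arms, horizon, pull):
    """UCB1 sketch: favour the arm whose mean estimate plus exploration
    bonus is largest; the bonus shrinks as the arm is sampled more often."""
    counts = [0] * n_arms           # times each arm has been pulled
    means = [0.0] * n_arms          # empirical mean reward of each arm
    for t in range(1, horizon + 1):
        if t <= n_arms:             # pull every arm once to initialize
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)          # observe partial, noisy feedback
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return means

# Illustrative simulator: Bernoulli rewards with hidden click-through rates.
ctrs = [0.1, 0.3, 0.5]
print(ucb1(len(ctrs), 10000,
           lambda a: 1.0 if random.random() < ctrs[a] else 0.0))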

In this thesis, we study a novel and challenging problem: Transferable Bandit.
Via transfer learning, the transferable bandit leverages prior knowledge from
existing source domains with sufficient user feedback to further optimize the
cumulative rewards in the target domains of interest. The transferable bandit
harnesses the collective and mutually reinforcing power of the bandit
formulation and transfer learning. First, transfer learning improves
exploitation, accelerates exploration, and balances the two appropriately in
the target domain. Second, the transferable bandit policy explores how to
transfer and thereby speeds up knowledge transfer.

This thesis addresses three critical challenges of the transferable bandit
problem. First, we propose the Transfer Contextual Bandit (TCB) policy to
bridge the action and context heterogeneity. Second, we present the Lifelong
Contextual Bandit (LCB) policy that sequentially transfers knowledge across
homogeneous domains and maximizes the overall cumulative rewards. Third, to
facilitate large-scale online deployment, we present two speed-up methods:
stochastic approximation and feature selection. This thesis also presents a
general framework based on the upper confidence bound principle to address the
transferable bandit problem. Both empirical studies on real-world datasets and
theoretical regret analyses validate the proposed policies.
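For readers unfamiliar with the upper confidence bound principle in the
contextual setting, the sketch below shows a generic LinUCB-style policy: a
per-arm linear reward model scored by its ridge estimate plus an uncertainty
bonus. It is only an illustration of the principle, not the TCB or LCB policy
of the thesis, and the arm count, feature dimension, and alpha parameter are
assumptions.

import numpy as np

class LinUCB:
    """Generic LinUCB sketch (illustrative; not the thesis's TCB/LCB):
    one linear reward model per arm, scored with an optimistic bound."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted contexts

    def select(self, contexts):
        """contexts: one feature vector per arm; returns the chosen arm index."""
        scores = []
        for A, b, x in zip(self.A, self.b, contexts):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # exploration bonus
            scores.append(theta @ x + bonus)             # optimism under uncertainty
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Update the chosen arm's statistics with the observed feedback."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x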


Date:			Wednesday, 14 November 2018

Time:			3:00pm - 5:00pm

Venue:			Room 2408
 			Lifts 17/18

Chairman:		Prof. Qing Li (ISOM)

Committee Members:	Prof. Qiang Yang (Supervisor)
 			Prof. Yangqiu Song
 			Prof. Ke Yi
 			Prof. Wenbo Wang (MARK)
 			Prof. Irwin King (CUHK)


**** ALL are Welcome ****