More about HKUST
Transferable Bandit
The Hong Kong University of Science and Technology Department of Computer Science and Engineering PhD Thesis Defence Title: "Transferable Bandit" By Mr. Bo LIU Abstract The booming development of Artificial Intelligence promotes a large number of online interactive services including recommender system (RecSys), online advertising, dialogue system, etc. These services require sophisticated algorithms to decide actions sequentially and to maximize the cumulative user feedback in the long run. To accomplish this goal, algorithms should simultaneously exploit and explore the user interests according to the partial and noisy user feedback. Bandit problem can successfully formulate the exploitation-exploration trade-off in these applications. When facing the insufficient observations in a target domain of interest, unfortunately, bandit policies may explore more than needed, which may lead to worse performance. In this thesis, we study a novel and challenging problem: Transferable Bandit. Via transfer learning, transferable bandit leverages prior knowledge from the existing source domains with sufficient user feedback to further optimize the cumulative rewards in the target domains of interest. Transferable bandit harness the collective and mutually reinforcing power of the bandit formulation and transfer learning. First, transfer learning improves the exploitation, accelerates its exploration, and balances the exploitation and exploration appropriately in the target domain. Second, the transferable bandit policy explores how to transfer and speeds up the knowledge transfer. This thesis addresses three critical challenges of the transferable bandit problem. First, we propose the Transfer Contextual Bandit (TCB) policy to bridge the action and context heterogeneity. Second, we present the Lifelong Contextual Bandit (LCB) policy that sequentially transfers knowledge across homogeneous domains and maximizes overall cumulative rewards. Third, to facilitate the large-scale online deployment, we illustrate two speed-up methods including stochastic approximation and feature selection. This thesis also presents a general framework based on the upper confidence bound principle to address the transferable bandit problem. Both empirical studies on real-world datasets and theoretical regret analysis validate this thesis. Date: Wednesday, 14 November 2018 Time: 3:00pm - 5:00pm Venue: Room 2408 Lifts 17/18 Chairman: Prof. Qing Li (ISOM) Committee Members: Prof. Qiang Yang (Supervisor) Prof. Yangqiu Song Prof. Ke Yi Prof. Wenbo Wang (MARK) Prof. Irwin King (CUHK) **** ALL are Welcome ****