A Survey of Stochastic Multi-Armed Bandits with its Applications
PhD Qualifying Examination

Title: "A Survey of Stochastic Multi-Armed Bandits with its Applications"

by

Mr. Bo LIU

Abstract:

Decision making under uncertainty faces the common challenge of the exploration-exploitation tradeoff. For instance, deciding which articles to recommend to users requires exploring newly published articles while exploiting popular ones. The multi-armed bandit (MAB) problem formalizes this tradeoff. In a stochastic MAB problem, K arms are available, each with its own reward distribution. A MAB policy sequentially selects arms and observes the resulting rewards; the objective, in general, is to maximize the cumulative reward. MAB policies are both theoretically well understood and practically effective.

This survey covers the definition of the stochastic MAB problem and its objective function, called the regret. Motivating applications, particularly online recommendation, are then introduced. We survey policies for the context-free bandit problem, the contextual bandit problem, and the transfer bandit problem, emphasizing three families of algorithms: epsilon-greedy, Thompson sampling, and upper confidence bound. Finally, we compare the theoretical regret and empirical performance of the different policies.

Date: Tuesday, 28 February 2017
Time: 5:00pm - 7:00pm
Venue: Room 5564, Lifts 27/28

Committee Members:
Prof. Qiang Yang (Supervisor)
Prof. James Kwok (Chairperson)
Dr. Yangqiu Song
Dr. Xiaoquan Zhang (ISOM)

**** ALL are Welcome ****
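The epsilon-greedy family mentioned in the abstract can be sketched as follows. This is a minimal illustrative implementation on a Bernoulli bandit, not code from the survey itself; the function name, parameters, and reward model are assumptions chosen for the example. With probability epsilon the policy explores a random arm, otherwise it exploits the arm with the highest empirical mean reward, and the regret is measured against always pulling the best arm.

```python
import random

def epsilon_greedy(true_means, horizon=10000, epsilon=0.1, seed=0):
    """Illustrative epsilon-greedy policy on a stochastic Bernoulli bandit.

    true_means: success probability of each arm (assumed Bernoulli rewards).
    Returns the empirical mean estimates and the cumulative regret,
    i.e. the gap between always pulling the best arm and what we earned.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick a uniformly random arm.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(k), key=lambda a: estimates[a])
        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    regret = horizon * max(true_means) - total_reward
    return estimates, regret
```

Because exploration never stops, epsilon-greedy incurs regret linear in the horizon; the UCB and Thompson sampling families surveyed in the talk achieve logarithmic regret by exploring more selectively.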