A Survey of Stochastic Multi-Armed Bandits and Their Applications

PhD Qualifying Examination


Title: "A Survey of Stochastic Multi-Armed Bandits and Their Applications"

by

Mr. Bo LIU


Abstract:

Decision making under uncertainty faces the common challenge of the 
exploration-exploitation tradeoff. For instance, deciding which articles to 
recommend to users requires exploring newly published articles while 
exploiting articles already known to be popular. The multi-armed bandit (MAB) 
problem formalizes this tradeoff. In a stochastic MAB problem, K arms are 
available, each with its own reward distribution. An MAB policy sequentially 
selects arms and observes the resulting rewards; the objective, in general, is 
to maximize the cumulative reward. MAB policies are both theoretically well 
understood and practically effective. This survey covers the definition of the 
stochastic MAB problem and its objective function, known as regret. Motivating 
applications, particularly online recommendation, are then introduced. We 
survey policies for the context-free, contextual, and transfer bandit 
problems. Three families of algorithms, namely epsilon-greedy, Thompson 
sampling, and upper confidence bound, are emphasized. We conclude by comparing 
the theoretical regret bounds and empirical performance of the different 
policies.
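To make the exploration-exploitation tradeoff concrete, here is a minimal 
sketch (illustrative code, not taken from the survey) of the epsilon-greedy 
policy on simulated Bernoulli arms. The arm means, horizon, and epsilon value 
are arbitrary assumptions chosen for the example.

```python
import random

def epsilon_greedy(arm_means, horizon, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return total reward and pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # empirical mean reward per arm
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            # explore: pick a uniformly random arm
            arm = rng.randrange(k)
        else:
            # exploit: pick the arm with the highest empirical mean
            arm = max(range(k), key=lambda a: estimates[a])
        # draw a Bernoulli reward from the chosen arm's distribution
        reward = 1 if rng.random() < arm_means[arm] else 0
        counts[arm] += 1
        # incremental update of the empirical mean for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

# With enough pulls, the policy concentrates on the better arm
total, counts = epsilon_greedy([0.2, 0.8], horizon=5000)
```

With a long enough horizon, the pull counts concentrate on the arm with the 
higher mean, while the epsilon fraction of random pulls keeps every arm's 
estimate from going stale.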


Date:			Tuesday, 28 February 2017

Time:                  	5:00pm - 7:00pm

Venue:                  Room 5564
                         Lifts 27/28

Committee Members:	Prof. Qiang Yang (Supervisor)
 			Prof. James Kwok (Chairperson)
 			Dr. Yangqiu Song
 			Dr. Xiaoquan Zhang (ISOM)


**** ALL are Welcome ****