PhD Qualifying Examination
Title: "A Survey of Stochastic Multi-Armed Bandits with its Applications"
by
Mr. Bo LIU
Abstract:
Decision making under uncertainty faces the common challenge of the
exploration-exploitation tradeoff. For instance, deciding which articles to
recommend to users requires exploring newly published articles while
exploiting popular ones. The multi-armed bandit (MAB) problem formalizes
this tradeoff. In a stochastic MAB problem, K arms are available, each with
its own reward distribution. An MAB policy sequentially selects arms and
observes their rewards; the objective, in general, is to maximize the
cumulative reward. MAB policies are both theoretically understood and
practically effective. This survey covers the definition of the stochastic
MAB problem and its objective function, known as regret. Motivating
applications, particularly online recommendation, are then introduced. We
survey policies for the context-free, contextual, and transfer bandit
problems, emphasizing three families of algorithms: epsilon-greedy, Thompson
sampling, and upper confidence bound. Finally, we compare the theoretical
regret bounds and empirical performance of the different policies.
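To make the setting concrete, below is a minimal sketch of two of the policy families named in the abstract, epsilon-greedy and UCB1, on a toy stochastic bandit with Bernoulli arms. The arm probabilities, horizon, and function names are illustrative assumptions, not taken from the survey itself.

```python
import math
import random

def pull(p):
    """Simulate one pull of a Bernoulli arm with success probability p."""
    return 1.0 if random.random() < p else 0.0

def epsilon_greedy(probs, horizon, epsilon=0.1):
    """Epsilon-greedy: explore a uniformly random arm with probability
    epsilon, otherwise exploit the empirically best arm."""
    k = len(probs)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once to initialize estimates
        elif random.random() < epsilon:
            arm = random.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit
        r = pull(probs[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean
        total += r
    return total

def ucb1(probs, horizon):
    """UCB1: pull the arm whose empirical mean plus confidence radius
    (an optimistic upper bound on its true mean) is largest."""
    k = len(probs)
    counts, means, total = [0] * k, [0.0] * k, 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # play each arm once first
        else:
            arm = max(range(k), key=lambda a: means[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        r = pull(probs[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        total += r
    return total

if __name__ == "__main__":
    random.seed(0)
    probs = [0.2, 0.5, 0.8]  # hypothetical arms; the best arm pays 0.8
    T = 5000
    print("epsilon-greedy reward:", epsilon_greedy(probs, T))
    print("UCB1 reward:", ucb1(probs, T))
```

Both policies concentrate their pulls on the best arm over time; UCB1 does so without a fixed exploration rate, which is one reason its regret grows only logarithmically in the horizon.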
Date: Tuesday, 28 February 2017
Time: 5:00pm - 7:00pm
Venue: Room 5564
Lifts 27/28
Committee Members: Prof. Qiang Yang (Supervisor)
Prof. James Kwok (Chairperson)
Dr. Yangqiu Song
Dr. Xiaoquan Zhang (ISOM)
**** ALL are Welcome ****