More about HKUST
Faster Regret Matching: Convergence in Full and Bandit-Feedback Two Player Zero-Sum Games
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering
Final Year Thesis Oral Defense
Title: "Faster Regret Matching: Convergence in Full and Bandit-Feedback
Two Player Zero-Sum Games"
by
MA Qiurui
Abstract:
We consider the setting where two players use the regret matching (RM)
algorithm or its predictive variant to play a fixed two-player zero-sum
game repeatedly for T rounds.
We provide an RM-style algorithm that work with partial feedback. If
both players adopt this algorithm, they can find a strategy profile that
reaches Nash Equilibrium (NE) in O(T-1/3) time. If players further know
the opponents' mixed strategy, the algorithm produces a strategy profile
with convergent rate to NE at O(T-1/2).
In games where each players' feedback is a utility vector that depends on
the opponents' mixed strategy, i.e, the full feedback case, we prove that
Predictive Regret Matching (PRM) and Predictive Regret Matching Plus
(PRM+) satisfies the Regret Bounded by Variation in Utilities (RVU)
property, as the average strategy converges to NE at rate O(T).
Date : 8 May 2021 (Saturday)
Time : 10:00-10:40
Zoom Link:
https://hkust.zoom.us/j/94800625513?pwd=K3FWcHFpdFJtbHhNV21jMlo0SUQ4Zz09
Meeting ID : 948 0062 5513
Passcode : 486205
Advisor : Prof. ZHANG Tong
2nd Reader : Prof. GOLIN Mordecai