More about HKUST
Faster Regret Matching: Convergence in Full and Bandit-Feedback Two Player Zero-Sum Games
The Hong Kong University of Science and Technology Department of Computer Science and Engineering Final Year Thesis Oral Defense Title: "Faster Regret Matching: Convergence in Full and Bandit-Feedback Two Player Zero-Sum Games" by MA Qiurui Abstract: We consider the setting where two players use the regret matching (RM) algorithm or its predictive variant to play a fixed two-player zero-sum game repeatedly for T rounds. We provide an RM-style algorithm that work with partial feedback. If both players adopt this algorithm, they can find a strategy profile that reaches Nash Equilibrium (NE) in O(T-1/3) time. If players further know the opponents' mixed strategy, the algorithm produces a strategy profile with convergent rate to NE at O(T-1/2). In games where each players' feedback is a utility vector that depends on the opponents' mixed strategy, i.e, the full feedback case, we prove that Predictive Regret Matching (PRM) and Predictive Regret Matching Plus (PRM+) satisfies the Regret Bounded by Variation in Utilities (RVU) property, as the average strategy converges to NE at rate O(T). Date : 8 May 2021 (Saturday) Time : 10:00-10:40 Zoom Link: https://hkust.zoom.us/j/94800625513?pwd=K3FWcHFpdFJtbHhNV21jMlo0SUQ4Zz09 Meeting ID : 948 0062 5513 Passcode : 486205 Advisor : Prof. ZHANG Tong 2nd Reader : Prof. GOLIN Mordecai