Faster Regret Matching: Convergence in Full and Bandit-Feedback Two Player Zero-Sum Games

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "Faster Regret Matching: Convergence in Full and Bandit-Feedback 
        Two Player Zero-Sum Games"

by

MA Qiurui

Abstract:

We consider the setting where two players use the regret matching (RM) 
algorithm or its predictive variant to play a fixed two-player zero-sum 
game repeatedly for T rounds.

We provide an RM-style algorithm that work with partial feedback. If 
both players adopt this algorithm, they can find a strategy profile that 
reaches Nash Equilibrium (NE) in O(T-1/3) time. If players further know 
the opponents' mixed strategy, the algorithm produces a strategy profile 
with convergent rate to NE at O(T-1/2).

In games where each players' feedback is a utility vector that depends on 
the opponents' mixed strategy, i.e, the full feedback case, we prove that 
Predictive Regret Matching (PRM) and Predictive Regret Matching Plus 
(PRM+) satisfies the Regret Bounded by Variation in Utilities (RVU) 
property, as the average strategy converges to NE at rate O(T).


Date            : 8 May 2021 (Saturday)

Time            : 10:00-10:40

Zoom Link:
https://hkust.zoom.us/j/94800625513?pwd=K3FWcHFpdFJtbHhNV21jMlo0SUQ4Zz09

Meeting ID      : 948 0062 5513
Passcode        : 486205

Advisor         : Prof. ZHANG Tong

2nd Reader      : Prof. GOLIN Mordecai