On the Development and Evaluation of Asynchronous Reinforcement Learning for Foundation Models

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


MPhil Thesis Defence


Title: "On the Development and Evaluation of Asynchronous Reinforcement 
Learning for Foundation Models"

By

Mr. Jiashu WANG


Abstract:

Reinforcement Learning (RL) has become the new frontier for aligning Large
Language Models (LLMs). However, traditional synchronous RL systems often
struggle to meet the increasing performance and scalability demands. While
asynchronous RL systems like AReaL have emerged to improve training
efficiency, the lack of mature support for Vision-Language Models (VLMs)
and Parameter-Efficient Fine-Tuning (PEFT) remains a significant barrier
for researchers.

This thesis addresses this gap by introducing AReaL-LoRA and AReaL-VLM,
two architectural extensions to the AReaL framework. Through these
implementations, we provide a comprehensive exploration and benchmarking of
PEFT and VLMs in asynchronous environments. AReaL-LoRA matches the
performance of full-parameter GRPO while significantly reducing GPU hours.
Crucially, we show that the staleness inherent in asynchronous updates
does not degrade LoRA training quality.

In the multimodal domain, we use AReaL-VLM to study the reasoning
capabilities of Small VLMs (SVLMs). We identify a critical sensitivity to
the SFT-to-GRPO data ratio: the optimal ratio depends strongly on dataset
quality and category, and an improper balance of SFT and GRPO data,
whether insufficient or excessive, is detrimental to the model's ultimate
reasoning performance.


Date:                   Friday, 30 January 2026

Time:                   4:00pm - 6:00pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Dr. Chaojian LI

Committee Members:      Dr. Binhang YUAN (Supervisor)
                        Dr. Dan XU