A Survey on Systems for RL Post-Training of LLMs

PhD Qualifying Examination


Title: "A Survey on Systems for RL Post-Training of LLMs"

by

Mr. Lunxi CAO


Abstract:

As the development of Large Language Models (LLMs) moves beyond the era of
pre-training with scaling laws, the current research focus has shifted towards
post-training based on Reinforcement Learning (RL). Unlike static pre-training,
RL post-training introduces three unique challenges to system design. First,
its complex orchestration renders existing training systems inadequate for RL
jobs. Second, the rollout phase becomes a bottleneck that constrains the
efficiency of the entire workload. Third, the synchronization barrier imposes
strict limits on parallelization.

This survey provides a comprehensive overview of systems for RL post-training.
We begin by tracing the evolution of RL frameworks for orchestration from
monolithic designs to modular architectures. We then present system
optimizations to address the critical rollout bottleneck. Furthermore, we
investigate the paradigm shift towards asynchronous RL, which offers greater
flexibility in system design. Finally, we conclude with a brief summary of
these advancements in RL post-training systems and a discussion of future
frontiers in this field. We hope that this survey offers a holistic
perspective on the design space of RL post-training systems and facilitates
the development of more scalable and efficient RL post-training infrastructure.


Date:                   Tuesday, 15 April 2026

Time:                   2:00pm - 3:00pm

Venue:                  Room 2126A
                        Lift 19

Committee Members:      Dr. Wei Wang (Supervisor)
                        Prof. Bo Li (Chairperson)
                        Prof. Song Guo