Scalable Oversight for Large Language Models

PhD Qualifying Examination


Title: "Scalable Oversight for Large Language Models"

by

Mr. Zeyu QIN


Abstract:

The rapid progress of Artificial Intelligence (AI) has brought large language
models (LLMs) to the forefront of both research and real-world deployment. In
many domains, these models already approach or even surpass human-level
performance. However, as LLMs grow in capability and are applied to
increasingly complex tasks, it becomes harder for humans to provide reliable,
fine-grained supervision. This growing mismatch introduces significant safety
risks, ranging from misaligned objectives to the amplification of hidden
biases. Ensuring effective oversight of advanced LLMs has therefore become
one of the most critical and pressing challenges in the field of AI
alignment.

Scalable oversight (SO) has emerged as a key paradigm to address this
challenge. By enabling the supervision signal itself to scale with task
complexity, SO reduces the reliance on direct human input while ensuring
oversight remains efficient and robust. Stronger oversight mechanisms reduce
human labor costs, help specify reward objectives, and expand the ability of
LLMs to generalize to more complex, open-ended, and long-horizon problems.
Building on the fundamental principles of SO, this survey provides a
systematic and comprehensive review of existing techniques, organized by
their underlying principles. Finally, the survey also highlights open challenges and
emerging perspectives, suggesting promising directions for future research on
scalable oversight for LLMs.


Date:                   Friday, 31 October 2025

Time:                   1:00pm - 2:30pm

Venue:                  Room 2612B
                        Lifts 31/32

Committee Members:      Dr. May Fung (Supervisor)
                        Dr. Yangqiu Song (Chairperson)
                        Dr. Junxian He