Beyond Static Chain-of-Thought: Learning to Control Reasoning in Large Language Models

PhD Qualifying Examination


Title: "Beyond Static Chain-of-Thought: Learning to Control Reasoning in Large
Language Models"

by

Mr. Haoxi LI


Abstract:

Despite the remarkable success of large-scale reasoning models built on the
long chain-of-thought (CoT) paradigm, most current systems still rely on
static reasoning processes: reasoning patterns are pre-defined and remain
unchanged across different inputs. Such static CoT limits the robustness,
efficiency, and reliability of these models, as they cannot adjust their
reasoning strategies to the difficulty, uncertainty, or feedback
characteristics of each instance. Meanwhile, a growing body of research has
begun to treat reasoning itself as a controllable object, exploring when and
how models should think more deeply, verify themselves, or stop early. This
survey provides a systematic overview of this rapidly forming area, which we
term learning to control reasoning beyond static CoT. We propose a taxonomy
that organizes existing methods into two major categories: (i) training-time
policy learning, which learns reasoning policies such as adaptive
configuration and strategy selection; and (ii) inference-time dynamic
computation, including real-time control of step allocation, early exiting,
and thinking speed. We further summarize relevant benchmarks and datasets for
evaluating reasoning performance and outline open challenges in achieving
adaptivity, enhancing efficiency, and integrating these mechanisms into
next-generation metacognitive reasoning models.


Date:                   Wednesday, 26 November 2025

Time:                   2:00pm - 4:00pm

Venue:                  Room 5501
                        Lift 25/26

Committee Members:      Prof. Song Guo (Supervisor)
                        Dr. Shuai Wang (Chairperson)
                        Dr. Wei Wang