A Survey of Activation Steering Methods in Large Language Models
PhD Qualifying Examination
Title: "A Survey of Activation Steering Methods in Large Language Models"
by
Miss Zheng CHEN
Abstract:
Large language models (LLMs) exhibit impressive capabilities but often
produce outputs that are untruthful, toxic, or misaligned with user intent. A
rapidly growing paradigm called activation steering addresses this challenge
by directly manipulating the model's internal representations and hence
modifying the outputs without retraining. Since any activation-level
intervention must first decide where to intervene and then how to intervene,
we organize the literature along these two axes into three families: (1)
locating methods that identify behaviorally relevant directions or
components; (2) steering methods that modify activations to control
generation; and (3) integrated methods that jointly locate and steer. For
each family we formalize the core mathematical operation, compare
representative works, and discuss trade-offs. We then provide an empirical
comparison across six application domains: truthfulness enhancement,
behavioral and persona steering, reasoning steering, toxicity reduction and
controlled generation, safety and refusal evaluation, and mechanistic
interpretability benchmarks. We conclude by outlining open problems and
future directions.
Date: Tuesday, 12 May 2026
Time: 1:00pm - 2:00pm
Venue: Room 5501 (Lift 25/26)
Committee Members: Prof. Bo Li (Supervisor)
                   Dr. May Fung (Chairperson)
                   Dr. Dan Xu