Speaker Details

Xiao-Ming Wu

Associate Professor at the Hong Kong Polytechnic University

Dr. Xiao-Ming Wu is an Associate Professor in the Department of Data Science and Artificial Intelligence at The Hong Kong Polytechnic University, having been with the university's Department of Computing since joining in 2016. Dr. Wu holds a PhD in Electrical Engineering from Columbia University, an MPhil from The Chinese University of Hong Kong, and both a BSc and an MSc from Peking University. Her current research focuses on designing AI models that are robust, trustworthy, and explainable. She received the 2024 CCF Science and Technology Achievement Award (Second Prize in Natural Science) for her pioneering work on graph neural networks. She was also named an AI 2000 Most Influential Scholar Honorable Mention in AAAI/IJCAI in 2024 and 2025, and was listed among the world's top 2% scientists.

Talk

Title: Reading Out Transformer Activations for Precise Localization in Language Model Steering

Abstract: Post-hoc control of large language models (LLMs) is a critical area for advancing alignment and safety. Activation (representation) steering modifies model behavior by injecting additive "steering vectors" into hidden states at inference time, offering a minimally intrusive, computationally efficient, and often more robust solution for out-of-distribution scenarios without altering model parameters. In this talk, I will review recent advances in inference-time steering techniques, with particular emphasis on the challenging yet underexplored problem of accurately localizing behavior-relevant modules within LLMs to enable effective intervention. I will then present our latest work on a novel method for "reading out" Transformer activations, which enables precise localization of the most relevant modules for a target behavior.
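To make the core idea of activation steering concrete, the sketch below shows the additive intervention the abstract describes: a fixed vector is added to a layer's hidden states at inference time, leaving all model weights untouched. This is a minimal illustrative sketch using NumPy, not the speaker's method; the function name, the toy dimensions, and the scaling parameter `alpha` are assumptions introduced here for clarity.

```python
import numpy as np

def apply_steering(hidden_states, steering_vector, alpha=1.0):
    """Additively inject a steering vector into one layer's activations.

    hidden_states:   (seq_len, d_model) activations at inference time.
    steering_vector: (d_model,) direction associated with a target behavior.
    alpha:           scalar strength of the intervention (hypothetical knob).

    The model's parameters are never modified; only the forward pass
    at this layer is perturbed.
    """
    return hidden_states + alpha * steering_vector

# Toy example: 3 tokens, 4-dimensional hidden states.
h = np.zeros((3, 4))
v = np.array([1.0, 0.0, -1.0, 0.0])  # hypothetical behavior direction
steered = apply_steering(h, v, alpha=2.0)
```

In practice the vector is injected at a specific module (e.g. via a forward hook on a chosen Transformer layer), which is exactly why the localization problem highlighted in the talk matters: the same vector applied at the wrong layer may have little or adverse effect.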