Advancing Training and Inference Efficiency in Large-Scale Models

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Advancing Training and Inference Efficiency in Large-Scale Models"

By

Mr. Shih-yang LIU


Abstract:

While large-scale deep learning models, particularly Large Language Models 
(LLMs), have achieved unprecedented performance across a wide range of tasks, 
their rapidly increasing model size, computational cost, and inference 
latency present significant challenges for both deployment and training. 
These challenges are especially pronounced in resource-constrained 
environments and real-time applications, where memory bandwidth and capacity, 
computational throughput, and reasoning efficiency become critical 
bottlenecks. As a result, improving the efficiency of training and inference 
has become a central focus in the research community and an essential 
component for enabling practical and scalable AI systems.

This thesis presents a comprehensive investigation of efficiency in 
large-scale models, organized by training and inference phases. Spanning the 
spectrum from model compression to reasoning acceleration, we first outline 
methods designed to improve inference efficiency, such as quantization, 
pruning, and token-length reduction. We then survey prior work on training 
efficiency, focusing on parameter-efficient fine-tuning. Building on this 
foundation, we introduce several methods that improve the efficiency of the 
entire pipeline, from training to final inference, across diverse model 
architectures and inference paradigms.

First, we address the inference efficiency of Vision Transformers and propose 
Oscillation-Free Quantization. This quantization-aware training technique 
eliminates the instability caused by oscillatory weight updates, enabling 
stable and accurate quantization even at extremely low bitwidths. Second, we 
introduce LLM-FP4, a post-training quantization method designed to enhance 
the inference efficiency of large language models. This approach achieves 
superior trade-offs between accuracy and efficiency through the careful 
co-design of numerical formats and quantization strategies. Third, we propose 
Eigenspace Low-Rank Approximation, a weight-space inference acceleration 
technique that combines quantization, pruning, and low-rank decomposition. 
This method enables accuracy recovery for compressed models without the need 
for additional fine-tuning. By operating within the eigenspace, the framework 
effectively restores performance lost during aggressive compression while 
avoiding the overhead of expensive retraining. Fourth, we shift our focus to 
activation-space efficiency, using reinforcement learning to make model 
reasoning more efficient. We propose Group Reward Decoupled Normalization 
Policy Optimization (GDPO), a multi-reward reinforcement learning algorithm 
that trains the model to simultaneously minimize reasoning tokens and 
maximize accuracy. By incentivizing higher intelligence per token and 
curtailing redundant reasoning steps, this approach significantly reduces 
inference latency and computational cost during autoregressive generation. 
Finally, we address training efficiency by focusing on low-rank 
parameter-efficient fine-tuning. We propose Weight-Decomposed Low-Rank 
Adaptation (DoRA), a novel technique that improves training efficiency by 
partitioning weight updates into low-rank components. This framework enhances 
training stability, accelerates convergence, and improves parameter 
efficiency, enabling highly cost-effective adaptation of large language 
models for various downstream tasks.


Date:                   Tuesday, 16 June 2026

Time:                   3:30pm - 5:30pm

Venue:                  Room 3494
                        Lifts 25/26

Chairman:               Dr. Shiheng WANG (ACCT)

Committee Members:      Prof. Tim CHENG (Supervisor)
                        Dr. Yangqiu SONG
                        Dr. Dan XU
                        Prof. Jun ZHANG (ECE)
                        Dr. Jing LI (PolyU)