From Static LLM Serving to Runtime-Adaptive VLA Execution across Heterogeneous Edge Accelerators

PhD Qualifying Examination


Title: "From Static LLM Serving to Runtime-Adaptive VLA Execution across 
Heterogeneous Edge Accelerators"

by

Mr. Haodong WANG


Abstract:

Foundation models are moving from language generation to multimodal 
perception, reasoning, and embodied action. As inference shifts from cloud 
servers to edge devices and robots, limited computation, memory, and 
bandwidth, together with heterogeneous accelerators, turn deployment into a 
resource-aware systems 
problem: how to coordinate computation and data movement under dynamic 
workloads and hardware states. This survey reviews system-level techniques 
for efficient foundation model execution on heterogeneous edge platforms. 
Rather than treating existing systems as isolated optimizations, we organize 
them as a progression of resource-aware inference paradigms. Static LLM 
inference relies on offline execution planning, including offloading, 
quantization, and hardware-aware mapping. Runtime-adaptive LLM inference 
moves these decisions online, adjusting computation, scheduling, and 
generation budget based on current workload and hardware states. 
Runtime-adaptive vision-language-action (VLA) inference further extends 
adaptation to 
perception-reasoning-action loops, where execution decisions affect not only 
system efficiency but also action correctness and control stability. We 
compare these paradigms and their trade-offs in latency, memory, bandwidth, 
accuracy, and action reliability. Finally, we identify the limits of reactive 
adaptation and discuss predictive VLA inference, where future runtime systems 
anticipate execution states, estimate action sensitivity, and proactively 
coordinate heterogeneous resources before latency or action risks arise.


Date:                   Wednesday, 27 May 2026

Time:                   3:00pm - 5:00pm

Venue:                  Room 2128C
                        Lift 19

Committee Members:      Prof. Song Guo (Supervisor)
                        Dr. Chaojian Li (Chairperson)
                        Dr. Xiaomin Ouyang
