PhD Qualifying Examination
Title: "From Static LLM Serving to Runtime-Adaptive VLA Execution across
Heterogeneous Edge Accelerators"
by
Mr. Haodong WANG
Abstract:
Foundation models are moving from language generation to multimodal
perception, reasoning, and embodied action. As inference shifts from cloud
servers to edge devices and robots, limited computation, memory, bandwidth,
and heterogeneous accelerators turn deployment into a resource-aware systems
problem: how to coordinate computation and data movement under dynamic
workloads and hardware states. This survey reviews system-level techniques
for efficient foundation model execution on heterogeneous edge platforms.
Rather than treating existing systems as isolated optimizations, we organize
them as a progression of resource-aware inference paradigms. Static LLM
inference relies on offline execution planning, including offloading,
quantization, and hardware-aware mapping. Runtime-adaptive LLM inference
moves these decisions online, adjusting computation, scheduling, and
generation budget based on current workload and hardware states.
Runtime-adaptive vision-language-action (VLA) inference further extends
adaptation to perception-reasoning-action loops, where execution decisions affect not only
system efficiency but also action correctness and control stability. We
compare these paradigms and their trade-offs in latency, memory, bandwidth,
accuracy, and action reliability. Finally, we identify the limits of reactive
adaptation and discuss predictive VLA inference, where future runtime systems
anticipate execution states, estimate action sensitivity, and proactively
coordinate heterogeneous resources before latency or action risks arise.
Date: Wednesday, 27 May 2026
Time: 3:00pm - 5:00pm
Venue: Room 2128C (Lift 19)
Committee Members: Prof. Song Guo (Supervisor)
Dr. Chaojian Li (Chairperson)
Dr. Xiaomin Ouyang