Stateful Large Language Model Serving
PhD Qualifying Examination
Title: "Stateful Large Language Model Serving"
by
Mr. Qianli LIU
Abstract:
LLM serving is becoming stateful in practice. Modern serving stacks
increasingly keep, move, restore, route toward, or share runtime objects such
as KV pages, prefix modules, restorable activations, and adapters. As these
objects persist across turns, phases, workers, tiers, tenants, and
application boundaries, later inference depends on decisions made about
earlier execution state. This survey reviews recent LLM serving systems
through the lens of managed inference state: a serving-path representation
whose identity, placement, reuse, invalidation, or isolation can change later
inference or routing behavior. We organize the literature around three
runtime state objects (context-computation state, execution-restoration
state, and adaptation state) and a separate layer of application-level
information that helps the runtime decide which state to keep, route toward,
reuse, or invalidate. The lifecycle view connects preservation, placement,
restoration, reuse, routing, programming interfaces, governance, and
evaluation without reducing them to a generic cache taxonomy. The survey's main
finding is simple: a cache hit or warm state is useful only if it can be
reached in time, still matches the current execution context, and is allowed
by the sharing policy. We conclude with research problems in state metadata,
admission and valuation benchmarks, compatibility contracts,
trust-domain-aware sharing, bounded programmable control, state-lifecycle
evaluation, and the extension of stateful-serving ideas beyond LLMs to
broader foundation-model serving.
Date: Wednesday, 27 May 2026
Time: 1:00pm - 3:00pm
Venue: Room 2128A, Lift 19
Committee Members: Prof. Song Guo (Supervisor)
Dr. Xiaomin Ouyang (Chairperson)
Dr. Chaojian Li