Stateful Large Language Model Serving

PhD Qualifying Examination


Title: "Stateful Large Language Model Serving"

by

Mr. Qianli LIU


Abstract:

LLM serving is becoming stateful in practice. Modern serving stacks 
increasingly keep, move, restore, route toward, or share runtime objects such 
as KV pages, prefix modules, restorable activations, and adapters. As these 
objects persist across turns, phases, workers, tiers, tenants, and 
application boundaries, later inference depends on decisions made about 
earlier execution state. This survey reviews recent LLM serving systems 
through the lens of managed inference state: a serving-path representation 
whose identity, placement, reuse, invalidation, or isolation can change later 
inference or routing behavior. We organize the literature around three 
runtime state objects (context-computation state, execution-restoration 
state, and adaptation state) and a separate layer of application-level 
information that helps the runtime decide which state to keep, route toward, 
reuse, or invalidate. The lifecycle view connects preservation, placement, 
restoration, reuse, routing, programming interfaces, governance, and 
evaluation without reducing them to a generic cache taxonomy. The survey's 
main finding is simple: a cache hit or warm state is useful only if it can be 
reached in time, still matches the current execution context, and is allowed 
by the sharing policy. We conclude with research problems in state metadata, 
admission and valuation benchmarks, compatibility contracts, 
trust-domain-aware sharing, bounded programmable control, state-lifecycle 
evaluation, and the extension of stateful-serving ideas beyond LLMs to 
broader foundation-model serving.


Date:                   Wednesday, 27 May 2026

Time:                   1:00pm - 3:00pm

Venue:                  Room 2128A
                        Lift 19

Committee Members:      Prof. Song Guo (Supervisor)
                        Dr. Xiaomin Ouyang (Chairperson)
                        Dr. Chaojian Li