Serving Agentic Workloads: From Token-Level Inference to Agent-Level Execution

PhD Qualifying Examination


Title: "Serving Agentic Workloads: From Token-Level Inference to Agent-Level
Execution"

by

Mr. Chaokun CHANG


Abstract:

Large language model applications are shifting from single-turn chatbot 
interactions to agentic workloads: dynamic executions that repeatedly call 
models, invoke tools, retrieve data, run code, manipulate state, and adapt to 
intermediate results. This shift challenges the abstractions used by 
conventional LLM serving systems. Token-level and request-level mechanisms 
such as continuous batching, prefill/decode scheduling, KV-cache management, 
and request migration remain essential, but they are no longer sufficient as 
the primary optimization boundary. A single user task may expand into an 
irregular execution involving LLM calls, tool invocations, memory accesses, 
sandbox operations, external services, and side effects.

This survey argues that agent-level execution should be treated as a 
first-class schedulable and programmable systems entity. We organize recent 
work into a taxonomy spanning agent-aware LLM serving; runtime component
serving for tools, retrieval, memory, and sandboxes; workflow and program
serving; and cluster-level multi-agent serving. We further discuss how
agentic workloads shift evaluation from request latency toward end-to-end 
task completion, cost per successful task, recovery behavior, heterogeneous 
resource contention, and multi-tenant fairness. Finally, we identify open 
research challenges in defining the right schedulable entity, coordinating 
across runtime layers, evaluating agentic serving systems, and designing 
agent-native infrastructure primitives such as checkpoint and restore.


Date:                   Monday, 11 May 2026

Time:                   11:30am - 1:00pm

Venue:                  Room 5501
                        Lift 25/26

Committee Members:      Dr. Wei Wang (Supervisor)
                        Prof. Bo Li (Supervisor)
                        Prof. Song Guo (Chairperson)
                        Dr. Binhang Yuan