From Model Robustness to Harness-Level Protection: A Survey on Security of Long-Context LLM Agents

PhD Qualifying Examination


Title: "From Model Robustness to Harness-Level Protection: A Survey on
Security of Long-Context LLM Agents"

by

Mr. Yanbo DAI


Abstract:

Large language model (LLM) agents have evolved from single-turn chat systems 
into long-horizon autonomous systems. They plan over extended contexts, 
retrieve from external memory, invoke tools, execute code, and coordinate with 
other agents within an execution harness. This evolution changes the 
appropriate unit of security analysis. Model robustness remains essential, but 
it is no longer sufficient. Even a well-aligned model may produce an unsafe 
trajectory if the harness exposes over-permissive tools, sends private context 
to unauthorized recipients, or allows untrusted data to affect control flow.

This survey organizes the literature around two complementary layers. The 
first, model robustness, concerns the model's intrinsic resistance to 
jailbreaks, prompt injection, poisoning, backdoors, and information 
extraction. The second, harness-level protection, concerns the enforcement of 
permission, information-flow, and coordination boundaries throughout the 
execution trajectory.

We make four contributions. First, we propose a two-layer taxonomy that 
separates model robustness from harness-level protection while explaining how 
long context couples the two. Second, we construct a unified attack-defense 
map. It connects model-level mechanisms, including alignment, jailbreaks, 
direct prompt injection, poisoning, backdoors, and privacy extraction, with 
their harness-level counterparts, including indirect injection, memory 
poisoning, tool and code abuse, information leakage, and multi-agent 
propagation. Third, we model the harness as a policy-constrained execution 
system governed by permission policies, information-flow policies, and 
coordination policies. We relate these policies to trajectory-level properties 
such as boundary compliance, execution fidelity, and system stability. Fourth, 
we connect this framework to benchmarks, runtime auditing, assurance, 
governance, and open research challenges.

A recurring theme is that long context amplifies security risk. Violations can 
accumulate as trajectories grow, memory can preserve adversarial state across 
steps, and many-shot contexts can enable attacks that short-context models 
resist. Agent safety should therefore be evaluated over the entire execution 
trajectory, not only on the final response.


Date:                   Wednesday, 24 June 2026

Time:                   1:00pm - 3:00pm

Zoom Meeting:
https://hkust.zoom.us/j/96184667833?pwd=rg9yi3hEdPkLSpRdbjpbi3c4SrDmM1.1

Committee Members:      Dr. Shuai Wang (Supervisor)
                        Dr. Binhang Yuan (Chairperson)
                        Dr. Dan Xu