More about HKUST
Automated Program Verification with Large Language Models: A Survey and Review
PhD Qualifying Examination
Title: "Automated Program Verification with Large Language Models: A Survey
and Review"
by
Mr. Shangyu LI
Abstract:
The rapid escalation of software complexity and the critical reliance on
computing systems in modern infrastructure have made software reliability a
paramount concern. Traditionally, formal program verification has served as
the ultimate gold standard for software assurance, providing mathematically
rigorous proofs of functional correctness and the absolute absence of
critical vulnerabilities. However, the immense manual labor, specialized
expertise, and poor scalability associated with formal methods have severely
bottlenecked their widespread adoption.
In recent years, the unprecedented evolution of Large Language Models (LLMs)
has catalyzed a profound paradigm shift across the software engineering
landscape. Endowed with remarkable capabilities in code comprehension,
complex logical reasoning, and the translation of informal natural language
into rigorous mathematical specifications, LLMs offer a highly promising,
scalable pathway toward fully automated program verification.
This survey provides a comprehensive and structured synthesis of the
state-of-the-art research at the intersection of Large Language Models and
formal program verification. We propose a task-centric taxonomy that dissects
the verification pipeline, examining how LLMs address historical bottlenecks
such as specification generation and automated proof generation for
interactive theorem provers. Furthermore, we investigate the rapidly emerging
trend of neuro-symbolic feedback loops and multi-agent workflows that
intertwine the System 1 intuition of LLMs with the System 2 rigor of formal
solvers.
By critically analyzing current benchmarks, evaluation metrics, and the
persistent challenges of scaling automated verification to complex and
system-level codebases such as operating system kernels and compilers, we
outline the trajectory of the field. Finally, we underscore the latent risks,
notably the generation of vacuous truths, and advocate for evaluation
paradigms that rigorously assess intent-alignment. This survey aims to guide
researchers and practitioners in navigating the transition from manual proof
engineering to LLM-orchestrated, prompt-driven autonomous verification.
Date: Wednesday, 17 June 2026
Time: 1:00pm - 3:00pm
Venue: Room 3494
Lifts 25/26
Committee Members: Dr. Jiasi Shen (Supervisor)
Prof. Charles Zhang (Chairperson)
Dr. Shuai Wang