Automated Program Verification with Large Language Models: A Survey and Review

PhD Qualifying Examination


Title: "Automated Program Verification with Large Language Models: A Survey 
and Review"

by

Mr. Shangyu LI


Abstract:

The rapid escalation of software complexity and the critical reliance on 
computing systems in modern infrastructure have made software reliability a 
paramount concern. Traditionally, formal program verification has served as 
the ultimate gold standard for software assurance, providing mathematically 
rigorous proofs of functional correctness and the absolute absence of 
critical vulnerabilities. However, the immense manual labor, specialized 
expertise, and poor scalability associated with formal methods have severely 
bottlenecked their widespread adoption.

In recent years, the unprecedented evolution of Large Language Models (LLMs) 
has catalyzed a profound paradigm shift across the software engineering 
landscape. Endowed with remarkable capabilities in code comprehension, 
complex logical reasoning, and the translation of informal natural language 
into rigorous mathematical specifications, LLMs offer a highly promising, 
scalable pathway toward fully automated program verification.

This survey provides a comprehensive and structured synthesis of the 
state-of-the-art research at the intersection of Large Language Models and 
formal program verification. We propose a task-centric taxonomy that dissects 
the verification pipeline, examining how LLMs address historical bottlenecks 
such as specification generation and automated proof generation for 
interactive theorem provers. Furthermore, we investigate the rapidly emerging 
trend of neuro-symbolic feedback loops and multi-agent workflows that 
intertwine the System 1 intuition of LLMs with the System 2 rigor of formal 
solvers.

By critically analyzing current benchmarks, evaluation metrics, and the 
persistent challenges of scaling automated verification to complex and 
system-level codebases such as operating system kernels and compilers, we 
outline the trajectory of the field. Finally, we underscore the latent risks, 
notably the generation of vacuous truths, and advocate for evaluation 
paradigms that rigorously assess intent-alignment. This survey aims to guide 
researchers and practitioners in navigating the transition from manual proof 
engineering to LLM-orchestrated, prompt-driven autonomous verification.


Date:                   Wednesday, 17 June 2026

Time:                   1:00pm - 3:00pm

Venue:                  Room 3494
                        Lifts 25/26

Committee Members:      Dr. Jiasi Shen (Supervisor)
                        Prof. Charles Zhang (Chairperson)
                        Dr. Shuai Wang