Recovering High-Level Semantics from Software Binaries: From Symbolic Analysis To LLM-Assisted Reasoning
PhD Qualifying Examination
Title: "Recovering High-Level Semantics from Software Binaries: From
Symbolic Analysis To LLM-Assisted Reasoning"
by
Mr. Qi ZHANG
Abstract:
Software binaries serve as the ultimate ground truth for security tasks like
vulnerability discovery and supply-chain auditing, as they are directly
executed on target machines. However, compilation strips away high-level
semantics, such as variables, types, and control structures, leaving only
low-level instructions. Consequently, binary analysis faces the fundamentally
undecidable challenge of reconstructing these lost source-level abstractions.
This survey systematically reviews techniques for recovering high-level
semantics from stripped binaries at two granularities: (i) local and (ii)
whole-program semantics. For local semantic recovery, we discuss the
reconstruction of data layout, type information, identifier names, and
source-like code structures. For whole-program semantic recovery, we review
approaches for recovering source file and module structures, identifying
third-party components, inferring compiler provenance, and ensuring
decompiled code executability and decompilability. Finally, this survey
highlights the methodological evolution in the field, tracing the cumulative
transition from traditional rule-based symbolic analysis to data-driven
learning and, more recently, to LLM-assisted reasoning.
Date: Monday, 15 June 2026
Time: 11:00am - 12:00pm
Venue: Room 5501 (Lift 25/26)
Committee Members: Prof. Charles Zhang (Supervisor)
Prof. Ke Yi (Chairperson)
Dr. Dimitris Papadopoulos