Audit-on-Demand Correctness for ANN-Based Vector Search: A Survey and Review of Zero-Knowledge Proof Techniques

PhD Qualifying Examination


Title: "Audit-on-Demand Correctness for ANN-Based Vector Search: A Survey and 
Review of Zero-Knowledge Proof Techniques"

by

Mr. Zipeng QIU


Abstract:

Approximate nearest-neighbor (ANN) vector search has become a core systems 
primitive for semantic search, recommendation, and retrieval-augmented 
generation, yet current service interfaces typically return only a top-k list 
rather than auditable evidence of how that list was produced. This survey 
studies audit-on-demand correctness: after a disputed query result, can a 
service prove that the returned ranking is exactly the output of a publicly 
specified ANN workflow executed on a committed snapshot, while keeping the 
underlying corpus and index state private? We focus on the regime most 
relevant to dense retrieval over proprietary data: ANN execution is 
outsourced, disputes are occasional rather than continuous, and both public 
verifiability and limited disclosure matter.

The survey makes four contributions. First, it formulates audit-on-demand 
correctness as a procedural guarantee: the object of certification is exact 
compliance with a published ANN semantics, not exact nearest-neighbor 
optimality in the raw corpus. Second, it organizes the design space through a 
taxonomy spanning authenticated data structures, replicated re-execution, 
trusted execution environments (TEEs), generic zero-knowledge (ZK) query 
systems, and specialized ZK proofs for vector search. Third, it distills the 
recurring proof-engineering patterns visible in current ZK approaches to ANN 
auditing, especially the proof-oriented IVF-PQ-style pipeline that presently 
offers the clearest end-to-end evidence, including fixed-shape semantics, 
snapshot commitments, order-statistic proofs, lookup-friendly scoring, and 
cost-aware parameter selection. Fourth, it analyzes V3DB as a case study of 
that specialized route and uses it to identify the main open problems for a 
thesis agenda: dynamic freshness, graph-based indices, private queries, hybrid 
trust architectures, and end-to-end accountable retrieval.


Date:                   Monday, 27 April 2026

Time:                   2:00pm - 4:00pm

Venue:                  Room 2126A
                        Lift 19

Committee Members:      Dr. Binhang Yuan (Supervisor)
                        Dr. Shuai Wang (Chairperson)
                        Prof. Ke Yi