Towards Bridging Human Requirements and Program Synthesis

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "Towards Bridging Human Requirements and Program Synthesis"

By

Mr. Jiarong WU


Abstract:

Program synthesis aims to synthesize programs that meet user-given
specifications, such as input-output examples and natural language
descriptions. Recent advancements in both symbolic and neural synthesis
methodologies, along with improvements in computational hardware, have
transformed this once formidable research challenge into practical techniques
that serve billions of users. Despite the significant progress made,
particularly in speed and scalability, program synthesizers still struggle to
faithfully realize human requirements, which are intrinsically diverse,
ambiguous, and complex.

This thesis aims to improve how human requirements are communicated to
program synthesizers. It studies the gap between the two and proposes
user-friendly interaction, combined with efficient synthesis techniques, to
improve the usability and efficiency of program synthesizers. The thesis
comprises the following three studies.

The first study introduces a programming-by-example (PBE) framework to ease 
PBE development in diverse domains. Domain-specific PBE synthesizers, such as 
Microsoft FlashFill in Excel, boost user efficiency by synthesizing programs 
to automate repetitive tasks in their application domains. However, the
effort required to implement a PBE synthesizer hinders the adoption of PBE in
wider application domains. Therefore, we propose Bee, a PBE framework with an
innovative developer interface based on relational tables, to ease
domain-specific customization. The evaluation shows that Bee is more
accessible to novice developers than classic program synthesis frameworks
while maintaining comparable performance.

The second study enhances the probabilistic model used in interactive program 
synthesis for resolving ambiguity in PBE, where multiple programs consistent 
with the example specification can be found. Question-answer interaction with
users can effectively identify the user-intended program, and the question
selection problem aims to select the questions that minimize the interaction
effort, specifically the expected number of interaction rounds. Accurately
estimating the probabilities of example-consistent programs is key to
question selection performance. Therefore, this study proposes using large
language models (LLMs) to capture user intentions in the natural language 
descriptions alongside examples to yield a more user-aligned probabilistic 
model. The evaluation investigates the characteristics of different 
probabilistic models and confirms the effectiveness of our approach.
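
As a rough illustration of the question selection idea described above, the
following Python sketch greedily picks the question whose answer is expected
to leave the least probability mass on the surviving candidate programs. It
is a one-step heuristic under stated assumptions, not the method of the
thesis: the three candidate programs, their prior probabilities, the question
pool, and the names expected_remaining_mass, select_question, and prune are
all hypothetical. In the setting of the second study, the priors would come
from a probabilistic model such as one informed by an LLM.

```python
from collections import defaultdict

# Hypothetical candidate programs, all consistent with the single example
# "bob" -> "Bob". Each entry maps a name to (function, assumed prior).
programs = {
    "capitalize": (lambda s: s.capitalize(), 0.5),
    "title": (lambda s: s.title(), 0.3),
    "upper_first": (lambda s: s[0].upper() + s[1:], 0.2),
}

# Hypothetical pool of inputs the system may ask the user about.
questions = ["x", "aNN", "ann lee"]


def expected_remaining_mass(programs, question):
    """Expected probability mass still consistent after the user answers.

    Programs that agree on the question's output cannot be told apart by it,
    so each answer group with mass m contributes m (chance of that answer)
    times m (mass that survives it).
    """
    total = sum(prob for _, prob in programs.values())
    mass_by_answer = defaultdict(float)
    for fn, prob in programs.values():
        mass_by_answer[fn(question)] += prob / total
    return sum(m * m for m in mass_by_answer.values())


def select_question(programs, questions):
    """Greedy choice: the question with the smallest expected remaining mass."""
    return min(questions, key=lambda q: expected_remaining_mass(programs, q))


def prune(programs, question, answer):
    """Keep only the programs whose output on the question matches the answer."""
    return {
        name: (fn, prob)
        for name, (fn, prob) in programs.items()
        if fn(question) == answer
    }


best = select_question(programs, questions)
remaining = prune(programs, best, "Ann Lee")  # the user's (assumed) answer
print(best, sorted(remaining))
```

With these assumed priors, "x" separates nothing, while "ann lee" isolates the
"title" program from the other two. A different prior can change which
question is preferred, which is why the quality of the probabilistic model
matters for interaction effort.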

The third study empirically investigates the decoupling of code generation’s 
complexity into (a) problem-solving and (b) implementing a sketched solution 
in a programming language (PL). Using pseudocode to represent sketched
solutions, we compare code generation performance when starting from
pseudocode versus problem descriptions, across different PLs. Analysis of the
experimental results shows that (1) the complexity of problem-solving is the
performance bottleneck, (2) implementation complexity differs significantly
across PLs, and (3) pseudocode can preserve the essential semantics of
solutions and serve as an effective, PL-agnostic means of communication with
LLMs.


Date:                   Tuesday, 29 July 2025

Time:                   3:30pm - 5:30pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Prof. David Edward COOK (ECON)

Committee Members:      Prof. Shing-Chi CHEUNG (Supervisor)
                        Dr. Shuai WANG
                        Prof. Raymond WONG
                        Dr. Zhiyao XIE (ECE)
                        Prof. Minxue PAN (NJU)