Towards Bridging Human Requirements and Program Synthesis

PhD Thesis Proposal Defence


Title: "Towards Bridging Human Requirements and Program Synthesis"

by

Mr. Jiarong WU


Abstract:

Program synthesis aims to generate programs that meet user-given
specifications, such as input-output examples and natural language (NL)
descriptions. Recent advancements in both symbolic and neural synthesis
methodologies, along with improvements in computational hardware, have
transformed this once formidable research challenge into practical techniques
that benefit billions of users. Despite significant progress, particularly
in speed and scalability, program synthesizers still struggle to faithfully
realize human requirements because of the intrinsic diversity, ambiguity,
and complexity of those requirements.

This thesis aims to enhance the mechanisms to communicate human requirements 
to program synthesizers and consists of the following three studies.

The first study introduces a programming-by-example (PBE) framework to ease 
PBE development in diverse domains. Domain-specific PBE synthesizers, such as 
Microsoft FlashFill in Excel, boost user efficiency by synthesizing programs 
to automate repetitive tasks in their application domains. Although these
application domains are diverse, the demanding task of implementing a PBE
synthesizer hinders wider adoption. This study therefore proposes Bee, a PBE
framework built on an innovative developer interface based on relational
tables, to ease customization for diverse application domains. The evaluation
shows that Bee is more accessible to novice developers than classic program
synthesis frameworks while maintaining comparable performance.

The second study enhances the probabilistic model used in interactive program 
synthesis for resolving ambiguity. PBE synthesizers usually find multiple
programs that are consistent with the example specification, and one
effective approach to identifying the user-intended program is
question-answer interaction. The questions selected for users to answer are
crucial to the interaction effort, measured by the number of rounds, and
selecting the optimal questions depends on a probability distribution over
programs. This study proposes using large language models (LLMs) to capture
user intentions in the NL descriptions alongside examples to yield a more
user-aligned probabilistic model. The evaluation shows that the approach is
effective in reducing user interaction effort.

The third study empirically investigates the effectiveness of pseudocode as 
an intermediate representation (IR) for isolating the complexity of problem 
solving and guiding LLMs in synthesizing executable and correct program 
implementations in different programming languages. Experimental results 
confirm that pseudocode not only preserves essential semantics but also 
provides a simple and effective means of communication with LLMs, shedding 
light on program synthesis from complex requirements via the pseudocode IR.


Date:                   Tuesday, 29 April 2025

Time:                   8:00am - 10:00am

Venue:                  Room 2130A
                        Lift 19

Committee Members:      Prof. Shing-Chi Cheung (Supervisor)
                        Dr. Shuai Wang (Chairperson)
                        Dr. May Fung
                        Dr. Dongdong She