Towards Bridging Human Requirements and Program Synthesis
PhD Thesis Proposal Defence

Title: "Towards Bridging Human Requirements and Program Synthesis"

by

Mr. Jiarong WU

Abstract:

Program synthesis aims to synthesize programs that meet user-given specifications, such as input-output examples and natural language (NL) descriptions. Recent advancements in both symbolic and neural synthesis methodologies, along with improvements in computational hardware, have transformed this once formidable research challenge into practical techniques that serve billions of users. Despite the significant progress made, particularly in speed and scalability, program synthesizers still face challenges in faithfully realizing human requirements due to their intrinsic diversity, ambiguity, and complexity. This thesis aims to enhance the mechanisms for communicating human requirements to program synthesizers and consists of the following three studies.

The first study introduces a programming-by-example (PBE) framework to ease PBE development in diverse domains. Domain-specific PBE synthesizers, such as Microsoft FlashFill in Excel, boost user efficiency by synthesizing programs that automate repetitive tasks in their application domains. Although the application domains are diverse, the demanding effort required to implement a PBE synthesizer hinders wider adoption. Therefore, Bee, a PBE framework based on an innovative relational-table-based developer interface, is proposed to ease customization for diverse application domains. The evaluation shows that Bee is more accessible to novice developers than classic program synthesis frameworks while maintaining comparable performance.

The second study enhances the probabilistic model used in interactive program synthesis for resolving ambiguity. PBE synthesizers usually find multiple programs that are consistent with the example specification, and one effective approach to identifying the user-intended program is question-answer interaction. The questions selected for users to answer are crucial to the interaction effort, measured by the number of rounds, and selecting the optimal questions depends on a probability distribution over programs. This study proposes using large language models (LLMs) to capture the user intentions expressed in NL descriptions alongside examples, yielding a more user-aligned probabilistic model. The evaluation shows the effectiveness of the approach in reducing user interaction effort.

The third study empirically investigates the effectiveness of pseudocode as an intermediate representation (IR) for isolating the complexity of problem solving and guiding LLMs in synthesizing executable and correct program implementations in different programming languages. Experimental results confirm that pseudocode not only preserves essential semantics but also provides a simple and effective means of communication with LLMs, shedding light on program synthesis from complex requirements via the pseudocode IR.

Date: Tuesday, 29 April 2025
Time: 8:00am - 10:00am
Venue: Room 2130A, Lift 19

Committee Members:
Prof. Shing-Chi Cheung (Supervisor)
Dr. Shuai Wang (Chairperson)
Dr. May Fung
Dr. Dongdong She