Enhancing and Hardening Neural Code Model
PhD Thesis Proposal Defence
Title: "Enhancing and Hardening Neural Code Model"
by
Mr. Zongjie LI
Abstract:
With the rapid advancement of deep learning technologies, neural code models
have achieved remarkable success, facilitating significant breakthroughs
across various code-related applications. Leveraging powerful computational
resources and massive training data, these models demonstrate sophisticated
capabilities in understanding, analyzing, and generating diverse program
code. Unlike models primarily designed for natural language tasks, code
models are typically engineered for integration into various productivity
scenarios and practical development workflows. Consequently, developing
neural code models with high accuracy, reliability, and freedom from
potential intellectual property risks has become imperative.

This thesis proposal focuses on designing and developing neural code models
through three key aspects: 1) enhancing model performance through data
augmentation and architectural improvements, 2) refining output consistency
through code structure and semantic analysis, and 3) incorporating
verifiable watermarks to protect intellectual property. In our first
contribution, we present a framework that leverages compiler-generated
Intermediate Representation (IR) code for data augmentation, enabling
improved embeddings that support various downstream code applications. To
further enhance code generation capabilities, our second work introduces
CCTEST, a system that inserts context-free code snippets to detect and
rectify inconsistencies. In our third work, we exploit programming language
semantics and token distribution characteristics to embed verifiable
watermarks in model outputs, thereby enhancing model security and
intellectual property protection.
Date: Thursday, 27 March 2025
Time: 3:00pm - 5:00pm
Venue: Room 2408 (Lifts 17/18)
Committee Members: Dr. Shuai Wang (Supervisor)
                   Dr. Wei Wang (Chairperson)
                   Dr. Hao Chen
                   Dr. Dongdong She