A Survey on Learning-Based Code Translation
PhD Qualifying Examination

Title: "A Survey on Learning-Based Code Translation"

by

Mr. Songqiang CHEN

Abstract:

Code translation, also known as transpilation, aims to convert source code from one programming language into another while preserving semantics. It has gained significant attention due to its pivotal role in various software development tasks such as legacy system migration and cross-platform development. Learning-based code translation approaches demonstrate clear advantages over traditional rule-based methods, offering superior adaptability and producing more fluent, human-like translations. Since emerging around 2013, research in learning-based code translation has rapidly evolved with advances in deep learning and large language models, bringing the field closer to the goal of accurate automated program conversion across diverse programming languages.

This survey presents a review of learning-based code translation research in three key dimensions: methodological enhancements in code representations and auxiliary information integration; empirical studies and post-processing techniques that systematically evaluate translation performance and develop automated debugging strategies; and benchmark construction ranging from parallel corpora to large-scale evaluation datasets. Our analysis highlights significant progress in code representation learning while identifying persistent challenges in handling knowledge-intensive translations and intricate language features. We also outline research opportunities in revisiting the integration of traditional approaches, incorporating underexplored contextual signals, further automating debugging procedures, and constructing effective evaluation frameworks for large-scale translation scenarios.

Date: Friday, 27 June 2025
Time: 10:00am - 11:30am
Venue: Room 5501 (Lifts 25/26)

Committee Members:
Prof. Shing-Chi Cheung (Supervisor)
Dr. Dimitris Papadopoulos (Chairperson)
Dr. Jiasi Shen