A Survey on Learning-Based Code Translation
PhD Qualifying Examination
Title: "A Survey on Learning-Based Code Translation"
by
Mr. Songqiang CHEN
Abstract:
Code translation, also known as transpilation, aims to convert source code
from one programming language into another while preserving semantics. It
has gained significant attention due to its pivotal role in various software
development tasks such as legacy system migration and cross-platform
development. Learning-based code translation approaches demonstrate clear
advantages over traditional rule-based methods, offering superior
adaptability and producing more fluent, human-like translations. Since
emerging around 2013, research in learning-based code translation has
rapidly evolved with advances in deep learning and large language models,
bringing the field closer to the goal of accurate automated program
conversion across diverse programming languages. This survey reviews
learning-based code translation research along three key dimensions:
methodological enhancements in code representations and auxiliary
information integration; empirical studies that systematically evaluate
translation performance, together with post-processing techniques for
automated debugging; and benchmark construction, ranging from parallel
corpora to large-scale evaluation datasets. Our analysis highlights
significant progress in code representation learning while identifying
persistent challenges in handling knowledge-intensive translations and
intricate language features. We also outline research opportunities in
revisiting the integration of traditional approaches, incorporating
underexplored contextual signals, further automating debugging procedures,
and constructing effective evaluation frameworks for large-scale translation
scenarios.
Date: Friday, 27 June 2025
Time: 10:00am - 11:30am
Venue: Room 5501 (Lifts 25/26)
Committee Members: Prof. Shing-Chi Cheung (Supervisor)
Dr. Dimitris Papadopoulos (Chairperson)
Dr. Jiasi Shen