More about HKUST
SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation
The Hong Kong University of Science and Technology Department of Computer Science and Engineering Final Year Thesis Oral Defense Title: "SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation" by CHEN Yixiang Abstract: Test-time scaling without interpreter feedback is essential for real-world code generation scenarios where test cases are not readily available. While existing paradigms often rely on either greedy exploitation (i.e., iterative refinement) or stochastic exploration (i.e., relying on sample-based voting or reranking mechanisms), the balance between these two dimensions remains underexplored. To investigate the LLM's intrinsic ability to balance exploitation and exploration, we introduce SELF-REDRAFT, a framework built upon Self-Refine that encourages the model to propose new drafts for solutions that are fundamentally flawed. Our results show that SELF-REDRAFT consistently achieves better performance than SelfRefine when converged under the same maximum number of iterations. Still, we observe that significant room for improvement remains, largely due to two core aspects of current self-redraft capabilities: constrained capacity for generating instructive feedback and fragile discriminative judgment. We also find that balancing strategies vary notably across different LLMs, reflecting distinct, model-specific behaviors. Overall, our study establishes a baseline for intrinsic exploration-exploitation balancing in test-time scaling and identifies feedback and discrimination as key areas with potential for future advances. Date : 25 April 2026 (Saturday) Time : 11:30 - 12:10 Venue : Room 2129C (near Lift 19), HKUST Advisor : Dr. FUNG May 2nd Reader : Dr. HE Junxian