SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in Test-Time Scaling for Code Generation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "SELF-REDRAFT: Eliciting Intrinsic Exploration-Exploitation Balance in 
Test-Time Scaling for Code Generation"

by

CHEN Yixiang

Abstract:

Test-time scaling without interpreter feedback is essential for real-world 
code generation scenarios where test cases are not readily available. While 
existing paradigms often rely on either greedy exploitation (i.e., iterative 
refinement) or stochastic exploration (i.e., sample-based voting or 
reranking), the balance between these two dimensions remains 
underexplored. To investigate the LLM's intrinsic ability to balance 
exploitation and exploration, we introduce SELF-REDRAFT, a framework built 
upon Self-Refine that encourages the model to propose new drafts for 
solutions that are fundamentally flawed. Our results show that SELF-REDRAFT 
consistently achieves better performance than Self-Refine when converged under 
the same maximum number of iterations. Still, we observe that significant 
room for improvement remains, largely due to two core aspects of current 
self-redraft capabilities: constrained capacity for generating instructive 
feedback and fragile discriminative judgment. We also find that balancing 
strategies vary notably across different LLMs, reflecting distinct, 
model-specific behaviors. Overall, our study establishes a baseline for 
intrinsic exploration-exploitation balancing in test-time scaling and 
identifies feedback and discrimination as key areas with potential for future 
advances.
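
The loop described in the abstract can be pictured with a minimal sketch. 
This is an illustration under stated assumptions, not the thesis 
implementation: the function names (generate, critique, refine), the verdict 
labels, and the stopping rule are all hypothetical, and the model calls are 
passed in as plain callables so that no interpreter feedback is involved.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical verdict labels; the abstract does not name them.
PASS, REFINE, REDRAFT = "pass", "refine", "redraft"


def self_redraft(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Tuple[str, str]],
    refine: Callable[[str, str, str], str],
    max_iters: int = 4,
) -> Tuple[str, List[str]]:
    """Iterate on a solution, choosing between refining and redrafting.

    generate(task, feedback) -> a fresh draft (feedback is None at first).
    critique(task, solution) -> (verdict, feedback), with the verdict being
        one of PASS, REFINE, REDRAFT. Assumed to be the same LLM judging
        its own output.
    refine(task, solution, feedback) -> an edited version of the solution.
    """
    solution = generate(task, None)
    trace: List[str] = []  # verdicts in order, useful for analysis

    for _ in range(max_iters):
        verdict, feedback = critique(task, solution)
        trace.append(verdict)
        if verdict == PASS:
            # The model judges the solution correct: converged.
            break
        if verdict == REDRAFT:
            # Exploration: discard a fundamentally flawed draft.
            solution = generate(task, feedback)
        else:
            # Exploitation: keep the draft and repair it in place.
            solution = refine(task, solution, feedback)

    return solution, trace
```

Treating any verdict other than "pass" or "redraft" as a refinement step 
makes the sketch reduce to a plain Self-Refine loop when the critique never 
asks for a new draft.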

Date            : 25 April 2026 (Saturday)

Time            : 11:30 - 12:10

Venue           : Room 2129C (near Lift 19), HKUST

Advisor         : Dr. FUNG May

2nd Reader      : Dr. HE Junxian