ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation
The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation"

by

WANG Zixuan

Abstract:

Film production is an important application for generative audio, where richer context is provided through multiple scenes. In ReelWave, we propose a multi-agent framework for audio generation inspired by the professional movie production process. We first capture semantically and temporally synchronized "on-screen" sound by training a model that predicts three interpretable, time-varying audio control signals: loudness, pitch, and timbre. These three signals are subsequently used as conditions via a cross-attention module. Then, our framework infers "off-screen" sound to complement the generation through cooperative interaction between communicative agents. Each agent takes on a specific role, mirroring a movie production team, and is supervised by an agent called the director. We also investigate the case where the conditioning video consists of multiple scenes, which is frequently seen in videos extracted from movies of considerable length. As a result, our framework can capture richer context for audio generation conditioned on video clips extracted from movies.

Date: 9 April 2025 (Wednesday)
Time: 15:15 - 15:55
Venue: Room 5501 (near lifts 25/26), HKUST
Advisor: Prof. TANG Chi-Keung
2nd Reader: Dr. CHEN Qifeng