ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering

Final Year Thesis Oral Defense

Title: "ReelWave: A Multi-Agent Framework Toward Professional Movie Sound 
Generation"

by

WANG Zixuan

Abstract:

Film production is an important application for generative audio, where 
richer context is provided through multiple scenes. In ReelWave, we propose 
a multi-agent framework for audio generation inspired by the professional 
movie production process. We first capture semantically and temporally 
synchronized "on-screen" sound by training a model that predicts three 
interpretable time-varying audio control signals: loudness, pitch, and 
timbre. These three signals are subsequently supplied as conditions through 
a cross-attention module. Then, our framework infers 
"off-screen" sound to complement the generation through cooperative 
interaction between communicative agents. Each agent takes on a specific 
role analogous to a member of a movie production team and is supervised by 
an agent called the director. We also investigate the case where the 
conditioning video consists of multiple scenes, which is common in clips 
extracted from long movies. Consequently, our framework can capture a 
richer context for audio generation conditioned on video clips extracted 
from movies.


Date            : 9 April 2025 (Wednesday)

Time            : 15:15 - 15:55

Venue           : Room 5501 (near lifts 25/26), HKUST

Advisor         : Prof. TANG Chi-Keung

2nd Reader      : Dr. CHEN Qifeng