Environmental Sound Synthesis

Environmental sound synthesis aims to generate realistic soundscapes from a range of input modalities, moving beyond traditional audio-based methods. Current research explores novel input types, such as vocal imitations and visual onomatopoeias, combined with sound event labels. Models such as vector-quantized encoders and Tacotron-based decoders are used to control characteristics of the synthesized sound, including pitch and rhythm. The field is also developing robust evaluation methodologies, recognizing that both objective and subjective assessments are needed to confirm that synthesized sounds accurately reflect the input information and user perception. Applications include virtual and augmented reality, video game development, and sound design for film and other media.

Papers