Audio Synthesis

Audio synthesis generates realistic sound from inputs such as text, images, or other audio signals, with research focused on improving audio quality, efficiency, and controllability. Current work relies heavily on diffusion models, often combined with differentiable digital signal processing (DDSP) or generative adversarial networks (GANs), to produce high-fidelity audio across domains such as speech, music, and sound effects. These advances matter for virtual and augmented reality, assistive technologies, and creative media production, where they enable more realistic and expressive audio experiences.

Papers