Acoustic Modeling

Acoustic modeling focuses on representing and manipulating audio signals, primarily for speech and music processing tasks such as speech recognition, text-to-speech synthesis, and music generation. Current research emphasizes robust deep neural network models, including transformer-based architectures, normalizing flows, and diffusion models, often combined with self-supervised learning and contextual information (e.g., dialogue history or simulated future frames) to improve accuracy and efficiency. These advances are yielding more accurate and efficient speech recognition systems and more natural-sounding synthetic speech and music.

Papers