Parallel Audio Generation

Parallel audio generation aims to produce high-quality audio much faster than traditional autoregressive methods, which generate audio one step at a time, by predicting many parts of the output (typically discrete audio tokens) concurrently. Current research focuses on efficient model architectures, such as transformer-based models and models trained with masked language modeling objectives, that improve both generation speed and audio fidelity. Combined with large datasets and refined training strategies, these approaches substantially improve speed and quality over autoregressive baselines, which matters for applications that require real-time or high-throughput audio synthesis.
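
The control flow behind the masked-modeling approach can be illustrated with a minimal sketch of confidence-based iterative parallel decoding in the style of MaskGIT, which SoundStorm builds on. Every name here (`parallel_decode`, `cosine_mask_count`, `toy_model`) is hypothetical, and `toy_model` returns random tokens and confidences in place of a trained transformer, so the output is not audio. Each iteration scores all masked positions in one forward pass, commits the most confident predictions, and leaves the rest masked according to a cosine schedule, so the number of model calls equals the number of iterations rather than the sequence length.

```python
import math
import random

MASK = -1  # sentinel id for a position that has not been generated yet


def cosine_mask_count(step: int, total_steps: int, length: int) -> int:
    """How many positions remain masked after `step` (MaskGIT-style cosine schedule)."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return math.floor(length * ratio)  # reaches 0 on the final step


def toy_model(tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained network: one (token, confidence) pair per position.

    A real model would condition on the already-decoded tokens (and a prompt)
    and score every masked position in a single forward pass.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(length: int, total_steps: int, vocab_size: int = 1024, seed: int = 0):
    rng = random.Random(seed)
    tokens = [MASK] * length
    forward_passes = 0
    for step in range(total_steps):
        predictions = toy_model(tokens, vocab_size, rng)  # one call scores all positions
        forward_passes += 1
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        keep_masked = cosine_mask_count(step, total_steps, length)
        # Commit the most confident predictions; the rest stay masked for the next step.
        masked.sort(key=lambda i: predictions[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = predictions[i][0]
    return tokens, forward_passes


tokens, passes = parallel_decode(length=500, total_steps=16)
print(passes)  # 16
```

With `length=500` and `total_steps=16`, the loop makes 16 model calls, whereas a token-by-token autoregressive decoder would need one call per position, 500 in total. Real systems decode codec tokens with a trained model conditioned on a prompt; SoundStorm, for example, works over residual vector quantization levels, which this flat toy sequence does not model.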

Papers

June 27, 2024

Taming Data and Transformers for Audio Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Sergey Tulyakov, Vicente Ordonez
Transformer Audio Captioning Audio Generation Audio Synthesis Audio Text Parallel Audio Generation

January 2, 2024

Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong, Minchan Kim, Joun Yeop Lee, Nam Soo Kim
Masked Language Modeling Text to Audio Generation Codec Language Model Parallel Audio Generation

May 16, 2023

SoundStorm: Efficient Parallel Audio Generation
Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
Audio Driven Audio Generation Non Autoregressive Neural Audio Parallel Audio Generation
