Environmental Sound Synthesis

Environmental sound synthesis aims to generate realistic soundscapes from a range of input modalities rather than from audio alone. Current research explores input types such as vocal imitations and visual onomatopoeias, combined with sound event labels, and uses models such as vector-quantized encoders and Tacotron-based decoders to control characteristics of the synthesized sound, such as pitch and rhythm. The field is also developing evaluation methodologies, recognizing that both objective and subjective assessments are needed to confirm that synthesized sounds reflect the input information and match user perception. Applications include virtual and augmented reality, video game development, and sound design for film and other media.

Papers

April 29, 2023

Environmental sound synthesis from vocal imitations and sound event labels
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Ryotaro Nagase, Takahiro Fukumori, Yoichi Yamashita
Sound Event, Synthesized Sound, Environmental Sound Synthesis

October 17, 2022

Visual onoma-to-wave: environmental sound synthesis from visual onomatopoeias and sound-source images
Hien Ohnaka, Shinnosuke Takamichi, Keisuke Imoto, Yuki Okamoto, Kazuki Fujii, Hiroshi Saruwatari
Synthesized Sound, Environmental Sound Synthesis

August 16, 2022

How Should We Evaluate Synthesized Environmental Sounds
Yuki Okamoto, Keisuke Imoto, Shinnosuke Takamichi, Takahiro Fukumori, Yoichi Yamashita
Evaluation Method, Synthesized Sound, Environmental Sound Synthesis
