Synthetic Voice
Synthetic voice generation, the creation of realistic artificial speech, is advancing rapidly, driven by deep learning models such as WaveNet, Tacotron, and Transformer-based architectures. Current research aims to make synthetic voices more natural and expressive, including conveying emotional nuance and accurately reproducing diverse accents and speakers. In parallel, researchers are developing robust detection methods to counter misuse of the technology in deepfakes and other malicious applications. The ability both to generate highly realistic synthetic speech and to detect it reliably has significant implications for security, forensics, accessibility, and the entertainment industry.
Papers
Data-augmented cross-lingual synthesis in a teacher-student framework
Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers
Manipulation of oral cancer speech using neural articulatory synthesis
Bence Mark Halpern, Teja Rebernik, Thomas Tienkamp, Rob van Son, Michiel van den Brekel, Martijn Wieling, Max Witjes, Odette Scharenborg