Synthesized Speech
Synthesized speech research focuses on creating realistic, natural-sounding artificial speech for applications such as voice assistants, audiobooks, and accessibility tools. Current efforts concentrate on improving naturalness and expressiveness, often using deep learning models such as GANs, diffusion models, and transformers, while also addressing the detection of synthetic speech (audio deepfakes) and the mitigation of biases in those detection systems. The field is central to advancing human-computer interaction, improving accessibility technologies, and combating the malicious use of synthetic audio in fraud and disinformation.
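As a concrete illustration of the detection side: synthetic-speech detectors typically operate on time-frequency features rather than raw waveforms. The sketch below is a minimal, NumPy-only illustration of extracting log-magnitude spectrogram features of the kind such detectors consume; the frame and hop sizes are assumed defaults and are not drawn from any of the papers listed here.

```python
import numpy as np

def log_spectrogram(wave, frame_len=512, hop=256):
    """Frame the waveform, apply a Hann window, and return
    log-magnitude STFT features, shape (frames, frequency bins)."""
    n_frames = 1 + (len(wave) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        wave[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of the real signal;
    # the small epsilon avoids log(0) in silent frames.
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(spectrum + 1e-8)

# Toy usage: a 1-second 440 Hz tone at 16 kHz stands in for real audio.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
feats = log_spectrogram(wave)
print(feats.shape)  # (n_frames, frame_len // 2 + 1)
```

In a full system these features (or learned front-end equivalents) would feed a binary real-vs-synthetic classifier; the feature extraction step shown here is the part that is broadly shared across detection pipelines.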
Papers
Text Generation with Speech Synthesis for ASR Data Augmentation
Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen
Towards generalizing deep-audio fake detection networks
Konstantin Gasenzer, Moritz Wolter
The defender's perspective on automatic speaker verification: An overview
Haibin Wu, Jiawen Kang, Lingwei Meng, Helen Meng, Hung-yi Lee