Synthesized Speech
Synthesized speech research focuses on creating realistic, natural-sounding artificial speech for applications such as voice assistants, audiobooks, and accessibility tools. Current work concentrates on improving the naturalness and expressiveness of synthesized speech, typically with deep learning models such as GANs, diffusion models, and transformers. A complementary line of work addresses detecting synthetic speech (audio deepfakes) and mitigating biases in those detection systems. The field is central to advancing human-computer interaction, improving accessibility technologies, and combating the malicious use of synthetic audio in fraud and disinformation.
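To make the detection side of this concrete, here is a toy sketch of the general shape of a synthetic-speech detection pipeline: extract an acoustic feature from the waveform, then apply a decision rule. The feature used here (spectral flatness, a classic DSP measure) and the fixed threshold are illustrative assumptions only; the systems the text refers to learn their features and classifiers with deep networks rather than hand-crafting them.

```python
import cmath
import math
import random

def power_spectrum(signal):
    """Naive DFT power spectrum (adequate for short toy signals)."""
    n = len(signal)
    spec = []
    for k in range(n // 2):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        spec.append(abs(s) ** 2 + 1e-12)  # small floor avoids log(0)
    return spec

def spectral_flatness(signal):
    """Geometric mean / arithmetic mean of the power spectrum.
    Close to 1.0 for noise-like signals, close to 0.0 for tonal ones."""
    spec = power_spectrum(signal)
    log_mean = sum(math.log(p) for p in spec) / len(spec)
    return math.exp(log_mean) / (sum(spec) / len(spec))

# Toy stand-ins for two signal classes: a pure tone (energy
# concentrated in one frequency bin) versus broadband noise
# (energy spread across bins).
tone = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(256)]

# A (hypothetical) threshold rule: feature extraction -> decision.
def looks_tonal(signal, threshold=0.1):
    return spectral_flatness(signal) < threshold
```

The point is the two-stage structure (feature, then decision), not the feature itself: a real detector replaces both stages with a trained model operating on spectrogram or raw-waveform input.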
Papers
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas, Karolos Nikitaras, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Spyros Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis
Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition
Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda