Speech Data
Speech data research develops and improves methods for analyzing and utilizing spoken language, primarily for applications such as automatic speech recognition (ASR), speech synthesis, and speaker verification. Current work emphasizes robust models, often built on deep learning architectures such as Conformers and Transformers and trained on massive multilingual datasets that combine labeled and unlabeled data, sometimes augmented with synthetic speech. The field is central to advancing human-computer interaction, improving accessibility for people with disabilities, and enabling new diagnostic tools in healthcare, particularly for mental health and neurological disorders.
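As a concrete illustration of the Transformer-based ASR models the overview mentions, the sketch below runs inference with a pretrained model through the Hugging Face transformers pipeline API. This is a minimal sketch, not a method from any of the papers listed here; the model name and the audio file path are illustrative assumptions.

    # Minimal ASR inference sketch, assuming the Hugging Face "transformers"
    # package is installed and ffmpeg is available for audio decoding.
    from transformers import pipeline

    # Load a pretrained Transformer-based speech recognition model as a pipeline.
    # The model checkpoint here is an assumption chosen for illustration.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

    # Transcribe a local audio file; the pipeline returns a dict with a "text" key.
    result = asr("sample.wav")  # "sample.wav" is a hypothetical file path
    print(result["text"])

Swapping in a different checkpoint (e.g., a Conformer-based model) changes only the model argument; the surrounding inference code stays the same.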
Papers
Speech-to-Speech Translation For A Real-world Unwritten Language
Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee
The Far Side of Failure: Investigating the Impact of Speech Recognition Errors on Subsequent Dementia Classification
Changye Li, Trevor Cohen, Serguei Pakhomov
Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging Spectrotemporal Variations in Speech Vowels
Kexin Feng, Theodora Chaspari
ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild
Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee