Speech Data
Speech data research develops and improves methods for analyzing and utilizing spoken language, primarily for applications such as automatic speech recognition (ASR), speech synthesis, and speaker verification. Current work emphasizes robust models, often deep learning architectures such as Conformers and Transformers, trained on large multilingual corpora that combine labeled and unlabeled data and are sometimes augmented with synthetic speech. This field is crucial for advancing human-computer interaction, improving accessibility for people with disabilities, and enabling new diagnostic tools in healthcare, particularly for mental health and neurological disorders.
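As a concrete illustration of the modeling approach described above, the following is a minimal sketch of running a pretrained multilingual Transformer-based ASR model through the Hugging Face transformers pipeline. The checkpoint name (openai/whisper-small) and the input path (audio.wav) are illustrative assumptions, not taken from any paper listed below; decoding a file path also assumes ffmpeg is available.

    # A minimal sketch of multilingual ASR inference with a pretrained
    # Transformer model, assuming `transformers` and a PyTorch backend
    # are installed (pip install transformers torch).
    from transformers import pipeline

    # openai/whisper-small is one example of a multilingual checkpoint;
    # any compatible ASR model id could be substituted here.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

    # audio.wav is a placeholder path to a local speech recording.
    result = asr("audio.wav")
    print(result["text"])  # transcribed text for the utterance

The pipeline accepts a file path, raw bytes, or a NumPy waveform array, so the same sketch extends to batch transcription over a multilingual dataset.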
Papers
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Wangyou Zhang, Kohei Saijo, Jee-weon Jung, Chenda Li, Shinji Watanabe, Yanmin Qian
Promoting Fairness and Diversity in Speech Datasets for Mental Health and Neurological Disorders Research
Eleonora Mancini, Ana Tanevska, Andrea Galassi, Alessio Galatolo, Federico Ruggeri, Paolo Torroni