Speech Analysis
Speech analysis applies computational methods to understand and manipulate spoken language, with the goal of improving human-computer interaction and addressing challenges in healthcare and other domains. Current research emphasizes robust models, often based on transformer networks and neural codecs, for tasks such as speech recognition, emotion detection, and speech generation, including multi-speaker scenarios and low-resource languages. These advances have significant implications for applications ranging from improved accessibility for individuals with speech impairments, to more natural and intuitive interfaces, to new diagnostic tools in healthcare.
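As a concrete illustration of the shared computational front end behind many of these models, the sketch below computes a magnitude spectrogram with plain NumPy. The function name and parameter choices (frame length, hop size, Hann window) are illustrative defaults, not drawn from any of the papers listed here:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Frames the signal, applies a Hann window, and takes the FFT of
    each frame -- a common first step before feeding audio to a
    speech model.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per frame, one column per frequency bin
    return np.abs(np.fft.rfft(frames, axis=1))

# Example: one second of synthetic audio at 16 kHz as a stand-in for speech
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(audio)
print(spec.shape)  # (124, 129): 124 frames, 129 frequency bins
```

Modern systems typically replace hand-tuned features like this with learned front ends or neural codecs, but the time-frequency view remains the conceptual starting point.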
Papers
Probing mental health information in speech foundation models
Marc de Gennes, Adrien Lesage, Martin Denais, Xuan-Nga Cao, Simon Chang, Pierre Van Remoortere, Cyrille Dakhlia, Rachid Riad
Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly
Alexander Htet Kyaw, Se Hwan Jeon, Miana Smith, Neil Gershenfeld
Speech2rtMRI: Speech-Guided Diffusion Model for Real-time MRI Video of the Vocal Tract during Speech
Hong Nguyen, Sean Foley, Kevin Huang, Xuan Shi, Tiantian Feng, Shrikanth Narayanan
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings
Nikola Ljubešić, Peter Rupnik, Danijel Koržinek
Speechworthy Instruction-tuned Language Models
Hyundong Cho, Nicolaas Jedema, Leonardo F.R. Ribeiro, Karishma Sharma, Pedro Szekely, Alessandro Moschitti, Ruben Janssen, Jonathan May