Speech Analysis
Speech analysis is a rapidly evolving field focused on understanding and manipulating spoken language with computational methods, with the aim of improving human-computer interaction and addressing challenges in healthcare and other domains. Current research emphasizes robust models, often built on transformer networks and neural codecs, for tasks such as speech recognition, emotion detection, and speech generation, including multi-speaker scenarios and low-resource languages. These advances have significant implications for applications ranging from improved accessibility for people with speech impairments to more natural and intuitive interfaces, as well as new diagnostic tools in healthcare.
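As context for the papers below, a minimal sketch of a common front-end step in speech analysis pipelines is computing spectral features such as MFCCs from a recording before feeding them to a downstream model. The sketch assumes the librosa library is available; the file path "speech_sample.wav" and the choice of 13 coefficients are illustrative placeholders, not taken from any of the listed papers.

```python
# Illustrative sketch: extracting MFCC features from a speech recording,
# a typical first step for recognition or emotion-detection models.
import librosa
import numpy as np

# Load the audio at 16 kHz, a sampling rate commonly used for speech.
# "speech_sample.wav" is a placeholder path.
waveform, sample_rate = librosa.load("speech_sample.wav", sr=16000)

# Compute 13 Mel-frequency cepstral coefficients per analysis frame.
mfcc = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)

# Summarize each coefficient over time to obtain a fixed-length vector,
# e.g. as input to a simple classifier.
feature_vector = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(feature_vector.shape)  # (26,)
```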
Papers
SECP: A Speech Enhancement-Based Curation Pipeline For Scalable Acquisition Of Clean Speech
Adam Sabra, Cyprian Wronka, Michelle Mao, Samer Hijazi
Significance of Chirp MFCC as a Feature in Speech and Audio Applications
S. Johanan Joysingh, P. Vijayalakshmi, T. Nagarajan
On the relationship between speech and hearing
Srinivasan Umesh, Leon Cohen, Douglas Nelson
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath
Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain