Speech Analysis
Speech analysis is a rapidly evolving field that applies computational methods to understanding and manipulating spoken language, with the aim of improving human-computer interaction and addressing challenges in healthcare and other domains. Current research emphasizes robust models, often based on transformer networks and neural codecs, for tasks such as speech recognition, emotion detection, and speech generation, including multi-speaker scenarios and low-resource languages. These advances have implications for applications ranging from better accessibility for people with speech impairments and more natural interfaces to new diagnostic tools in healthcare.
Papers
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu
MyMove: Facilitating Older Adults to Collect In-Situ Activity Labels on a Smartwatch with Speech
Young-Ho Kim, Diana Chou, Bongshin Lee, Margaret Danilovich, Amanda Lazar, David E. Conroy, Hernisa Kacorri, Eun Kyoung Choe
Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives
Samik Sadhu, Hynek Hermansky
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech
Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao
Impact of Environmental Noise on Alzheimer's Disease Detection from Speech: Should You Let a Baby Cry?
Jekaterina Novikova
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
Dan Lim, Sunghee Jung, Eesung Kim
Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification
Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak
Generative Spoken Dialogue Language Modeling
Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoit Sagot, Abdelrahman Mohamed, Emmanuel Dupoux
Speech and the n-Back task as a lens into depression. How combining both may allow us to isolate different core symptoms of depression
Salvatore Fara, Stefano Goria, Emilia Molimpakis, Nicholas Cummins
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks
Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang