Speech Analysis
Speech analysis is a rapidly evolving field focused on understanding and manipulating spoken language using computational methods, aiming to improve human-computer interaction and address challenges in healthcare and other domains. Current research emphasizes developing robust models, often based on transformer networks and neural codecs, for tasks such as speech recognition, emotion detection, and generation, including handling multi-speaker scenarios and low-resource languages. These advancements have significant implications for applications ranging from improved accessibility for individuals with speech impairments to more natural and intuitive interfaces for various technologies, as well as enabling new diagnostic tools in healthcare.
Papers
SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization
Tan-Hanh Pham, Hoang-Nam Le, Phu-Vinh Nguyen, Chris Ngo, Truong-Son Hy
Speech Retrieval-Augmented Generation without Automatic Speech Recognition
Do June Min, Karel Mundnich, Andy Lapastora, Erfan Soltanmohammadi, Srikanth Ronanki, Kyu Han
How good is GPT at writing political speeches for the White House?
Jacques Savoy
Bridging the Data Provenance Gap Across Text, Speech and Video
Shayne Longpre, Nikhil Singh, Manuel Cherep, Kushagra Tiwary, Joanna Materzynska, William Brannon, Robert Mahari, Manan Dey, Mohammed Hamdy, Nayan Saxena, Ahmad Mustafa Anis, Emad A. Alghamdi, Vu Minh Chien, Naana Obeng-Marnu, Da Yin, Kun Qian, Yizhi Li, Minnie Liang, An Dinh, Shrestha Mohanty, Deividas Mataciunas, Tobin South, Jianguo Zhang, Ariel N. Lee, Campbell S. Lund, Christopher Klamm, Damien Sileo, Diganta Misra, Enrico Shippole, Kevin Klyman, Lester JV Miranda, Niklas Muennighoff, Seonghyeon Ye, Seungone Kim, Vipul Gupta, Vivek Sharma, Xuhui Zhou, Caiming Xiong, Luis Villa, Stella Biderman, Alex Pentland, Sara Hooker, Jad Kabbara
SpeechPrune: Context-aware Token Pruning for Speech Information Retrieval
Yueqian Lin, Yuzhe Fu, Jingyang Zhang, Yudong Liu, Jianyi Zhang, Jingwei Sun, Hai "Helen" Li, Yiran Chen
Classification of Spontaneous and Scripted Speech for Multilingual Audio
Shahar Elisha, Andrew McDowell, Mariano Beguerisse-Díaz, Emmanouil Benetos
A lightweight and robust method for blind wideband-to-fullband extension of speech
Jan Büthe
Parole de présidents (1958-2022)
Dominique Labbé, Jacques Savoy
GPT as ghostwriter at the White House
Jacques Savoy
Wearable intelligent throat enables natural speech in stroke patients with dysarthria
Chenyu Tang, Shuo Gao, Cong Li, Wentian Yi, Yuxuan Jin, Xiaoxue Zhai, Sixuan Lei, Hongbei Meng, Zibo Zhang, Muzi Xu, Shengbo Wang, Xuhang Chen, Chenxi Wang, Hongyun Yang, Ningli Wang, Wenyu Wang, Jin Cao, Xiaodong Feng, Peter Smielewski, Yu Pan, Wenhui Song, Martin Birchall, Luigi G. Occhipint
Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting
Emma Reyner-Fuentes, Esther Rituerto-Gonzalez, Carmen Pelaez-Moreno