Speech Analysis
Speech analysis is a rapidly evolving field focused on understanding and manipulating spoken language using computational methods, aiming to improve human-computer interaction and address challenges in healthcare and other domains. Current research emphasizes developing robust models, often based on transformer networks and neural codecs, for tasks such as speech recognition, emotion detection, and generation, including handling multi-speaker scenarios and low-resource languages. These advancements have significant implications for applications ranging from improved accessibility for individuals with speech impairments to more natural and intuitive interfaces for various technologies, as well as enabling new diagnostic tools in healthcare.
Papers
From `Snippet-lects' to Doculects and Dialects: Leveraging Neural Representations of Speech for Placing Audio Signals in a Language Landscape
Séverine Guillaume, Guillaume Wisniewski, Alexis Michaud
speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition
Haoyu Lu, Nan Li, Tongtong Song, Longbiao Wang, Jianwu Dang, Xiaobao Wang, Shiliang Zhang
Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Improving speech translation by fusing speech and text
Wenbiao Yin, Zhicheng Liu, Chengqi Zhao, Tao Wang, Jian Tong, Rong Ye