Speech Segment
Speech segment analysis focuses on extracting meaningful information from discrete portions of spoken audio, with the goal of improving downstream speech applications. Current research emphasizes robust models, such as transformer networks and graph convolutional networks, that handle challenges like noise, speaker variability, and overlapping speech, often incorporating multimodal (audio-visual) data and self-supervised learning techniques for improved performance. These advances are driving progress in diverse fields, including mental health assessment, speech-to-speech translation, and speaker diarization, by enabling more accurate and efficient processing of spoken language.
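To make "discrete portions of spoken audio" concrete, here is a minimal illustrative sketch (not taken from any paper listed below) of energy-based segmentation: the waveform is split into fixed-length frames, and frames whose RMS energy exceeds a threshold are kept as candidate speech segments. The function name, frame length, and threshold are all illustrative assumptions; real pipelines typically use trained voice-activity detectors instead.

```python
# Illustrative sketch: crude energy-based speech/non-speech segmentation.
# All names and parameter values here are assumptions for demonstration,
# not an implementation from the papers below.
import numpy as np

def speech_frames(signal, sr=16000, frame_ms=25, threshold=0.02):
    """Return (start, end) sample indices of frames whose RMS energy
    exceeds the threshold."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    segments = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms > threshold:
            segments.append((i * frame_len, (i + 1) * frame_len))
    return segments

# Synthetic example: 0.5 s of near-silence followed by 0.5 s of a 440 Hz tone.
sr = 16000
silence = 0.001 * np.random.randn(sr // 2)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
segments = speech_frames(np.concatenate([silence, tone]), sr=sr)
```

In this synthetic case, only frames from the tone half of the signal clear the energy threshold, so the returned segments begin at the tone's onset. Methods like those surveyed above replace this hand-set threshold with learned models that remain robust under noise and overlapping speakers.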
Papers
DWFormer: Dynamic Window transFormer for Speech Emotion Recognition
Shuaiqi Chen, Xiaofen Xing, Weibin Zhang, Weidong Chen, Xiangmin Xu
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani