Voice Activity Detection
Voice activity detection (VAD) identifies the speech segments within an audio recording, a crucial preprocessing step for applications such as speech recognition and speaker diarization. Current research emphasizes robustness under challenging acoustic conditions (noise, reverberation, overlapping speech), typically using lightweight neural networks (convolutional, recurrent, and transformer architectures), often combined with multi-channel processing and self-supervised learning. These advances benefit real-time applications such as hands-free communication, ecological monitoring, and personalized audio processing, where efficient and accurate speech detection is essential.
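To make the task concrete, the sketch below shows a minimal framewise VAD classifier in PyTorch, in the spirit of the lightweight convolutional architectures mentioned above. The `TinyVAD` class, its layer sizes, and the 40-band log-mel input are illustrative assumptions, not taken from any of the listed papers, and the model would still need to be trained on labeled speech/non-speech frames before it produces meaningful decisions.

```python
import torch
import torch.nn as nn

class TinyVAD(nn.Module):
    """Lightweight convolutional VAD sketch: log-mel frames in, per-frame speech probability out."""

    def __init__(self, n_mels: int = 40, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            # Two small 1-D convolutions over the time axis keep the model light
            # enough for real-time, frame-synchronous inference.
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, 1, kernel_size=1),  # one logit per frame
        )

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, n_frames) -> (batch, n_frames) speech probabilities
        return torch.sigmoid(self.net(mel)).squeeze(1)


if __name__ == "__main__":
    model = TinyVAD()
    mel = torch.randn(1, 40, 300)   # one utterance: 40 mel bands, 300 frames (placeholder features)
    probs = model(mel)              # (1, 300) per-frame speech probabilities
    speech_mask = probs > 0.5       # boolean speech / non-speech decision per frame
    print(speech_mask.shape)
```

In practice, the thresholded per-frame decisions are usually smoothed (e.g., with a minimum-duration or hangover rule) before segment boundaries are emitted; the recurrent, transformer, and self-supervised variants surveyed above replace or augment the convolutional frame encoder but keep the same frame-level classification framing.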
Papers
Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
Federico Landini, Mireia Diez, Alicia Lozano-Diez, Lukáš Burget
Low Pass Filtering and Bandwidth Extension for Robust Anti-spoofing Countermeasure Against Codec Variabilities
Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
Marie Kunešová, Zbyněk Zajíc
Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead
Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak