Speech Detection
Speech detection research focuses on accurately identifying and classifying speech segments within audio. It encompasses tasks such as voice activity detection, speaker diarization, and the detection of specific speech characteristics (e.g., stuttering, synthetic speech). Current research emphasizes models that are robust to noise and reverberation, often employing deep learning architectures such as convolutional and recurrent neural networks and large language models, along with techniques such as knowledge distillation and transfer learning to improve accuracy and efficiency. These advances matter for applications including clinical diagnosis (e.g., detecting speech disorders), accessibility for individuals with communication challenges, and more accurate voice-based systems in noisy environments.
Papers
Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning
Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li
Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews
Wen Wu, Chao Zhang, Philip C. Woodland