Disfluency Detection
Disfluency detection identifies interruptions in speech, such as repetitions and hesitations, so that they can be removed or corrected, which improves the accuracy and efficiency of speech processing systems. Current research emphasizes multimodal approaches that combine acoustic and visual data with architectures such as transformers and graph convolutional networks to improve detection accuracy, and it addresses data scarcity through techniques such as synthetic data generation. This work matters for automatic speech recognition, conversational AI, and speech therapy applications, because accurate disfluency detection supports better natural language understanding and more effective human-computer interaction.
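To make the task concrete, the sketch below tags each token of a transcript as a hesitation, a repetition, or fluent, and then drops the disfluent tokens. It is a rule-based toy illustration only, not the method of any paper listed here: the `FILLED_PAUSES` word list, the label names, and both function names are assumptions chosen for this example, and real systems learn these decisions from data.

```python
# Illustrative word list (an assumption, not taken from any dataset).
FILLED_PAUSES = {"uh", "um", "er", "erm"}


def tag_disfluencies(tokens):
    """Return one label per token: 'F' hesitation, 'R' repetition, 'O' fluent."""
    labels = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in FILLED_PAUSES:
            labels.append("F")
        elif i + 1 < len(tokens) and tokens[i + 1].lower() == word:
            # The earlier copy of an immediately repeated word is marked
            # disfluent, so the final copy survives in the cleaned output.
            labels.append("R")
        else:
            labels.append("O")
    return labels


def remove_disfluencies(tokens):
    """Keep only the tokens labelled fluent ('O')."""
    labels = tag_disfluencies(tokens)
    return [t for t, label in zip(tokens, labels) if label == "O"]


if __name__ == "__main__":
    utterance = "i i want um a flight to to boston".split()
    print(list(zip(utterance, tag_disfluencies(utterance))))
    print(" ".join(remove_disfluencies(utterance)))
```

Framing the problem as per-token labelling is one common way to pose it. The hand-written rules here only catch exact adjacent repeats and a fixed set of hesitation words, which is the kind of gap that learned models are meant to close.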
Papers
Artificial Disfluency Detection, Uh No, Disfluency Generation for the Masses
T. Passali, T. Mavropoulos, G. Tsoumakas, G. Meditskos, S. Vrochidis
Streaming Joint Speech Recognition and Disfluency Detection
Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe