Utterance Length
Utterance length is a research topic in speech and text processing that studies how the length of spoken or written units affects downstream tasks such as speech recognition, machine translation, and conversational AI. Current work focuses on models that cope with variation in utterance length. Examples include reinforcement learning that optimizes phoneme-count alignment in machine translation, and data augmentation that reduces the mismatch between training and test utterance lengths in speech recognition. Handling utterance length well improves accuracy and efficiency in applications such as automatic video dubbing, conversational agents, and clinical tasks like detecting depression from speech.
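Length-oriented data augmentation can be illustrated with a simple sketch in the spirit of random utterance concatenation, one of the approaches in the papers listed below. Short training utterances are joined into longer samples so that the training length distribution better covers longer test inputs. The sketch rests on several assumptions. Utterances are plain lists of waveform samples at one shared sample rate, transcripts are joined with spaces, and utterances are drawn uniformly at random. The names `Utterance`, `concat_utterances`, and `random_concat_augment` are hypothetical. This is an illustration of the general idea, not the published recipe.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    samples: List[float]  # raw waveform samples (assumed: one shared sample rate)
    text: str             # transcript of the utterance


def concat_utterances(utts: List[Utterance], max_samples: int) -> Utterance:
    """Join utterances in order, stopping before the total would exceed max_samples.

    The first utterance is always kept, even if it alone exceeds max_samples.
    """
    samples: List[float] = []
    words: List[str] = []
    for u in utts:
        if samples and len(samples) + len(u.samples) > max_samples:
            break  # the next utterance would make the sample too long
        samples.extend(u.samples)
        words.append(u.text)
    return Utterance(samples, " ".join(words))


def random_concat_augment(dataset: List[Utterance], num_new: int,
                          max_per_sample: int, max_samples: int,
                          seed: int = 0) -> List[Utterance]:
    """Create num_new longer samples by concatenating randomly chosen utterances."""
    rng = random.Random(seed)
    augmented: List[Utterance] = []
    for _ in range(num_new):
        # Pick between 2 and max_per_sample distinct utterances (capped by dataset size).
        k = min(rng.randint(2, max(2, max_per_sample)), len(dataset))
        picked = rng.sample(dataset, k)
        augmented.append(concat_utterances(picked, max_samples))
    return augmented
```

In use, the augmented samples would typically be added to the original training set rather than replace it, for example `train = data + random_concat_augment(data, 1000, 3, 16000 * 20)` for a hypothetical 20-second cap at 16 kHz. The `max_samples` cap keeps concatenated samples within a length budget that the model can handle.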
Papers
Universal speaker recognition encoders for different speech segments duration
Sergey Novoselov, Vladimir Volokhov, Galina Lavrentyeva
Random Utterance Concatenation Based Data Augmentation for Improving Short-video Speech Recognition
Yist Y. Lin, Tao Han, Haihua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma