Speech Interaction

Speech interaction research aims to improve voice-based human-computer communication, covering areas such as voice activity detection, keyword spotting, and conversational analysis. Current work applies machine learning models, including transformer-based architectures and self-supervised learning, to improve the accuracy and efficiency of tasks such as speech transcription, turn-taking detection, and speech emotion recognition. These advances benefit applications ranging from accessible transcription tools and human-robot interaction to more natural and robust virtual and augmented reality experiences. Efficient, low-latency systems are a key focus, because they allow speech interfaces to be integrated into real-world applications.

Papers