Meeting Transcription

Meeting transcription research aims to automatically generate accurate text from multi-speaker conversations, with performance that stays robust across diverse acoustic conditions and varying numbers of speakers. Current efforts concentrate on improving speaker diarization (identifying who speaks when) and speech separation, often using deep learning models such as graph convolutional networks and training them with supervised or semi-supervised learning on large-scale datasets. These advances make meeting recordings more accessible and enable efficient analysis of spoken interactions, in applications ranging from automated meeting summarization to human-computer interaction.

Papers