Speech Transcription

Speech transcription is the automated conversion of spoken language into text, and research in the field aims to build accurate, efficient systems for diverse applications. Current work focuses on improving the speed and accuracy of transformer-based models such as Whisper, on handling noisy or heterogeneous audio, and on end-to-end approaches that integrate speech recognition with other tasks such as summarization, translation, and emotion recognition. These advances have significant implications for accessibility (e.g., subtitling, transcription of legal proceedings), healthcare (e.g., Alzheimer's diagnosis), and language learning, particularly in low-resource settings where large labeled datasets are scarce.

Papers