Speech Transcription
Speech transcription is the automated conversion of spoken language into text; research in the field aims to build accurate and efficient systems for diverse applications. Current work focuses on improving the speed and accuracy of transformer-based models such as Whisper, handling noisy or diverse audio data, and exploring end-to-end approaches that integrate speech recognition with other tasks such as summarization, translation, and emotion recognition. These advances have significant implications for accessibility (e.g., subtitling, transcription of legal proceedings), healthcare (e.g., Alzheimer's diagnosis), and language learning, particularly in low-resource settings where large labeled datasets are scarce.
Papers
Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio
Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition
Jakob Poncelet, Hugo Van hamme