Speech to Text

Speech-to-text (STT) research aims to convert spoken language into written text accurately and efficiently, covering tasks such as automatic speech recognition (ASR) and speech translation. Current work focuses on improving model robustness and accuracy, particularly for low-resource languages and challenging audio conditions. It often leverages large language models (LLMs) and transformer-based architectures such as Whisper and Conformer, together with techniques such as data augmentation and transfer learning. These advances matter for accessibility: they improve human-computer interaction and support more inclusive and versatile applications across many fields.

Papers