Speech Recognition Systems
Speech recognition systems aim to transcribe spoken language into text accurately, a task with broad applications. Current research focuses on improving robustness and accuracy under challenging conditions such as noisy environments, overlapping speakers, and disfluent speech, typically using deep learning models like transformers and recurrent neural networks together with techniques such as multi-task learning and data augmentation. These advances matter for accessibility for individuals with speech impairments, for human-computer interaction across domains, and for downstream natural language processing applications. Ongoing efforts also address biases in existing systems and explore multimodal approaches that integrate visual information to improve performance.
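Several of the papers below build on CTC-based end-to-end recognition. As background, a minimal sketch of greedy CTC decoding in pure Python: the network emits a label (or a special blank) per audio frame, and decoding collapses consecutive repeats and drops blanks. The vocabulary and frame sequence here are illustrative assumptions, not from any listed paper; production systems use beam search over posteriors, often with a language model.

```python
BLANK = 0  # conventional CTC blank index (assumed here)

def ctc_greedy_decode(frame_labels, id_to_char):
    """Collapse per-frame argmax labels into an output string."""
    collapsed = []
    prev = None
    for label in frame_labels:
        if label != prev:  # merge consecutive repeated labels
            collapsed.append(label)
        prev = label
    # Drop blanks after collapsing, then map ids to characters
    return "".join(id_to_char[l] for l in collapsed if l != BLANK)

# Hypothetical per-frame argmax over {0: blank, 1: 'c', 2: 'a', 3: 't'}
vocab = {1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, vocab))  # -> "cat"
```

The blank symbol is what lets CTC distinguish a genuinely repeated character ("aa") from one character spanning many frames, which is why collapsing must happen before blanks are removed.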
Papers
Optimized Tokenization for Transcribed Error Correction
Tomer Wullach, Shlomo E. Chazan
Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization
Zhihong Lei, Ernest Pusateri, Shiyi Han, Leo Liu, Mingbin Xu, Tim Ng, Ruchir Travadi, Youyuan Zhang, Mirko Hannemann, Man-Hung Siu, Zhen Huang
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
Mohamad Fakih, Rouwaida Kanj, Fadi Kurdahi, Mohammed E. Fouda
Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
Ahmed Amine Ben Abdallah, Ata Kabboudi, Amir Kanoun, Salah Zaiem