Speech Recognition Model
Speech recognition models transcribe spoken language into text, and current research aims to make them more efficient and more robust. Efficiency work focuses on techniques such as mixture-of-experts, low-rank adaptation, and dynamic layer skipping, often within transformer-based or connectionist temporal classification (CTC)-based models. These advances matter for deploying speech recognition on resource-constrained devices and for improving performance across diverse acoustic conditions and languages, with applications ranging from voice assistants to medical transcription. Research also addresses robustness against adversarial attacks, along with challenges such as dialect variation and low-resource languages.
Papers
On the Robustness of Arabic Speech Dialect Identification
Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed
SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Mirazul Haque, Rutvij Shah, Simin Chen, Berrak Şişman, Cong Liu, Wei Yang
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Salah Zaiem, Titouan Parcollet, Slim Essid