Automatic Speech Recognition Model
Automatic speech recognition (ASR) models convert spoken language into text. Current research focuses on improving ASR performance in challenging scenarios, such as low-resource languages, accented speech, and noisy environments, often by leveraging large language models (LLMs) and techniques such as parameter-efficient fine-tuning and self-supervised learning. This work is driven by the need for ASR systems that are more robust, accurate, and equitable across languages and speaker demographics, with applications in fields ranging from healthcare to legal proceedings.
Papers
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System
Abdul Waheed, Bashar Talafha, Peter Sullivan, AbdelRahim Elmadany, Muhammad Abdul-Mageed