Device Automatic Speech Recognition
On-device automatic speech recognition (ASR) performs speech-to-text transcription directly on the user's device, which improves privacy and reduces latency compared with cloud-based systems. Current research focuses on making model architectures such as RNN-Ts and Conformers more efficient and less power-hungry, using techniques such as model quantization, weight pruning, and transfer learning to reduce resource usage while maintaining accuracy. These advances are crucial for deploying ASR on resource-constrained devices such as smartphones and wearables, with applications ranging from virtual assistants to accessibility tools. Research is also exploring personalized models and efficient methods for adding punctuation, capitalization, and even emojis to transcriptions.
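As a rough illustration of two of the compression techniques named above, the following minimal sketch applies symmetric int8 quantization and magnitude-based pruning to a toy list of weights. It is not taken from any of the papers listed here; the function names (`quantize_int8`, `dequantize`, `prune_by_magnitude`) and the example weights are hypothetical, and real ASR systems typically apply these ideas per layer or per channel with a framework's own tooling.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor (an assumption made for simplicity).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical toy weights, chosen only for illustration.
weights = [0.42, -1.27, 0.03, 0.88, -0.05, 0.61]

q, scale = quantize_int8(weights)          # 8-bit integer codes plus one float scale
restored = dequantize(q, scale)            # approximate reconstruction
pruned = prune_by_magnitude(weights, 0.5)  # half of the weights set to zero
```

Quantization shrinks storage from 32 bits to 8 bits per weight at the cost of a rounding error of at most half the scale, while pruning trades a fraction of the smallest weights for sparsity that a suitable runtime can exploit.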
Papers
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing
Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Celine Lin
Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers
Duc Le, Frank Seide, Yuhao Wang, Yang Li, Kjell Schubert, Ozlem Kalinli, Michael L. Seltzer