On-Device Automatic Speech Recognition

On-device automatic speech recognition (ASR) performs speech-to-text transcription directly on the user's device, which improves privacy and reduces latency compared with cloud-based systems. Current research focuses on making architectures such as RNN-Transducers (RNN-T) and Conformers more efficient and less power-hungry, using techniques such as model quantization, weight pruning, and transfer learning to reduce resource usage while maintaining accuracy. These advances are crucial for deploying ASR on resource-constrained devices such as smartphones and wearables, and they affect applications ranging from virtual assistants to accessibility tools. Research is also exploring personalized models and efficient methods for adding punctuation, capitalization, and even emojis to transcripts.
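
To make two of the compression techniques named above concrete, here is a minimal sketch of symmetric int8 weight quantization and magnitude-based weight pruning applied to a flat list of weights. The function names (`quantize_int8`, `dequantize`, `prune_by_magnitude`) and the per-tensor, post-training setup are assumptions chosen for illustration; they are not taken from any particular paper or toolkit, and production systems typically rely on framework-level tooling and often on quantization-aware training.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: each weight w is stored as q, with w ~= q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]


def prune_by_magnitude(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    k = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]
```

In this sketch, quantization cuts storage from 32 bits to 8 bits per weight at the cost of a rounding error of at most half a quantization step, while pruning yields zeros that only save memory or compute if the runtime exploits sparsity.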

Papers