Pre-Trained Automatic Speech Recognition
Pre-trained automatic speech recognition (ASR) relies on large-scale models trained on massive speech corpora to achieve high accuracy in speech-to-text conversion, with the aim of improving robustness and efficiency across diverse applications. Current research emphasizes adapting these pre-trained models to new domains (e.g., accented speech, noisy environments, low-resource languages) using techniques such as data augmentation, knowledge distillation, and test-time adaptation, often built on transformer-based architectures and, in some cases, generative adversarial networks. This work matters because it enables more accurate and efficient speech processing across a wider range of scenarios, with impact on voice assistants, healthcare, and legal transcription.
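The adaptation work described above starts from an off-the-shelf pre-trained checkpoint. As a concrete starting point, the minimal sketch below shows how such a checkpoint can be loaded and used for transcription with the Hugging Face `transformers` library; the specific Whisper checkpoint, file path, and audio-loading choices are illustrative assumptions, not details taken from the papers listed here.

```python
# Minimal sketch: transcription with a pre-trained ASR checkpoint
# (model name and file path below are illustrative assumptions).
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration


def transcribe(audio_path: str, model_name: str = "openai/whisper-small") -> str:
    """Load a pre-trained ASR checkpoint and transcribe one audio file."""
    processor = WhisperProcessor.from_pretrained(model_name)
    model = WhisperForConditionalGeneration.from_pretrained(model_name)
    model.eval()

    # Whisper expects 16 kHz mono audio.
    speech, _ = librosa.load(audio_path, sr=16000, mono=True)
    inputs = processor(speech, sampling_rate=16000, return_tensors="pt")

    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)

    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


if __name__ == "__main__":
    print(transcribe("example.wav"))  # hypothetical input file
```

Domain adaptation approaches such as those surveyed above would typically continue from this point, fine-tuning or adapting the loaded checkpoint on in-domain data rather than using it purely for zero-shot inference.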
Papers
Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Santosh Kesiraju, Marek Sarvas, Tomas Pavlicek, Cecile Macaire, Alejandro Ciuba
Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition
Yuan Xie, Jiawei Ren, Ji Xu