Pre-Trained Automatic Speech Recognition

Pre-trained automatic speech recognition (ASR) builds on large models trained on massive speech datasets to achieve high accuracy in speech-to-text conversion. Research in this area focuses on making these models more robust and efficient across diverse applications. Current work emphasizes adapting pre-trained models to new domains (e.g., accented, dysarthric, or elderly speech, noisy environments, and low-resource languages) with techniques such as data augmentation, knowledge distillation, and test-time adaptation, often using transformer-based architectures and generative adversarial networks. The result is more accurate and efficient speech processing in a wider range of settings, with applications in voice assistants, healthcare, and legal transcription.
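Knowledge distillation, one of the adaptation techniques mentioned above, trains a smaller "student" model to imitate a large pre-trained "teacher". The sketch below shows the classic soft-target loss (Hinton et al., 2015) in plain Python. It assumes, for illustration only, that both models emit frame-level logits over the same output vocabulary (as CTC-style ASR models do); the function names and the frame-level setup are hypothetical and are not taken from any paper listed on this page.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over one frame's vocabulary logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    log_norm = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_norm for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean per-frame KL(teacher || student) between temperature-softened
    output distributions, scaled by T**2 (soft-target distillation).

    Both arguments are lists of frames; each frame is a list of logits over
    the same output vocabulary. The frame-level teacher/student setup is an
    illustrative assumption, not a specific paper's recipe.
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need the same, non-zero number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_frame, temperature)  # teacher soft targets
        log_q = log_softmax(s_frame, temperature)  # student predictions
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * total / len(teacher_logits)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]  # 2 frames, 3-token vocabulary
    student = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.5]]
    print(distillation_loss(teacher, student))
```

A `temperature` above 1 softens both distributions, so the student also learns from the teacher's relative scores for non-top tokens; the `T**2` factor keeps gradient magnitudes comparable across temperatures. In a real training loop this term would usually be combined with the student's own supervised ASR loss and computed in an autodiff framework rather than plain Python.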

Papers

February 28, 2023

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition
Shujie Hu, Xurong Xie, Zengrui Jin, Mengzhe Geng, Yi Wang, Mingyu Cui, Jiajun Deng, Xunying Liu, Helen Meng
Speech Corpus, Dysarthric Speech, Pre Trained Automatic Speech Recognition

February 20, 2023

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng
Automatic Speech Recognition, Multi Speaker, Pre Trained Automatic Speech Recognition, Multi Separator Problem

January 19, 2023

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman
Automatic Speech Recognition, Automatic Speech Recognition Model, Pre Trained Automatic Speech Recognition, Multilingual Speech Recognition, Model Reprogramming

November 29, 2022

Better Transcription of UK Supreme Court Hearings
Hadeel Saadany, Catherine Breslin, Constantin Orăsan, Sophie Walker
Natural Language Processing, Pre Trained Automatic Speech Recognition, Speech Transcription, Transcription Accuracy

October 26, 2022

End-to-End Speech to Intent Prediction to improve E-commerce Customer Support Voicebot in Hindi and English
Abhinav Goyal, Anupam Singh, Nikesh Garera
End to End, Indian Language, Customer Service, Intent Classification, Intent Prediction, Pre Trained Automatic Speech Recognition, Stage Pipeline, Multi Intent Spoken Language Understanding

June 22, 2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi
Speech Enhancement, Speech Processing, Systematic Comparison, Pre Trained Automatic Speech Recognition, Speech Enhancement Model

March 31, 2022

CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition
Chengxin Chen, Pengyuan Zhang
Automatic Speech Recognition, Speech Emotion Recognition, Temporal Attention, Mt RNN, Channel Quality, Pre Trained Automatic Speech Recognition, RNN Architecture

February 18, 2022

Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia, S. Umesh
Domain Adaptation, Pre Trained Automatic Speech Recognition, Domain Specific Model, Domain Performance

December 9, 2021

LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu, Cornelius Weber, Stefan Wermter
Pre Training, Pre Trained Automatic Speech Recognition, Lip Reading, Speech Reconstruction, Lip to Speech Synthesis
