Speech Recognition Model

Speech recognition models aim to accurately transcribe spoken language into text, and research in the area seeks to make these systems more efficient and robust. Current efforts focus on improving model efficiency through techniques such as mixture-of-experts, low-rank adaptation, and dynamic layer skipping, often applied to transformer-based models or to models trained with the connectionist temporal classification (CTC) objective. These advances are crucial for deploying speech recognition on resource-constrained devices and for improving performance across diverse acoustic conditions and languages, with applications ranging from voice assistants to medical transcription. Research also emphasizes robustness against adversarial attacks and addresses challenges such as dialect variation and low-resource languages.
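Of the techniques mentioned above, CTC is the easiest to illustrate compactly. A CTC model emits one score vector per audio frame over the vocabulary plus a special "blank" symbol. The simplest decoder (greedy or "best path" decoding) picks the top-scoring label in each frame, merges consecutive repeats, and then drops blanks, so a blank between two identical labels is what allows a genuine double letter. The sketch below is a minimal, framework-free illustration that assumes the blank has index 0 and uses made-up scores; it is not taken from any of the listed papers, and practical systems typically use beam search, often combined with a language model.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Greedy (best-path) CTC decoding over per-frame label scores."""
    # Step 1: choose the highest-scoring label index in each frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # Steps 2 and 3: merge consecutive repeats, then drop blanks.
    # Tracking the previous label (blank included) lets "a _ a" decode to "aa".
    output: List[int] = []
    previous: Optional[int] = None
    for label in best_path:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return output


# Hypothetical vocabulary and made-up frame scores, for illustration only.
vocab = ["_", "c", "a", "t"]  # "_" stands for the blank at index 0
frames = [
    [0.1, 0.7, 0.1, 0.1],    # best label: c
    [0.2, 0.6, 0.1, 0.1],    # c again (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank (dropped)
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
]
print("".join(vocab[i] for i in ctc_greedy_decode(frames)))  # prints "cat"
```

Greedy decoding is cheap because it makes one independent choice per frame, but it can miss a transcript whose total probability, summed over all frame alignments, is higher; beam search trades extra computation for that accuracy.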

Papers

May 12, 2023

Continual Learning for End-to-End ASR by Averaging Domain Experts
Peter Plantinga, Jaekwon Yoo, Chandra Dhir
Automatic Speech Recognition, Continual Learning, End to End, Catastrophic Forgetting, Multiple Model, Speech Recognition Model, Domain Expert

March 31, 2023

Lego-Features: Exporting modular encoder features for streaming and deliberation ASR
Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays
Encoder Side, Speech Recognition Model, Online Streaming, Deliberation Model, LEGO Object

March 20, 2023

On-the-fly Text Retrieval for End-to-End ASR Adaptation
Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
Large Scale, Speech Recognition Model, Automatic Speech Recognition, Hypothesis, Transformer Based Automatic Speech Recognition

January 29, 2023

Exploring Attention Map Reuse for Efficient Transformer Neural Networks
Kyuhong Shim, Jungwook Choi, Wonyong Sung
Attention Map, Efficient Transformer, Long Range Dependency, Speech Recognition Model, Transformer Based Deep

November 16, 2022

Speaker Adaptation for End-To-End Speech Recognition Systems in Noisy Environments
Dominik Wagner, Ilja Baumann, Sebastian P. Bayerl, Korbinian Riedhammer, Tobias Bocklet
Speaker Embeddings, Speaker Adaptation, Speaker Representation, Noisy Environment, Speech Recognition Model, End to End Speech Recognition

October 28, 2022

Dysfluencies Seldom Come Alone -- Detection as a Multi-Label Problem
Sebastian P. Bayerl, Dominik Wagner, Florian Hönig, Tobias Bocklet, Elmar Nöth, Korbinian Riedhammer
Data Detection, Speech Recognition Model, Reference Dataset, Stuttering Sub Challenge

October 18, 2022

Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting
Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff
Connectionist Temporal Classification, Speech Recognition Model, Contextual Adapter, Adaptive Boosting, Attention Loss

October 14, 2022

Bringing NURC/SP to Digital Life: the Role of Open-source Automatic Speech Recognition Models
Lucas Rafael Stefanel Gris, Arnaldo Candido Junior, Vinícius G. dos Santos, Bruno A. Papa Dias, Marli Quadros Leite, Flaviane Romani Fernandes Svartman, Sandra Aluísio
Large Corpus, Integral Role, Speech Recognition Model, Speech Transcription, Digital Age

October 6, 2022

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
Domain Adaptation, Automatic Speech Recognition, Model Adaptation, Speech Recognition Model, Transformer Encoders, Google Speech Command, Transformer Transducer

September 14, 2022

Federated Pruning: Improving Neural Network Efficiency with Federated Learning
Rongmei Lin, Yonghui Xiao, Tien-Ju Yang, Ding Zhao, Li Xiong, Giovanni Motta, Françoise Beaufays
Federated Learning, Speech Data, Speech Recognition Model, Efficient Neural Network

September 9, 2022

Defend Data Poisoning Attacks on Voice Authentication
Ke Li, Cameron Baird, Dan Lin
Speaker Verification, Speech Recognition Model, Voice Authentication, Password Strength

July 12, 2022

End-to-end speech recognition modeling from de-identified data
Martin Flechl, Shou-Chun Yin, Junho Park, Peter Skala
Speech Recognition, Speech Recognition Model, Synthesized Sound, De Identification, Diarization Performance, Anonymised Data

July 11, 2022

Online Continual Learning of End-to-End Speech Recognition Models
Muqiao Yang, Ian Lane, Shinji Watanabe
Automatic Speech Recognition, Continual Learning, End to End, Online Continual Learning, Speech Recognition Model, Gradient Episodic Memory

May 19, 2022

Insights on Neural Representations for End-to-End Speech Recognition
Anna Ollerenshaw, Md Asif Jalal, Thomas Hain
Automatic Speech Recognition, End to End, DCU Insight AQ, Neural Representation, Speech Recognition Model, Speech Recognition Performance

April 14, 2022

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features
Maximilian Karl Scharf, Sabine Hochmuth, Lena L. N. Wong, Birger Kollmeier, Anna Warzybok
Speech Recognition, Chinese Character, Speech Recognition Model, Speech Perception, Spectro Temporal, Lombard Effect, Hearing Threshold

March 29, 2022

March 19, 2022

Similarity and Content-based Phonetic Self Attention for Speech Recognition
Kyuhong Shim, Wonyong Sung
Speech Recognition, High Similarity, Speech Recognition Model, Phonetic Information

February 22, 2022

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models
Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang
Pre Trained Language Model, Knowledge Based, Connectionist Temporal Classification, Speech Recognition Model

February 17, 2022

Curriculum optimization for low-resource speech recognition
Anastasia Kuznetsova, Anurag Kumar, Jennifer Drexler Fox, Francis Tyers
Speech Recognition Model, Low Resource Speech Recognition, Raw Audio, Compression Ratio