Automatic Speech Recognition Hypothesis

Research on automatic speech recognition (ASR) hypotheses aims to improve the accuracy and robustness of speech-to-text transcription, mainly by reducing errors on infrequent words and noisy audio. Current work uses large language models (LLMs) to rescore N-best ASR hypotheses, to correct errors through retrieval-augmented generation or conservative data filtering, and to improve confidence estimation. These advances make ASR output more reliable for downstream applications such as voice assistants, speech emotion recognition, and spoken language understanding, which in turn supports more natural human-computer interaction.
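As a rough illustration of N-best rescoring, the sketch below re-ranks a list of hypotheses by adding a weighted language-model score to each ASR score. It is a minimal sketch under stated assumptions, not the method of any paper listed here: `toy_lm_logprob` is a hypothetical smoothed unigram model standing in for an LLM scorer, and the word counts, the example hypotheses, and the `lm_weight` value are invented for illustration.

```python
import math
from typing import Callable, List, Tuple

# A hypothesis is (text, ASR log-probability). The scores used below are made up.
Hypothesis = Tuple[str, float]


def toy_lm_logprob(text: str) -> float:
    """Hypothetical stand-in for an LLM scorer: add-one-smoothed unigram log-probability."""
    counts = {"recognize": 5, "speech": 5, "wreck": 1, "a": 3, "nice": 2, "beach": 1}
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words
    return sum(
        math.log((counts.get(word, 0) + 1) / (total + vocab))
        for word in text.lower().split()
    )


def rescore(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Return hypotheses sorted by ASR score plus weighted LM score, best first."""
    scored = [(text, asr + lm_weight * lm_logprob(text)) for text, asr in nbest]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]
    for text, score in rescore(nbest, toy_lm_logprob):
        print(f"{score:.3f}  {text}")
```

With `lm_weight=0.0` the original ASR ranking is kept, so the weight controls how far the language model may override the recognizer. The sketch applies no length normalization, so longer hypotheses are penalized by the summed log-probabilities.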

Papers

March 1, 2023

N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian
Automatic Speech Recognition Error Correction Automatic Speech Recognition Hypothesis Constrained Decoding T5 Model Automatic Speech Recognition Error Correction Depth Hypothesis

October 21, 2022

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR
Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik
End to End Acoustic Word Embeddings Automatic Speech Recognition Hypothesis End 2 End ASR Zero Shot Intent Classification

October 19, 2022

N-Best Hypotheses Reranking for Text-To-SQL Systems
Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur
Pre Trained Language Model Text to SQL Natural Sounding Speech Automatic Speech Recognition Hypothesis

September 26, 2022

TaskMix: Data Augmentation for Meta-Learning of Spoken Intent Understanding
Surya Kant Sahu
Data Augmentation Multi Task Learning Meta Learning Human Machine Automatic Speech Recognition Hypothesis Task Diversity

March 22, 2022

Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis
Zexun Wang, Yuquan Le, Yi Zhu, Yuming Zhao, Mingchao Feng, Meng Chen, Xiaodong He
Automatic Speech Recognition Cross Attention Spoken Language Understanding Downstream NLP Task Automatic Speech Recognition Hypothesis Automatic Speech Recognition Error Phoneme Sequence
