Automatic Speech Recognition

Automatic Speech Recognition (ASR) aims to transcribe spoken language into text accurately, and research in the field centers on making models more robust and efficient. Current efforts pursue this through techniques such as consistency regularization in Connectionist Temporal Classification (CTC), pre-trained multilingual models for low-resource languages, and the integration of Large Language Models (LLMs) for better contextual understanding and better handling of diverse accents and speech disorders. These advances matter for accessibility and enable applications in fields such as healthcare, education, and human-computer interaction.

1014 papers

Papers - Page 13

June 24, 2024

Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024
Sai Koneru, Thai-Binh Nguyen, Ngoc-Quan Pham, Danni Liu, Zhaolin Li, Alexander Waibel, Jan Niehues
Large Language Model Field Kit Automatic Speech Recognition Speech Translation End to End Speech Translation Machine Translation Offline Speech Translation

June 21, 2024

PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
Amir Nassereldine, Dancheng Liu, Chenhui Xu, Ruiyang Qin, Yiyu Shi, Jinjun Xiong
Diverse Set Automatic Speech Recognition Whisper Model Speaker Characteristic Adaptive Importance

June 20, 2024

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Neeraj Gaur, Zhong Meng
RNN T Loss Language Model Automatic Speech Recognition Fine Tuned Large Language Model Prefix Tuning

June 15, 2024

Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan
Supervised Finetuning Automatic Speech Recognition Speech Foundation Model Parameter Efficient Finetuning Child Speech Recognition Human Aligned Benchmark

Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Decoder-only Architecture for Streaming End-to-end Speech Recognition

PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control

ManWav: The First Manchu ASR Model

Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition

Growing Trees on Sounds: Assessing Strategies for End-to-End Dependency Parsing of Speech

Performant ASR Models for Medical Entities in Accented Speech

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model

NAST: Noise Aware Speech Tokenization for Speech Language Models

Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models

Optimized Speculative Sampling for GPU Hardware Accelerators

Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

Automatic Speech Recognition for Biomedical Data in Bengali Language

Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models

Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

An efficient text augmentation approach for contextualized Mandarin speech recognition