LibriSpeech Speech Recognition
LibriSpeech is a widely used benchmark dataset for automatic speech recognition (ASR) and has driven much of the recent progress in speech processing. Current research focuses on improving ASR performance in challenging scenarios such as multi-talker environments and noisy conditions, leveraging models like Transformers, Conformers, and neural transducers, often combined with techniques such as self-supervised learning and knowledge distillation. These efforts aim to produce more robust and accurate ASR systems, with implications for applications including voice assistants, transcription services, and accessibility technologies. The development of larger datasets, such as Libriheavy, and the exploration of techniques like curriculum learning and multi-resolution processing further extend the capabilities and efficiency of ASR models.
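ASR quality on LibriSpeech is conventionally reported as word error rate (WER), the word-level edit distance between a hypothesis and the reference transcript, normalized by reference length. As illustrative background, a minimal plain-Python sketch (the example sentences are made up, not from the dataset):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words (substitutions,
    # insertions, and deletions each cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One inserted word against a 3-word reference -> WER of 1/3.
print(wer("the cat sat", "the cat sat on"))
```

In practice, published LibriSpeech numbers are reported separately on the test-clean and test-other splits, with lower WER on the "clean" partition.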
Papers
CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho
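The paper above builds on CTC alignments. As generic background (not the paper's method), greedy CTC decoding takes the most likely token per frame, collapses consecutive repeats, and removes the blank symbol; a minimal sketch with a hypothetical character token set:

```python
# Blank symbol choice is an assumption for this illustration.
BLANK = "_"

def ctc_greedy_decode(frame_tokens):
    """Collapse a per-frame CTC best path into a label sequence:
    drop repeated tokens, then drop blanks."""
    decoded = []
    prev = None
    for tok in frame_tokens:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return "".join(decoded)

# Best path "hh_e_ll_ll_oo" decodes to "hello": blanks separate the
# two 'l' labels, so they are not merged into one.
print(ctc_greedy_decode(list("hh_e_ll_ll_oo")))
```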
Can Large Language Models Understand Spatial Audio?
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Jun Zhang, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang
Target Speaker Extraction with Curriculum Learning
Yun Liu, Xuechen Liu, Xiaoxiao Miao, Junichi Yamagishi
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models
Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi, Hirofumi Inaguma, Xutai Ma, Ilia Kulikov, Anna Sun