Speech Benchmark

Speech benchmark research creates standardized evaluations for speech processing tasks so that models and algorithms can be compared objectively. Current efforts focus on building comprehensive benchmarks that span diverse tasks (speech recognition, speaker identification, emotion recognition, etc.), exploring effective discrete audio representations (e.g., semantic tokens), and addressing challenges such as low-resource scenarios and cross-lingual adaptability. Much of this work relies on transformer-based architectures and self-supervised learning. These advances matter for the robustness and generalizability of speech technologies, with applications ranging from clinical healthcare to personalized assistive devices.

Papers

June 20, 2024

DASB - Discrete Audio and Speech Benchmark
Pooneh Mousavi, Luca Della Libera, Jarod Duret, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli
Audio Token Semantic Compression Speech Benchmark Universal Performance Benchmark

February 2, 2024

Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
Jaeyeon Kim, Injune Hwang, Kyogu Lee
Context Aware Google Speech Command Semantic Learning Context Retrieval Processing Pipeline Raw Audio Speech Benchmark

January 31, 2024

Revisiting speech segmentation and lexicon learning with better features
Herman Kamper, Benjamin van Niekerk
Classifier Free Guidance Sub Word Word Segmentation Acoustic Unit Speech Benchmark Speech Segmentation

November 30, 2023

Speech Understanding on Tiny Devices with A Learning Cache
Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin
Spoken Language Understanding Embedded System Speech Input Speech Benchmark

October 25, 2023

HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto, Adaku Uchendu, Thai Le, Mattia Setzu, Fosca Giannotti, Dongwon Lee
Human Generated Authorship Attribution Authorship Analysis Speech Benchmark

September 30, 2023

AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR
Tobi Olatunji, Tejumade Afonja, Aditya Yadavalli, Chris Chinenye Emezue, Sahib Singh, Bonaventure F. P. Dossou, Joanne Osuchukwu, Salomey Osei, Atnafu Lambebo Tonja, Naome Etori, Clinton Mbataku
Automatic Speech Recognition Accented Speech Clinical Language Speech Benchmark Domain Automatic Speech Recognition

July 24, 2023

Robust Automatic Speech Recognition via WavAugment Guided Phoneme Adversarial Training
Gege Qi, Yuefeng Chen, Xiaofeng Mao, Xiaojun Jia, Ranjie Duan, Rong Zhang, Hui Xue
Adversarial Example Speech Recognition Speech Benchmark Phoneme Representation

May 22, 2023

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents
Shuzheng Si, Wentao Ma, Haoyu Gao, Yuchuan Wu, Ting-En Lin, Yinpei Dai, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li
Task Oriented Dialogue System Dialogue State Audio Text Spoken Conversation Speech Benchmark

January 2, 2023

Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling
Amitay Sicherman, Yossi Adi
Generative Language Model Speech Language Model Speech Benchmark

October 27, 2022

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang
Diffusion Model Open Sampling Consistent Conditioning Low Frequency Speech Benchmark Speech Super Resolution

October 24, 2022

ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
New Benchmark Speech Datasets Speech Benchmark Domain Automatic Speech Recognition

May 25, 2022

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech
Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, Ankur Bapna
Automatic Speech Recognition Speech Analysis Multilingual Pre Trained Model Universal Representation Speech Benchmark Shot Learning Evaluation

December 8, 2021

Audio-Visual Synchronisation in the wild
Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman
Audio Visual Speech Benchmark Audio Visual Correlation