Speech Model

Speech models represent and process spoken language computationally, enabling applications such as automatic speech recognition (ASR) and text-to-speech (TTS). Current research emphasizes robustness (e.g., to noise and accents), fairness (mitigating biases against marginalized language varieties), and efficiency (through techniques such as knowledge distillation and low-rank adaptation), often using transformer-based architectures and self-supervised learning. These advances have implications for healthcare (e.g., voice disorder detection and mental health assessment), language preservation, and human-computer interaction.

Papers

July 2, 2022

UserLibri: A Dataset for ASR Personalization Using Only Text
Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey
Language Model Data Set Speech Recognition Speech Model Librispeech Speech Recognition Speech Recognition Performance

June 25, 2022

Distilling a Pretrained Language Model to a Multilingual ASR Model
Kwanghee Choi, Hyung-Min Park
Pretrained Language Model Speech Model Multilingual Automatic Speech Recognition Multilingual Speech Multilingual Automatic Speech Recognition Model Cross Lingual Language Model

June 16, 2022

DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Ruchao Fan, Abeer Alwan
Self Supervised Learning Automatic Speech Recognition Self Supervised Speech Data Nine Year Old Child Novel Framework Speech Model Low Resource Speech Recognition

April 25, 2022

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study
Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line H. Clemmensen
Automatic Speech Recognition Low Resource Speech Model Pre Trained Speech Model Speech Detection Challenging Environment Speech Processing Task Atypical Speech

April 22, 2022

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng
Automatic Speech Recognition Pre Trained Ticket BERT Spoken Language Understanding Speech Model Pre Trained Speech Model End to End Model

April 1, 2022

Adaptive hybrid speech coding with a MLP/LPC structure
Marcos Faundez-Zanuy
Adaptive Importance Nonlinear Model Speech Model Nonlinear Prediction MLP Architecture Hybrid Text

March 31, 2022

Perceptive, non-linear Speech Processing and Spiking Neural Networks
Jean Rouat, Ramin Pichevar, Stéphane Loiselle
Speech Recognition Spiking Neural Network Source Separation Speech Model Spontaneous Speech Scene Analysis

March 22, 2022

Nonlinear prediction with neural nets in ADPCM
Marcos Faundez-Zanuy, Francesc Vallverdu, Enric Monte
Neural Network Nonlinear Model Speech Model Nonlinear Prediction ADPCM Scheme

February 22, 2022

Benchmarking Generative Latent Variable Models for Speech
Jakob D. Havtorn, Lasse Borgholt, Søren Hauberg, Jes Frellsen, Lars Maaløe
Speech Analysis Speech Model Speech Domain Temporal Latent Benchmarking Generative

February 2, 2022

Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye, Dan Oneata, Herman Kamper
Speech Model Localization Performance Unlabeled Speech Keyword Localisation