Speech Model

Speech models represent and process spoken language computationally, enabling applications such as automatic speech recognition (ASR) and text-to-speech (TTS). Current research emphasizes robustness (e.g., to noise and accents), fairness (mitigating biases against marginalized language varieties), and efficiency (through techniques such as knowledge distillation and low-rank adaptation), often using transformer-based architectures and self-supervised learning. These advances bear on fields including healthcare (e.g., voice disorder detection, mental health assessment), language preservation, and human-computer interaction.

Papers

May 23, 2023

EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
Text to Speech Speech Model Pyramid Transformer Device Use Case Neural Text to Speech

May 18, 2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath
Zero Shot Speech Model Low Resource Language Pair Code Switching Speech Recognition Whisper Encoder

May 16, 2023

The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
Mutian He, Philip N. Garner
Language Model Machine Translation Language Understanding Speech Translation Spoken Language Understanding Speech Model Multiple Meaning Professional Sign Language Interpreter

May 14, 2023

Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations
Weiwei Lin, Chenhang He, Man-Wai Mak, Youzhi Tu
Automatic Speech Recognition Speech Model Utterance Representation Acoustic Unit

April 27, 2023

Understanding Shared Speech-Text Representations
Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang
Speech Representation Source Free Domain Adaptation Speech Model Joint Speech Text

March 30, 2023

Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples
Hyeonggon Ryu, Arda Senocak, In So Kweon, Joon Son Chung
Speech Analysis Indian Language Speech Model Language Learner Visually Grounded Memorized Sample

March 8, 2023

The Casual Conversations v2 Dataset
Bilal Porgali, Vítor Albiero, Jordan Ryda, Cristian Canton Ferrer, Caner Hazirbas
Computer Vision Artificial Intelligence Model Speech Model Algorithmic Bias Casual Conversation

March 3, 2023

Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis
Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano
Native Robustness Speech Recognition Industrial Disturbing Noise Speech Model Pre Trained Representation Acoustic Model Speech Emotion Acoustic Representation

February 27, 2023

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding
Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe
Speech Recognition Human Understanding Model Compression Convolutional Layer Speech Model Structured Pruning Self Supervised Speech Representation Learning Self Supervised Pre Trained Model Task Specific Structured Pruning

February 24, 2023

Pre-Finetuning for Few-Shot Emotional Speech Recognition
Maximillian Chen, Zhou Yu
Speech Model Speaker Adaptation Emotional Speech Pre Trained Speech Model Speech Emotion Corpus

February 23, 2023

ProsAudit, a prosodic benchmark for self-supervised speech models
Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux, Arthur Thomas, Gwendal Virlet, Andrea Santos Revilla, Guillaume Wisniewski, Bogdan Ludusan, Emmanuel Dupoux
Prosodic Feature Speech Model Computational Linguistics Self Supervised Speech Model Lexical Knowledge

December 8, 2022

DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech
Kazuki Kawamura, Jun Rekimoto
Speech Data Speech Processing Customer Service Speech Model Qualitative Difference Language Learning Distance Information Mispronunciation Detection Pronunciation Training Level Pronunciation

December 2, 2022

Continual Learning for On-Device Speech Recognition using Disentangled Conformers
Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed
Continual Learning Speech Model Librispeech Speech Recognition Conformer Generation Speaker Conditioning

November 30, 2022

Topological Data Analysis for Speech Processing
Eduard Tulchinskii, Kristian Kuznetsov, Laida Kushnareva, Daniil Cherniavskii, Serguei Barannikov, Irina Piontkovskaya, Sergey Nikolenko, Evgeny Burnaev
Topological Data Analysis Topological Feature Speech Processing Speech Model Head Transformer

November 29, 2022

Model Extraction Attack against Self-supervised Speech Models
Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-yi Lee
Language Model Self Supervised Speech Model Self Supervised Speech Model Model Extraction Attack Speech Supervised Learning Model

November 10, 2022

A Study on the Integration of Pre-trained SSL, ASR, LM and SLU Models for Spoken Language Understanding
Yifan Peng, Siddhant Arora, Yosuke Higuchi, Yushi Ueda, Sujay Kumar, Karthik Ganesan, Siddharth Dalmia, Xuankai Chang, Shinji Watanabe
Language Model Automatic Speech Recognition Supervised Learning Spoken Language Understanding Speech Model Self Supervised Pre Trained Model Self Supervised Speech

October 14, 2022

LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge
Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, Jin Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang
Speech Recognition Speech Recognition System Speech Model ASR System Recurrent Neural Network Transducer

October 3, 2022

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath
Language Model Pre Trained Speech Analysis Speech Model Zero Shot Retrieval Audio Visual Retrieval

September 26, 2022

The Efficacy of Self-Supervised Speech Models for Audio Representations
Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee
Speech Representation Audio Representation Speech Model Self Supervised Speech Model Speech Supervised Learning Model

August 24, 2022

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages
Tahir Javed, Kaushal Santosh Bhogale, Abhigyan Raman, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Language Model Spoken Language Understanding Indian Language Speech Model Language Dataset Universal Performance Benchmark
