Pre Trained Speech Model

Pre-trained speech models leverage large datasets to learn robust representations of speech, enabling efficient adaptation to various downstream tasks like speech recognition, emotion recognition, and speaker verification. Current research emphasizes improving these models' efficiency and robustness, focusing on techniques like adapter tuning, prompt engineering, and innovative training strategies such as incorporating textual data or brain activations to refine representations. This work is significant because it reduces the reliance on extensive labeled data, improves performance on low-resource languages and challenging acoustic conditions, and facilitates the development of more versatile and accurate speech processing systems across diverse applications.

Papers

June 14, 2023

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement
Hejung Yang, Hong-Goo Kang
Speech Enhancement Supervised Fine Tuning Speech Generation Pre Trained Representation Pre Trained Speech Model Feature Normalization Speech Enhancement Task

June 13, 2023

A Novel Scheme to classify Read and Spontaneous Speech
Sunil Kumar Kopparapu
Speech Analysis Read V Spontaneous Speech Audio Recording Pre Trained Speech Model Recognition System Calysto Scheme

June 9, 2023

Developing Speech Processing Pipelines for Police Accountability
Anjalie Field, Prateek Verma, Nay San, Jennifer L. Eberhardt, Dan Jurafsky
Automatic Speech Recognition Automatic Speech Recognition Performance Pre Trained Speech Model Speech Detection Processing Pipeline

June 8, 2023

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models
Tiantian Feng, Shrikanth Narayanan
Emotion Recognition Greater Public Use Speech Emotion Recognition Audio Representation Large Pre Trained Model Parameter Efficient Transfer Learning Pre Trained Speech Model

June 1, 2023

How to Estimate Model Transferability of Pre-Trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Language Model Self Supervised Speech Model Pre Trained Speech Model Speech Foundation Model Model Transferability Transferability Score

May 31, 2023

Zero-Shot Automatic Pronunciation Assessment
Hongfu Liu, Mingqian Shi, Ye Wang
Automatic Speech Recognition Speech Data Pre Trained Speech Model Automatic Pronunciation Assessment

May 25, 2023

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition
Eunseop Yoon, Hee Suk Yoon, John Harvill, Mark Hasegawa-Johnson, Chang D. Yoo
Automatic Speech Recognition Speech Recognition Pre Trained Speech Model Self Supervised Speech Representation Learning Accent Recognition Adversarial Prompting

May 22, 2023

Textually Pretrained Speech Language Models
Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
Pre Training Speech Language Model Pre Trained Speech Model Natural Language Model Cold Start

February 24, 2023

Pre-Finetuning for Few-Shot Emotional Speech Recognition
Maximillian Chen, Zhou Yu
Speech Model Speaker Adaptation Emotional Speech Pre Trained Speech Model Speech Emotion Corpus

November 9, 2022

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg
Self Supervised Language Identification Pre Trained Speech Model Multilingual Pretraining Passive Learning

November 2, 2022

Fast and efficient speech enhancement with variational autoencoders
Mostafa Sadeghi, Romain Serizel
Variational Inference Speech Enhancement Variational Autoencoders Total Variation Pre Trained Speech Model Unsupervised Speech Enhancement

November 1, 2022

T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
Chan-Jan Hsu, Ho-Lam Chung, Hung-yi Lee, Yu Tsao
Language Model Self Supervised Spoken Language Understanding Bridging Text Pre Trained Speech Model Language Model Output Phonetic Information

October 31, 2022

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search
Zihan Wang, Qi Meng, HaiFeng Lan, XinRui Zhang, KeHao Guo, Akshat Gupta
Neural Architecture Search Speech Emotion Recognition Language Specific Pre Trained Speech Model Gating Mechanism Emotion Category

July 2, 2022

Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation
Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, Joseph Yitan Cheng, Erdrin Azemi
Knowledge Distillation Multi Task Learning Pre Trained Speech Model Speech Emotion Valence Prediction Model Representation Emotion Estimation Dimensional Emotion Recognition

July 1, 2022

Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training
Mitchell DeHaven, Jayadev Billa
Automatic Speech Recognition Audio Data Self Supervised Transformer Pre Trained Speech Model Semi Supervised Training Low Resource Speech Recognition

June 16, 2022

Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ziqian Dai, Jianwei Yu, Yan Wang, Nuo Chen, Yanyao Bian, Guangzhi Li, Deng Cai, Dong Yu
Text to Speech Prosodic Feature Pre Trained Speech Model Prosody Modeling

April 25, 2022

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study
Sneha Das, Nicole Nadine Lønfeldt, Anne Katrine Pagsberg, Line. H. Clemmensen
Automatic Speech Recognition Low Resource Speech Model Pre Trained Speech Model Speech Detection Speech Processing Task Challenging Environment Atypical Speech

April 22, 2022

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, Yafeng Deng
Automatic Speech Recognition Pre Trained Ticket BERT Spoken Language Understanding Speech Model Pre Trained Speech Model End to End Model

April 7, 2022

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model
Nick J. C. Wang, Lu Wang, Yandan Sun, Haimei Kang, Dejun Zhang
Automatic Speech Recognition Spoken Language Understanding Pre Trained Speech Model Intent Classifier End to End Spoken Language Layered Approach

December 11, 2021

Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR
Peter Plantinga, Deblin Bagchi, Eric Fosler-Lussier
Speech Recognition Speech Enhancement Pre Trained Speech Model Recognition Model Perceptual Loss Single Channel Speech Enhancement

Pre Trained Speech Model

Papers

Feature Normalization for Fine-tuning Self-Supervised Models in Speech Enhancement

A Novel Scheme to classify Read and Spontaneous Speech

Developing Speech Processing Pipelines for Police Accountability

PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Speech Models

How to Estimate Model Transferability of Pre-Trained Speech Models?

Zero-Shot Automatic Pronunciation Assessment

INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition

Textually Pretrained Speech Language Models

Pre-Finetuning for Few-Shot Emotional Speech Recognition

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

Fast and efficient speech enhancement with variational autoencoders

T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5

Multilingual Speech Emotion Recognition With Multi-Gating Mechanism and Neural Architecture Search

Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

Improving Low-Resource Speech Recognition with Pretrained Speech Models: Continued Pretraining vs. Semi-Supervised Training

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study

WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment

Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model

Perceptual Loss with Recognition Model for Single-Channel Enhancement and Robust ASR