Phonetic Embeddings

Phonetic embeddings represent speech sounds as numerical vectors that capture their acoustic and linguistic properties for use in speech processing tasks. Current research aims to improve embedding quality by incorporating semantic information from language models, leveraging multi-modal data (e.g., visual cues), and designing models that explicitly account for phonetic relationships and reduce error propagation. These advances support more accurate and robust systems for speech recognition, speech synthesis, and applications such as dysarthric speech reconstruction and autism diagnosis.
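As a rough illustration of the basic idea, the sketch below treats each phone as a small hand-written vector and compares phones by cosine similarity. The feature dimensions, the values in PHONE_VECTORS, and the helper names are assumptions made up for this example; they are not taken from any of the papers listed here, where embeddings are typically learned from data rather than written by hand.

```python
import math

# Hypothetical, hand-written articulatory features (illustrative only).
# Dimensions: [voiced, bilabial, alveolar, plosive, nasal]
PHONE_VECTORS = {
    "p": [0.0, 1.0, 0.0, 1.0, 0.0],
    "b": [1.0, 1.0, 0.0, 1.0, 0.0],
    "m": [1.0, 1.0, 0.0, 0.0, 1.0],
    "s": [0.0, 0.0, 1.0, 0.0, 0.0],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_phone(phone):
    """Return the other phone whose vector is most similar to `phone`."""
    target = PHONE_VECTORS[phone]
    candidates = (p for p in PHONE_VECTORS if p != phone)
    return max(candidates, key=lambda p: cosine_similarity(target, PHONE_VECTORS[p]))


if __name__ == "__main__":
    for phone in PHONE_VECTORS:
        print(phone, "is closest to", nearest_phone(phone))
```

In this toy space, phones that share more features sit closer together, which is the property that learned phonetic embeddings aim to capture at a much finer grain.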

Papers

September 19, 2024

DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency
Yang Chen, Yuhang Jia, Shiwan Zhao, Ziyue Jiang, Haoran Li, Jiarong Kang, Yong Qin
Semantic Enrichment Text Based Speech Editing Speech Editing Phonetic Embeddings

June 12, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng
Speech Encoder Dysarthric Speech Codec Language Model Phonetic Embeddings Dysarthric Speech Reconstruction

April 4, 2024

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg
Automatic Speech Recognition Speech Recognition Accuracy Transformer Transducer Sequence Transducer Phonetic Embeddings

October 26, 2023

Towards Matching Phones and Speech Representations
Gene-Ping Yang, Hao Tang
Self Supervised Learning Speech Representation Self Supervised Loss Phonetic Embeddings Product Matching

September 13, 2023

Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios
Autism Spectrum Disorder Wav2vec U Phonetic Embeddings Long Form Audio

July 23, 2023

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote
Speech Generation CLIP Model Speech Domain Contrastive Audio Phonetic Embeddings Speech Generation Task

June 8, 2023

Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
Haode Zhang, Haowen Liang, Liming Zhan, Albert Y.S. Lam, Xiao-Ming Wu
Pre Trained Language Model Intent Classification Continual Pre Training Pre Trained Language Intent Classifier Phonetic Embeddings Shot Intent Classification Shot Intent Detection

April 5, 2023

PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
Vilém Zouhar, Kalvin Chang, Chenxuan Cui, Nathaniel Carlson, Nathaniel Robinson, Mrinmaya Sachan, David Mortensen
Word Embeddings Phonetic Information Phonetic Embeddings

October 30, 2022

Improvements to Embedding-Matching Acoustic-to-Word ASR Using Multiple-Hypothesis Pronunciation-Based Embeddings
Hao Yen, Woojay Jeon
Automatic Speech Recognition Large Relevance Improvement Acoustic Word Embeddings Phonetic Embeddings Depth Hypothesis

October 21, 2022

Spoken Term Detection and Relevance Score Estimation using Dot-Product of Pronunciation Embeddings
Jan Švec, Luboš Šmídl, Josef V. Psutka, Aleš Pražák
LSTM Network LSTM Model Relevance Modeling Dot Product Phonetic Embeddings Spoken Term

June 16, 2022

Nonwords Pronunciation Classification in Language Development Tests for Preschool Children
Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet
Acoustic Model Working Memory Phonetic Embeddings Language Test

April 1, 2022

Filter-based Discriminative Autoencoders for Children Speech Recognition
Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Supervised Autoencoder Child Speech Recognition Phonetic Embeddings
