Speech Mapping

Speech mapping covers computational models that translate between speech and other modalities, such as text or lip movements. Current research aims to improve the accuracy and efficiency of these mappings with deep learning architectures, including transformers, variational autoencoders, and Siamese networks, often combined with techniques such as chain-of-thought prompting and pseudo-labeling to address data limitations. These advances are driving progress in applications such as voice-controlled devices, speech-to-speech translation, and automatic speech recognition, particularly in challenging acoustic environments. The broader goal is more natural and robust human-computer interaction.

Papers

July 10, 2024

Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks
Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva
Classification Code Speech Recognition Siamese Network Voice Based Drone Control Processing Pipeline Tello Drone Speech Mapping

May 30, 2024

SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought
Hongyu Gong, Bandhav Veluri
Speech Language Model Direct S2ST Speech Mapping

October 12, 2023

Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli
Text Modality Downstream NLP Task Speech Mapping Speech Text Data

October 20, 2022

Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses
Chia-Yu Li, Ngoc Thang Vu
Automatic Speech Recognition Semi Supervised CycleGAN Model Domain Loss Unpaired Speech Speech Mapping

June 28, 2022

Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai, Lotfy Abdel Khaliq, Timon Ulrich
Speech Synthesis Human Face Speaker Identity Voice Identity Speech Mapping

May 18, 2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang
Language Pair Pseudo Labeled Data Direct Speech to Speech Translation Transformer Baseline Speech Mapping

April 24, 2022

Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
Automatic Speech Recognition Supervised Autoencoder Speech Enhancement Variational Autoencoders Speech Mapping

March 29, 2022

Decomposed Temporal Dynamic CNN: Efficient Time-Adaptive Network for Text-Independent Speaker Verification Explained with Speaker Activation Map
Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park
Speaker Verification Temporal Convolutional Network Adaptive Network Text Independent Speaker Verification Speech Mapping