Source Speech

Source speech analysis focuses on extracting meaningful information from spoken language, encompassing tasks such as transcription correction, speaker identification, emotion recognition, and topic segmentation. Current research relies heavily on large language models (LLMs) and transformer-based architectures, often combined with self-supervised learning, multi-task learning, and multilingual training to improve performance and robustness across diverse languages and speaking styles. These advances are driving progress in applications including speech-to-speech translation, real-time voice conversion, and improved accessibility for low-resource languages.

Papers

September 15, 2023

Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Suhita Ghosh, Yamini Sinha, Ingo Siegert, Sebastian Stober
Speaker Verification Voice Conversion Speech Data Source Speech Perceptual Loss Speaker Change

September 6, 2023

Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data
Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser
Voice Conversion Speaker Similarity Source Speech

September 3, 2023

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang
Style Transfer Voice Conversion Expressive Speech Source Speech Multiscale Modeling Non Parallel Speaker Timbre

August 28, 2023

FonMTL: Towards Multitask Learning for the Fon Language
Bonaventure F. P. Dossou, Iffanice Houndayi, Pamely Zantou, Gilles Hacheme
Language Model Entity Recognition Multitask Learning Source Speech

June 21, 2023

Automatic Speech Disentanglement for Voice Conversion using Rank Module and Speech Augmentation
Zhonghua Liu, Shijun Wang, Ning Chen
Voice Conversion Disentanglement Capability Speech Representation Disentanglement Source Speech Human Speech Language Augmentation Speech Naturalness

May 30, 2023

Voice Conversion With Just Nearest Neighbors
Matthew Baas, Benjamin van Niekerk, Herman Kamper
Voice Conversion High Fidelity Vocoder Speaker Similarity Source Speech

March 20, 2023

Cocktail HuBERT: Generalized Self-Supervised Pre-training for Mixture and Single-Source Speech
Maryam Fazel-Zarandi, Wei-Ning Hsu
Self Supervised Mixture Component Source Separation Source Speech Multi Talker Self Supervised Framework Speech Mixture

December 14, 2022

AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach
Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah
Natural Language Processing Deep Learning Approach Part of Speech Source Speech

December 5, 2022

GNN-SL: Sequence Labeling Based on Nearest Examples via GNN
Shuhe Wang, Yuxian Meng, Rongbin Ouyang, Jiwei Li, Tianwei Zhang, Lingjuan Lyu, Guoyin Wang
Entity Recognition Gene Level GNN Sequence Labeling Source Speech Sequence Labeling Task

December 4, 2022

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data
Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu
Speech Translation Text Encoder Text Data End to End Speech Translation Source Speech Early Slavic Participle Speech Translation Model Robust Encoders

November 16, 2022

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Voice Conversion Expressive Speech Source Speech Speaking Style Speaker Timbre

October 18, 2022

Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang, Yuchen Liu, Boxing Chen, Jiajun Zhang, Wei Luo, Zhongqiang Huang, Chengqing Zong
Machine Translation Speech Translation Cross Modal Alignment Source Speech Zero Shot Speech

May 26, 2022

Grammar Detection for Sentiment Analysis through Improved Viterbi Algorithm
Surya Teja Chavali, Charan Tej Kandavalli, Sugash T M
Natural Language Processing Entity Recognition Sentiment Analysis Source Speech Viterbi Algorithm

April 19, 2022

Time Domain Adversarial Voice Conversion for ADD 2022
Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li
Fake Speech Source Speech

March 16, 2022

Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation
Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler
Data Augmentation Speech Translation Language Pair End to End Speech Translation Source Speech Synthetic Data Augmentation Model Recombination Audio Alignment

March 15, 2022

Text-free non-parallel many-to-many voice conversion using normalising flows
Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa
Normalizing Flow Voice Conversion Source Speech Non Parallel

March 7, 2022

Enhance Language Identification using Dual-mode Model with Knowledge Distillation
Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur
Knowledge Distillation Language Identification Source Speech Self Attention Network Speaker Variability Dual Mode

January 30, 2022

Part of Speech Tagging (POST) of a Low-resource Language using another Language (Developing a POS-Tagged Lexicon for Kurdish (Sorani) using a Tagged Persian (Farsi) Corpus)
Hossein Hassani
Large Corpus Low Resource Language French Dictionary Source Speech POS Tagger
