Text to Speech Model

Text-to-speech (TTS) models synthesize natural-sounding human speech from text input, and research on them aims to improve both the quality and the controllability of the generated audio. Current work emphasizes stronger model architectures, such as Transformers and diffusion models, and incorporates techniques such as preference alignment, adversarial training, and hierarchical acoustic modeling to achieve higher fidelity, speaker consistency, and emotional expressiveness. These advances matter for applications ranging from accessibility tools for the visually impaired to personalized voice assistants and improved synthetic data generation for other AI tasks.

Papers

February 27, 2023

February 7, 2023

Characterizing Financial Market Coverage using Artificial Intelligence
Jean Marie Tshimula, D'Jeff K. Nkashama, Patrick Owusu, Marc Frappier, Pierre-Martin Tardif, Froduald Kabanza, Armelle Brun, Jean-Marc Patenaude, Shengrui Wang, Belkacem Chikhaoui
Artificial Intelligence Natural Language Processing Text to Speech Model Financial Market Financial News

November 28, 2022

Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition
Sharvi Endait, Ruturaj Ghatage, Prof. DD Kadam
Speech Recognition Entity Recognition Named Entity Recognition Entity Mention Customer Service Text to Speech Model Business Call

November 23, 2022

IMaSC -- ICFOSS Malayalam Speech Corpus
Deepa P Gopinath, Thennal D K, Vrinda V Nair, Swaraj K S, Sachin G
Text to Speech Synthesized Speech Speech Corpus Text to Speech Model

November 17, 2022

Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar, Praveen S, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar
Text to Speech Acoustic Model Text to Speech Model

October 26, 2022

Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
Text to Speech Speech Data Synthesized Speech Data Selection Speech Corpus Text to Speech Model Text to Speech Synthesis

October 7, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training
Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, Lirong Dai, Jinyu Li, Furu Wei
Text Modality Cross Modal Bridging Text Text to Speech Model Hidden Knowledge Speech to Unit Speech Text

September 22, 2022

June 29, 2022

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Daniel Bermuth, Alexander Poeppel, Wolfgang Reif
Spoken Language Understanding Speech Processing Text to Speech Model Finite State Transducer Voice Command

June 27, 2022

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
Wei-Ping Huang, Po-Chun Chen, Sung-Feng Huang, Hung-yi Lee
Transfer Learning Text to Speech Speech to Text Text to Speech Model Shot Training Cross Lingual Text to Speech

June 9, 2022

Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Alexander Waibel, Moritz Behr, Fevziye Irem Eyiokur, Dogucan Yaman, Tuan-Nam Nguyen, Carlos Mullov, Mehmet Arif Demirtas, Alperen Kantarcı, Stefan Constantin, Hazım Kemal Ekenel
End to End Gameplay Video Voice Conversion Text to Speech Model Synthetic Voice Lip Synchronization

May 15, 2022

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao
Style Transfer Speech Synthesis Text to Speech Model

April 22, 2022

LibriS2S: A German-English Speech-to-Speech Translation Corpus
Pedro Jeuris, Jan Niehues
Speech Translation Speech Corpus Text to Speech Model Speech to Speech Translation Speech Translation Corpus

April 8, 2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
Synthesized Speech Prosodic Feature Diverse Set Text to Speech Model Natural Sounding Speech Non Autoregressive Text to Speech

March 29, 2022

Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus
Minchan Kim, Myeonghun Jeong, Byoung Jin Choi, Sunghwan Ahn, Joun Yeop Lee, Nam Soo Kim
Text to Speech Text to Speech Model Unlabeled Speech Multi Speaker Text to Speech Single Speaker Transfer Learning Framework Low Resource Text to Speech

December 7, 2021

Training end-to-end speech-to-text models on mobile phones
Zitha S, Raghavendra Rao Suresh, Pooja Rao, T. V. Prabhakar
Training Data Text to Speech Model Device Training Personalization Method Mobile Phone

Papers

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models

MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
