Text to Speech

Text-to-speech (TTS) research aims to synthesize natural-sounding human speech from text, with a focus on improving speech quality, speaker similarity, and efficiency. Current work concentrates on architectures such as diffusion models and transformers, often combined with techniques such as flow matching and semantic communication to make generated speech more natural and expressive. The field supports applications ranging from assistive technologies and accessibility tools to combating deepfakes and creating more realistic synthetic datasets for training other AI models.

Papers

June 1, 2023

The Effects of Input Type and Pronunciation Dictionary Usage in Transfer Learning for Low-Resource Text-to-Speech
Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers
Automatic Speech Recognition, Transfer Learning, Mixed Effect, Low Resource Language, Text to Speech, Multilingual Model, Cross Lingual Transfer, Low Resource Text to Speech, Pronunciation Dictionary

May 28, 2023

Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun, Vincent Colotte, Emmanuel Vincent
Speech Analysis, Text to Speech, Diversity Awareness, Text to Speech Model, Visual Naturalness, Stochastic Pitch Prediction

May 24, 2023

LAraBench: Benchmarking Arabic AI with Large Language Models
Ahmed Abdelali, Hamdy Mubarak, Shammur Absar Chowdhury, Maram Hasanain, Basel Mousi, Sabri Boughorbel, Yassine El Kheir, Daniel Izham, Fahim Dalvi, Majd Hawasly, Nizi Nazar, Yousseif Elshahawy, Ahmed Ali, Nadir Durrani, Natasa Milic-Frayling, Firoj Alam
Text to Speech, Arabic Natural Language Processing, Speech Processing Task

May 23, 2023

EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
Text to Speech, Speech Model, Pyramid Transformer, Device Use Case, Neural Text to Speech

May 20, 2023

EE-TTS: Emphatic Expressive TTS with Linguistic Information
Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun
Text to Speech, Expressive Speech, Linguistic Information, High Quality Speech, Emphasis Detection

May 19, 2023

MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
Neil Shah, Vishal Tambrahalli, Saiteja Kosgi, Niranjan Pedanekar, Vineet Gandhi
Text to Speech, Speech Synthesis, Low Resource, Self Supervised Speech Representation, High Quality Speech, Multi Speaker Text to Speech

April 23, 2023

DiffVoice: Text-to-Speech with Latent Diffusion
Zhijun Liu, Yiwei Guo, Kai Yu
Variational Autoencoder, Text to Speech, Latent Diffusion, Text Based Speech Editing, Latent Speech

April 18, 2023

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian
Zero Shot, Speech Analysis, Latent Diffusion Model, Text to Speech, Singing Voice, Speech Naturalness