Natural Sounding Speech

Natural-sounding speech synthesis aims to generate human-like speech from text, with a focus on improving quality, diversity, and robustness across languages and speaking styles. Current research emphasizes model architectures such as diffusion models, variational autoencoders, and transformer networks, often combined with techniques such as disentangled representations and adversarial training to improve naturalness and control over prosody and emotion. Applications range from assistive technologies and personalized voice assistants to combating synthetic misinformation, which motivates ongoing work on more accurate and efficient speech synthesis systems and on robust methods for detecting synthetic speech.

Papers

April 8, 2022

Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech
Jae-Sung Bae, Jinhyeok Yang, Tae-Jun Bak, Young-Sun Joo
Synthesized Speech Prosodic Feature Diverse Set Text to Speech Model Natural Sounding Speech Non Autoregressive Text to Speech

April 6, 2022

Successes and critical failures of neural networks in capturing human-like speech recognition
Federico Adolfi, Jeffrey S. Bowers, David Poeppel
Neural Network Speech Recognition Cognitive Science Natural Sounding Speech Financial Success

March 1, 2022

TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor, Subhashini Venugopalan
Knowledge Distillation Emotion Recognition Speech Representation Natural Sounding Speech Computational Paralinguistics Universal Speech Representation

January 24, 2022

Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end
Rem Hida, Masaki Hamada, Chie Kamada, Emiru Tsunoo, Toshiyuki Sekiya, Toshiyuki Kumakura
Pre Trained Language Model Text to Speech Synthesized Speech Natural Sounding Speech Accent Recognition Polyphone Disambiguation

January 10, 2022

Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks
Daniel Ziembicki, Anna Wróblewska, Karolina Seweryn
Language Model Natural Language Processing New Benchmark Natural Language Inference Factual Claim Natural Sounding Speech Expert Annotated

November 30, 2021

Generating Rich Product Descriptions for Conversational E-commerce Systems
Shashank Kedia, Aditya Mantha, Sneha Gupta, Stephen Guo, Kannan Achan
Natural Sounding Speech Conversational System BERT Embeddings Speech Technology Product Description Generation

November 19, 2021

SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Automatic Speech Recognition Spoken Language Understanding Speech Processing Natural Sounding Speech Benchmark Task
