Natural Sounding Speech

Natural-sounding speech synthesis aims to generate human-like speech from text, with a focus on improving quality, diversity, and robustness across languages and speaking styles. Current research emphasizes model architectures such as diffusion models, variational autoencoders, and transformer networks, often combined with techniques such as disentangled representations and adversarial training to improve naturalness and control over prosody and emotion. The field matters for applications ranging from assistive technologies and personalized voice assistants to countering synthetic misinformation, which motivates ongoing work on more accurate and efficient synthesis systems and on robust methods for detecting synthetic speech.

Papers

June 5, 2023

LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya
Speech Generation Natural Sounding Speech High Quality Speech Lip Reading Silent Video Lip to Speech

June 2, 2023

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis
Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann
Natural Sounding Speech Speech Resynthesis Emotion Conversion

April 21, 2023

Generative AI Perceptions: A Survey to Measure the Perceptions of Faculty, Staff, and Students on Generative AI Tools in Academia
Sara Amani, Lance White, Trini Balart, Laksha Arora, Dr. Kristi J. Shryock, Dr. Kelly Brumbelow, Dr. Karan L. Watson
Timely Survey Generative AI ChatGPT Generated Conversation Natural Sounding Speech Generative Artificial Intelligence Tool Natural Language Processing Tool Engineering Education

March 14, 2023

Detecting post-stroke aphasia using EEG-based neural envelope tracking of natural speech
Pieter De Clercq, Jill Kries, Ramtin Mehraram, Jonas Vanthornhout, Tom Francart, Maaike Vandermosten
Natural Sounding Speech Envelope Tracking Speech Envelope

January 28, 2023

Underwater Robotics Semantic Parser Assistant
Parth Parekh, Cedric McGuire, Jake Imyak
Natural Language Semantic Parsing Sequence Modeling Underwater Robot Natural Sounding Speech Lambda Calculus

January 22, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali
Automatic Speech Recognition Case Study Natural Sounding Speech TTS Model News Video Unsupervised Data Selection

December 7, 2022

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath, Shridevi S Patil, Gangotri Nadiger, Ramakrishnan Angarai Ganesan
Transfer Learning End to End Low Resource Speech Data Speech Quality Natural Sounding Speech

November 29, 2022

Evaluating and reducing the distance between synthetic and real speech distributions
Christoph Minixhofer, Ondřej Klejch, Peter Bell
Text to Speech Speech Data Synthesized Speech Distance Matter Natural Sounding Speech Utterance Level Phoneme Duration

November 26, 2022

Contextual Expressive Text-to-Speech
Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou
Expressive Speech Natural Sounding Speech Expressive Text to Speech

November 11, 2022

Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee
Semi Supervised Speech Synthesis Meaningful Representation Natural Sounding Speech Controllable Speech Synthesis Continuous Emotion Emotion Space Phoneme Sequence

October 19, 2022

N-Best Hypotheses Reranking for Text-To-SQL Systems
Lu Zeng, Sree Hari Krishnan Parthasarathi, Dilek Hakkani-Tur
Pre Trained Language Model Text to SQL Natural Sounding Speech Automatic Speech Recognition Hypothesis

October 10, 2022

Self-move and Other-move: Quantum Categorical Foundations of Japanese
Ryder Dale Walton
Natural Language Processing Natural Sounding Speech Category Theory Quantum Natural Language Processing Japanese Text MOVE Brilliance

August 26, 2022

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer
Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi
Speech Analysis Singing Voice Natural Sounding Speech Voice Style Transfer Symmetric Neural Network Phoneme Alignment

August 23, 2022

GenTUS: Simulating User Behaviour and Language in Task-oriented Dialogues with Generative Transformers
Hsien-Chin Lin, Christian Geishauser, Shutong Feng, Nurul Lubis, Carel van Niekerk, Michael Heck, Milica Gašić
Language Generation Human Language Task Oriented Dialogue System Task Oriented Dialogue Natural Sounding Speech User Simulator Generative Transformer User Utterance User Behavior Simulation

July 1, 2022

Building African Voices
Perez Ogayo, Graham Neubig, Alan W Black
Text to Speech Speech Synthesis African Language Natural Sounding Speech

June 15, 2022

Disentangling visual and written concepts in CLIP
Joanna Materzynska, Antonio Torralba, David Bau
Single CLIP Natural Image Concept Identification Natural Sounding Speech Image Encoder Visual Processing

June 1, 2022

Natural Language Sentence Generation from API Specifications
Siyu Huo, Kushal Mukherjee, Jayachandu Bandlamudi, Vatche Isahagian, Vinod Muthusamy, Yara Rizk
Chatbot Response Intent Detection Human in the Loop Natural Sounding Speech Sentence Generation Application Programming Interface Documentation

May 25, 2022

Understanding Natural Language in Context

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

May 17, 2022

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Pre Trained Model Audio Representation DNN Model Natural Sounding Speech Intermediate Layer Multi Level Feature Layer Output