Expressive Speech

Expressive speech synthesis aims to generate speech that conveys not only linguistic content but also emotion and speaking style, mirroring the richness of human communication. Current research focuses on making synthesized speech more expressive and controllable, often using diffusion models, variational autoencoders, and graph neural networks, and incorporating linguistic features (e.g., emphasis, semantics) to improve control and naturalness. Advances in this field matter for applications such as virtual assistants, audiobooks, and accessibility technologies, and also offer insight into the computational modeling of human communication.

Papers

August 25, 2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Xuyuan Li, Zengqiang Shang, Peiyang Shi, Hua Hua, Ta Li, Pengyuan Zhang
Critical Synthesis Expressive Speech Expressive Speech Synthesis Paragraph Speech Audiobook Speech Synthesis

August 22, 2023

Expressive probabilistic sampling in recurrent neural networks
Shirui Chen, Linxing Preston Jiang, Rajesh P. N. Rao, Eric Shea-Brown
Recurrent Neural Network Expressive Speech Neural Dynamic Stochastic Sampling Brain Modeling

July 29, 2023

MSStyleTTS: Multi-Scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng
Style Representation Expressive Speech Multi Scale Representation Multiscale Modeling Expressive Speech Synthesis Hierarchical Context

July 3, 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee
Speech Analysis Speech Generation Expressive Speech Cross Utterance TTS System

May 20, 2023

EE-TTS: Emphatic Expressive TTS with Linguistic Information
Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun
Text to Speech Expressive Speech Linguistic Information High Quality Speech Emphasis Detection

April 28, 2023

Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna, Lina M. Rojas-Barahona, Kees van Deemter, Albert Gatt
Generative Language Model Token Level Explainability Method Visual Explanation Expressive Speech Image to Text Semantic Prior

March 9, 2023

On the Expressiveness and Generalization of Hypergraph Neural Networks
Zhezheng Luo, Jiayuan Mao, Joshua B. Tenenbaum, Leslie Pack Kaelbling
Strong Generalization Expressive Speech Hypergraph Neural Network Graph Reasoning Structural Generalization

January 29, 2023

Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker
Navjot Kaur, Paige Tuttosi
Speech Analysis Text to Speech Speech Synthesis Human Mind Underlying Emotion Expressive Speech Speech Driven Utterance Length

November 26, 2022

Contextual Expressive Text-to-Speech
Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou
Expressive Speech Natural Sounding Speech Expressive Text to Speech

November 16, 2022

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints
Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang
Voice Conversion Expressive Speech Source Speech Speaking Style Speaker Timbre

November 9, 2022

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi
Voice Conversion Expressive Speech Major Challenge Bottleneck Feature Perturbation Audio Encoder Attention Fusion Prosody Encoder

November 2, 2022

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis
Konstantinos Klapsas, Karolos Nikitaras, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Spyros Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis
Synthesized Speech Prosodic Feature Expressive Speech Hierarchical Variational Expressive Speech Synthesis Flow Prior

October 12, 2022

Evaluated CMI Bounds for Meta Learning: Tightness and Expressiveness
Fredrik Hellström, Giuseppe Durisi
Generalization Bound Information Theoretic Expressive Speech Conditional Mutual Information

September 7, 2022

ESSYS* Sharing #UC: An Emotion-driven Audiovisual Installation
Sérgio M. Rebelo, Mariana Seiça, Pedro Martins, João Bicker, Penousal Machado
Expressive Speech Emotional Speech Computational Creativity Social Scenario

July 13, 2022

Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Yookyung Shin, Younggun Lee, Suhee Jo, Yeongtae Hwang, Taesu Kim
Style Transfer Expressive Speech Style Encoder Multi Speaker TTS Expressive Text to Speech Neural TTS

June 30, 2022

Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
An Yan, Zhankui He, Jiacheng Li, Tianyang Zhang, Julian McAuley
Movie Recommendation Natural Language Explanation Visual Explanation Expressive Speech Explanation Model Multi Modal Explanation

June 29, 2022

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody
Peter Makarov, Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou
Text to Speech Prosodic Feature Expressive Speech

June 28, 2022

Expressive, Variable, and Controllable Duration Modelling in TTS
Ammar Abbas, Thomas Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman
Speech Synthesis Expressive Speech Multi Speaker Synthetic Voice Composite Variable Construction Duration Modelling Duration Prediction