Expressive Speech

Expressive speech synthesis aims to generate speech that conveys not only linguistic content but also emotional nuance and stylistic variation, mirroring the richness of human communication. Current research seeks more expressive models, often using diffusion models, variational autoencoders, and graph neural networks, and incorporating linguistic features (e.g., emphasis, semantics) to improve control and naturalness. Advances in this field matter for applications such as virtual assistants, audiobooks, and accessibility technologies, and they also offer insight into the computational modeling of human communication.

Papers

May 29, 2024

$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation
Weitian Zhang, Yichao Yan, Yunhui Liu, Xingdong Sheng, Xiaokang Yang
High Efficiency 3D Gaussian Expressive Speech Avatar Generation

May 27, 2024

Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yuzi Yan, Jialian Li, Yipin Zhang, Dong Yan
LLM Based Cognitive Learning Expressive Speech Expressive Power Cognitive Capability Human Expression Linear Representation

May 20, 2024

Linguistic Structure from a Bottleneck on Sequential Information Processing
Richard Futrell, Michael Hahn
Human Language Expressive Speech Major Challenge Bottleneck Linguistic Structure Sequential Computation Structured Environment Statistical Complexity

May 19, 2024

InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios
Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura
Expressive Speech Daily Living Motion Sequence Student Engagement Digital Action Interactive Behavior High Quality Motion Self Expressive

April 29, 2024

SpherE: Expressive and Interpretable Knowledge Graph Embedding for Set Retrieval
Zihao Li, Yuyi Ao, Jingrui He
Knowledge Graph Expressive Speech Spherical Surface Many to Many KG Embeddings Self Expressive

April 23, 2024

StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
Sen Liu, Yiwei Guo, Xie Chen, Kai Yu
Expressive Speech Game Narrative Expressive Text to Speech

April 16, 2024

The Dearth of the Author in AI-Supported Writing
Max Kreminski
AI System Expressive Speech Author Name Creative Task Writing Tool

March 25, 2024

On Policy Reuse: An Expressive Language for Representing and Executing General Policies that Call Other Policies
Blai Bonet, Dominik Drexler, Hector Geffner
Expressive Speech Textual Feature State Transition Finite State Controller Problem Decomposition

March 14, 2024

HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang
Text Modality Human Head Expressive Speech Head Avatar Shape Control Mesh Deformation

March 12, 2024

Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller
Chuanqi Zang, Jiji Tang, Rongsheng Zhang, Zeng Zhao, Tangjie Lv, Mingtao Pei, Wei Liang
Story Generation Expressive Speech Narrative Text Image Narrative Generation

February 23, 2024

Transformers are Expressive, But Are They Expressive Enough for Regression?
Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya
Transformer Megatron Decepticons Novel Regression Expressive Speech Transformer Encoders Smooth Function Full Transformer

February 18, 2024

Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru
Zining Wang, Paul Reisert, Eric Nichols, Randy Gomez
Artificial Intelligence Medical LLM Social Robot Community Conversation Robot Behavior Expressive Speech Expressive Robot

January 1, 2024

Towards Harmonization of SO(3)-Equivariance and Expressiveness: a Hybrid Deep Learning Framework for Electronic-Structure Hamiltonian Prediction
Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He
Expressive Speech Hybrid Deep Learning Electronic Structure Hamiltonian Learning Inter Vendor Harmonization 3D Equivariance

December 23, 2023

Regularized PolyKervNets: Optimizing Expressiveness and Efficiency for Private Inference in Deep Neural Networks
Toluwani Aremu
Deep Neural Network High Efficiency Expressive Speech Private Inference Polynomial Activation Privacy Preserving Computation

December 15, 2023

DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Yifeng Ma, Shiwei Zhang, Jiayu Wang, Xiang Wang, Yingya Zhang, Zhidong Deng
Diffusion Probabilistic Model Style Representation Expressive Speech Head Generation Talking Face

November 2, 2023

Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning
Jiwan Hur, Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Junmo Kim
Diffusion Model Fine Tuning Self Distillation Limited Data Expressive Speech Domain Translation

October 26, 2023

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie
Semi Supervised Expressive Speech Speaker Representation Multi Speaker Text to Speech Expressive Speech Synthesis Emotion Transfer

September 22, 2023

Expressive variational quantum circuits provide inherent privacy in federated learning
Niraj Kumar, Jamie Heredge, Changhao Li, Shaltiel Eloul, Shree Hari Sureshbabu, Marco Pistoia
Quantum Circuit Variational Quantum Circuit Expressive Speech Quantum Machine Learning Model Gradient Inversion Attack Inherent Privacy

September 3, 2023

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling
Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang
Style Transfer Voice Conversion Expressive Speech Source Speech Multiscale Modeling Non Parallel Speaker Timbre

August 31, 2023

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information
Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng
Expressive Speech Singing Voice Bidirectional Encoder Representation From Transformer Singing Voice Synthesis Synthetic Voice End to End Singing Voice
