Prosodic Feature
Prosodic features, the aspects of speech such as pitch, intensity, and rhythm, are crucial for conveying meaning and emotion beyond the literal words spoken. Current research focuses on accurately modeling and manipulating these features in applications such as speech synthesis, speech editing, and voice conversion, often using deep learning methods such as diffusion models, variational autoencoders, and actor-critic reinforcement learning. This work matters for improving the naturalness and expressiveness of synthetic speech, enhancing accessibility for individuals with communication disorders, and advancing our understanding of human communication.
Papers
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang
FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
Rui Liu, Bin Liu, Haizhou Li