Speech Resynthesis

Speech resynthesis manipulates and regenerates speech audio to improve quality, modify speaker characteristics, or convert emotional content while preserving the linguistic information. Current research emphasizes efficient generative architectures such as diffusion and flow-based models, often combined with self-supervised learning and with parameter-efficient fine-tuning, which helps mitigate catastrophic forgetting and speeds up training. These advances are improving applications such as voice conversion, speech enhancement, and multilingual speech processing, with impact in fields ranging from media production to accessibility technologies.

Papers