Stochastic Pitch Prediction

Stochastic pitch prediction models the fundamental frequency (F0) of a voice or instrument probabilistically, particularly for speech and music synthesis. Instead of predicting one fixed pitch contour for a given input, the model learns a distribution over plausible contours and samples from it. Current research emphasizes robust end-to-end neural models, often variational autoencoders or convolutional networks, that represent pitch variability explicitly so that generated audio sounds more natural and differs from one rendition to the next. This work supports more realistic and expressive audio generation in applications such as text-to-speech systems and singing voice synthesis.
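To make the distinction between deterministic and stochastic pitch prediction concrete, here is a minimal Python sketch. It assumes the simplest stochastic predictor: an upstream network (not shown) outputs a mean and a log standard deviation of log-F0 for each frame, and frames are treated as independent Gaussians. This is a deliberate simplification, not the method of any particular system; VAE-based models, for example, instead sample a latent variable that shapes the whole contour. The helper names `gaussian_nll` and `sample_f0_contour`, the `temperature` parameter, and the example values are hypothetical. Training would minimize the Gaussian negative log-likelihood, and synthesis draws contours with the reparameterization mu + sigma * eps.

```python
import math
import random
from typing import List, Optional, Sequence


def gaussian_nll(target_logf0: Sequence[float],
                 mean: Sequence[float],
                 log_std: Sequence[float]) -> float:
    """Average negative log-likelihood of observed log-F0 under per-frame Gaussians.

    Minimizing this trains a (hypothetical) predictor to output both a pitch
    estimate (mean) and its uncertainty (std), not a single deterministic value.
    """
    if not (len(target_logf0) == len(mean) == len(log_std)) or not mean:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for y, mu, ls in zip(target_logf0, mean, log_std):
        var = math.exp(2.0 * ls)  # sigma^2, with sigma = exp(log_std)
        total += 0.5 * math.log(2.0 * math.pi * var) + (y - mu) ** 2 / (2.0 * var)
    return total / len(mean)


def sample_f0_contour(mean: Sequence[float],
                      log_std: Sequence[float],
                      temperature: float = 1.0,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Draw one F0 contour in Hz as exp(mu + temperature * sigma * eps), eps ~ N(0, 1)."""
    if len(mean) != len(log_std):
        raise ValueError("mean and log_std must be the same length")
    rng = rng or random.Random()
    contour = []
    for mu, ls in zip(mean, log_std):
        eps = rng.gauss(0.0, 1.0)
        logf0 = mu + temperature * math.exp(ls) * eps
        contour.append(math.exp(logf0))  # back from log-F0 to Hz
    return contour


if __name__ == "__main__":
    # Hypothetical per-frame predictions: mean pitch 200 Hz, about 5% spread.
    mean = [math.log(200.0)] * 5
    log_std = [math.log(0.05)] * 5
    print(sample_f0_contour(mean, log_std, rng=random.Random(0)))
    print(sample_f0_contour(mean, log_std, temperature=0.0))  # mean contour
```

Setting `temperature=0.0` collapses the sampler to the mean contour, which is what a deterministic pitch predictor would output; values above zero trade that stability for variation between renditions. Working in log-F0 keeps sampled values positive in Hz and makes the spread roughly proportional to the pitch.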

Papers