Phonetic PosteriorGrams

Phonetic PosteriorGrams (PPGs) are probabilistic representations of speech sounds, offering a speaker-independent way to encode pronunciation information crucial for tasks like voice conversion and speech editing. Current research focuses on improving PPG quality for higher-fidelity speech synthesis, exploring efficient architectures for real-time applications (e.g., low-latency streaming voice conversion), and investigating alternatives to traditional PPGs, such as self-supervised speech representations. This work has significant implications for various applications, including voice assistants, accessibility technologies for individuals with speech impairments, and media production tools.

Papers