Polyphone Disambiguation

Polyphone disambiguation resolves the pronunciation of words or characters that share a spelling but can be read in more than one way, such as English "read" (present versus past tense) or the Chinese character 行 (xíng versus háng). It is a crucial step in text-to-speech systems and in other natural language processing applications. Current research emphasizes deep learning models, particularly variants of BERT and other Transformer architectures, often augmented with external knowledge sources such as dictionaries or with speech embeddings obtained through semi-supervised learning. These advances are making synthesized speech more accurate and natural across multiple languages, and they contribute to broader progress in areas such as speech recognition and language understanding.
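
One common way to combine a learned model with a dictionary is to let the dictionary restrict the candidate pronunciations and let the model choose among them. The sketch below illustrates that idea only. The lexicon, the phoneme strings, the scores, and the function name `disambiguate` are all hypothetical; in a real system the scores would come from a contextual encoder such as a BERT-style model rather than being supplied by hand.

```python
import math

# Hypothetical toy lexicon (illustrative phoneme strings, not a real resource):
# each polyphonic word maps to the pronunciations the dictionary allows.
LEXICON = {
    "read": ["R IY D", "R EH D"],
    "lead": ["L IY D", "L EH D"],
}


def disambiguate(word, scores, lexicon=LEXICON):
    """Pick the best-scoring pronunciation among those the lexicon allows.

    `scores` maps pronunciation strings to unnormalized model scores.
    Here they are assumed inputs; a real system would compute them
    from the sentence context.
    """
    candidates = lexicon.get(word)
    if not candidates:
        raise KeyError(f"no lexicon entry for {word!r}")

    # Pronunciations the model did not score get -inf, i.e. probability 0.
    logits = [scores.get(p, float("-inf")) for p in candidates]
    top = max(logits)
    if top == float("-inf"):
        raise ValueError("model scored none of the allowed pronunciations")

    # Softmax over the allowed candidates only (dictionary masking).
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {p: e / total for p, e in zip(candidates, exps)}

    best = max(probs, key=probs.get)
    return best, probs


# Made-up scores for "read" in a past-tense context. The high score for
# "L EH D" is ignored because the lexicon does not list it for "read".
example_scores = {"R IY D": 0.2, "R EH D": 1.5, "L EH D": 9.0}
best, probs = disambiguate("read", example_scores)
print(best, probs)
```

Masking with the dictionary keeps the model from emitting a pronunciation that is impossible for the word, at the cost of depending on the dictionary's coverage.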

Papers