Phoneme-Level Language Model

Phoneme-level language models (PLMs) are statistical models that estimate the probability of phoneme sequences in speech, and they serve as components in many speech processing systems. Current research directions include jointly training PLMs with other modeling units such as graphemes, using them in sequence discriminative training of neural transducers, and incorporating them into unsupervised speech recognition frameworks such as diffusion GANs. This work aims to improve automatic speech recognition, text-to-speech synthesis, and the analysis of speech disorders, in particular by making systems more robust and accurate across diverse languages and speech conditions.
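To make the definition concrete, here is a minimal sketch of the simplest kind of PLM: a phoneme bigram model with add-k smoothing, which scores a sequence as a product of next-phoneme probabilities. The class name `PhonemeBigramLM`, the ARPAbet-style symbols, and the three-word toy corpus are hypothetical illustrations and are not taken from any of the systems mentioned above. Autoregressive neural PLMs replace the count table with a learned network but factor the sequence probability the same way, via the chain rule.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers added around each sequence


class PhonemeBigramLM:
    """Hypothetical bigram phoneme LM with add-k smoothing (illustrative sketch)."""

    def __init__(self, k: float = 1.0):
        self.k = k
        self.bigrams: dict[str, Counter] = {}  # history phoneme -> counts of next phoneme
        self.vocab: set[str] = set()           # predictable symbols: phonemes plus EOS

    def fit(self, sequences):
        for seq in sequences:
            padded = [BOS] + list(seq) + [EOS]
            self.vocab.update(padded[1:])
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams.setdefault(prev, Counter())[cur] += 1
        return self

    def prob(self, prev: str, cur: str) -> float:
        # Add-k smoothing: (count(prev, cur) + k) / (count(prev) + k * |V|).
        counts = self.bigrams.get(prev, Counter())
        total = sum(counts.values())
        return (counts[cur] + self.k) / (total + self.k * len(self.vocab))

    def log_prob(self, seq) -> float:
        # Chain rule: log P(seq) = sum of log P(phoneme | previous phoneme).
        padded = [BOS] + list(seq) + [EOS]
        return sum(math.log(self.prob(p, c)) for p, c in zip(padded, padded[1:]))


if __name__ == "__main__":
    # Toy corpus (hypothetical): "cat", "bat", "cab" in ARPAbet-style phonemes.
    lm = PhonemeBigramLM(k=1.0).fit([["K", "AE", "T"], ["B", "AE", "T"], ["K", "AE", "B"]])
    print(lm.log_prob(["K", "AE", "T"]))  # phonotactically familiar sequence
    print(lm.log_prob(["T", "AE", "K"]))  # unseen ordering, scored lower
```

On this toy corpus the model assigns "K AE T" a probability 81 times higher than "T AE K", because every bigram in the first sequence was observed in training while none in the second was. Add-k smoothing is chosen here only for brevity; it keeps unseen bigrams from receiving zero probability, but practical n-gram PLMs usually rely on better-calibrated smoothing schemes.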

Papers