Phoneme Alignment
Phoneme alignment is the task of matching phonetic segments in speech to their corresponding textual representations, typically by locating where each phoneme starts and ends in the signal. It underpins many speech processing tasks. Current research aims to improve alignment accuracy with approaches such as variational autoencoders (VAEs), transformer networks, and self-supervised learning (SSL) methods, often combining acoustic and linguistic features. These advances support applications including speech synthesis, cross-lingual transfer learning, and historical linguistics, where accurate phoneme alignment is needed to analyze sound correspondences and reconstruct ancestral languages. Faster and more accurate alignment tools also benefit phonetic research by reducing the time and effort required to annotate speech data manually.
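To make the matching of speech segments to a transcription concrete, the following is a minimal sketch of classic dynamic-programming (Viterbi-style) forced alignment. It is not the method of any paper listed here. It assumes that some acoustic model has already produced, for every audio frame, a log-probability for each phoneme of the known transcription; the function name `force_align`, the `log_probs` layout, and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def force_align(log_probs: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Align T frames to N phonemes that must occur in the given order.

    log_probs[t][n] is the (assumed) log-probability that frame t belongs
    to the n-th phoneme of the transcription. Returns a list of
    (phoneme_index, start_frame, end_frame_exclusive) segments.
    """
    num_frames = len(log_probs)
    num_phones = len(log_probs[0])
    if num_frames < num_phones:
        raise ValueError("need at least one frame per phoneme")

    neg_inf = float("-inf")
    # score[t][n]: best total log-probability of a monotonic path that
    # starts at phoneme 0 in frame 0 and is at phoneme n in frame t.
    score = [[neg_inf] * num_phones for _ in range(num_frames)]
    back = [[0] * num_phones for _ in range(num_frames)]
    score[0][0] = log_probs[0][0]

    for t in range(1, num_frames):
        for n in range(num_phones):
            stay = score[t - 1][n]                      # remain in phoneme n
            advance = score[t - 1][n - 1] if n > 0 else neg_inf  # move on
            if advance > stay:
                score[t][n] = advance + log_probs[t][n]
                back[t][n] = n - 1
            else:
                score[t][n] = stay + log_probs[t][n]
                back[t][n] = n

    # Trace back from the last phoneme in the last frame.
    path = [num_phones - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()

    # Collapse the per-frame path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, num_frames + 1):
        if t == num_frames or path[t] != path[start]:
            segments.append((path[start], start, t))
            start = t
    return segments


# Toy example: 6 frames, 3 phonemes (say /k/, /ae/, /t/); scores are made up.
toy_log_probs = [
    [-0.1, -3.0, -5.0],
    [-0.2, -2.0, -5.0],
    [-3.0, -0.1, -4.0],
    [-4.0, -0.2, -3.0],
    [-5.0, -0.3, -2.5],
    [-5.0, -3.0, -0.1],
]
print(force_align(toy_log_probs))
```

With these toy scores the first phoneme covers frames 0 to 1, the second frames 2 to 4, and the third frame 5. Multiplying frame indices by the frame hop (for example 10 or 20 ms, depending on the feature extractor) would turn the segments into time boundaries. The neural approaches mentioned above differ mainly in how the per-frame scores are obtained or in replacing this explicit search altogether.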
Papers
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang
VAE-based Phoneme Alignment Using Gradient Annealing and SSL Acoustic Features
Tomoki Koriyama