Independent Phone to Audio Alignment
Independent phone-to-audio alignment maps phonetic units to their corresponding segments in audio recordings without relying on pre-aligned text transcriptions. Current research uses self-supervised learning, diffusion models, and large language models to improve alignment accuracy, often combined with cross-attention mechanisms and dynamic programming for optimal sequence partitioning. These advances matter for speech processing applications such as speech synthesis, keyword spotting, and multimodal sentiment analysis, particularly when the data is noisy or low-resource.
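To make the dynamic-programming step concrete, the following is a minimal sketch of monotonic sequence partitioning, not the method of any particular paper. It assumes an upstream model has already produced a score `log_probs[t][p]` for frame `t` belonging to the `p`-th phone of a known phone sequence; the function name `align_phones` and the input layout are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def align_phones(log_probs: List[List[float]]) -> List[Tuple[int, int]]:
    """Partition T frames into N contiguous, ordered phone segments.

    log_probs[t][p] is an assumed per-frame score for the p-th phone.
    Returns one (start_frame, end_frame_exclusive) span per phone.
    """
    T = len(log_probs)
    N = len(log_probs[0]) if T else 0
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")

    NEG = float("-inf")
    # best[t][p]: best total score of any monotonic path ending at (t, p)
    best = [[NEG] * N for _ in range(T)]
    # advanced[t][p]: True if the best path moved from phone p-1 at frame t
    advanced = [[False] * N for _ in range(T)]
    best[0][0] = log_probs[0][0]

    for t in range(1, T):
        for p in range(min(t + 1, N)):
            stay = best[t - 1][p]
            move = best[t - 1][p - 1] if p > 0 else NEG
            if move > stay:
                best[t][p] = move + log_probs[t][p]
                advanced[t][p] = True
            else:
                best[t][p] = stay + log_probs[t][p]

    # Backtrace from the last frame of the last phone to recover boundaries.
    spans: List[Tuple[int, int]] = []
    p, end = N - 1, T
    for t in range(T - 1, 0, -1):
        if advanced[t][p]:
            spans.append((t, end))
            end = t
            p -= 1
    spans.append((0, end))
    spans.reverse()
    return spans


# Toy example: two phones over five frames.
scores = [[0, -5], [0, -5], [-5, 0], [-5, 0], [-5, 0]]
print(align_phones(scores))
```

Each frame either stays in the current phone or advances to the next one, so every phone receives at least one frame and the segments stay in order. Real systems typically derive the scores from an acoustic or self-supervised model and may add duration or transition penalties, which this sketch omits.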