Phonetic Embeddings
Phonetic embeddings represent speech sounds as numerical vectors that capture their acoustic and linguistic properties for use in speech processing tasks. Current research aims to improve embedding quality by incorporating semantic information from language models, leveraging multi-modal data (e.g., visual cues), and designing models that explicitly account for phonetic relationships and reduce error propagation. These advances support more accurate and robust systems for speech recognition, speech synthesis, and applications such as dysarthric speech reconstruction and autism diagnosis.
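To make the idea of "sounds as vectors" concrete, here is a minimal sketch in Python. It is not a learned embedding from any of the listed papers: the feature table, the `PHONEME_FEATURES` name, and both helper functions are hypothetical, using a handful of hand-picked articulatory features purely for illustration. It only shows how a vector representation lets phonetic relatedness be measured with cosine similarity.

```python
import math

# Hypothetical, hand-picked articulatory features (illustrative only, not learned):
# [voiced, nasal, bilabial, alveolar, plosive, fricative]
PHONEME_FEATURES = {
    "p": [0, 0, 1, 0, 1, 0],
    "b": [1, 0, 1, 0, 1, 0],
    "m": [1, 1, 1, 0, 0, 0],
    "t": [0, 0, 0, 1, 1, 0],
    "s": [0, 0, 0, 1, 0, 1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_phonemes(target, k=2):
    """Return the k phonemes whose vectors are most similar to the target's."""
    target_vec = PHONEME_FEATURES[target]
    scored = [
        (cosine_similarity(target_vec, vec), phoneme)
        for phoneme, vec in PHONEME_FEATURES.items()
        if phoneme != target
    ]
    # Sort by similarity, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phoneme for _, phoneme in scored[:k]]


if __name__ == "__main__":
    print(nearest_phonemes("p", k=2))
```

In this toy table, /p/ ends up closest to /b/ (same place and manner, differing only in voicing) and shares nothing with /s/. Learned phonetic embeddings aim for the same kind of geometry, but derive the vectors from audio and text data rather than from a fixed feature list.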