Acoustic Word Embeddings
Acoustic word embeddings (AWEs) are fixed-length vector representations of spoken words, aiming to capture both phonetic and semantic information for improved speech processing. Current research focuses on enhancing AWE models using techniques like self-supervised learning (e.g., HuBERT, wav2vec 2.0), multi-view learning (combining acoustic and textual data), and various deep metric learning loss functions (e.g., proxy losses). These advances are improving performance in diverse applications, including keyword spotting, speech emotion recognition, and low-resource language processing, by enabling more accurate and efficient analysis of spoken language.
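To make the core idea concrete, the sketch below shows one common recipe in minimal form: pool variable-length frame-level features (standing in for frozen HuBERT or wav2vec 2.0 outputs) into a fixed-length embedding, then train with a deep metric learning objective so that embeddings of the same word type are pulled together and different word types pushed apart. This is an illustrative sketch only; the encoder architecture, dimensions, the triplet-style margin loss (used here in place of the proxy losses mentioned above), and all hyperparameters are assumptions, not taken from any specific paper.

```python
# Minimal AWE sketch. Assumptions: PyTorch is available; random tensors stand in
# for frame-level self-supervised features (e.g. HuBERT / wav2vec 2.0 outputs);
# the GRU encoder, embedding size, and margin are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AcousticWordEncoder(nn.Module):
    """Maps a variable-length sequence of frame features to a fixed-length embedding."""

    def __init__(self, feat_dim: int = 768, embed_dim: int = 256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * embed_dim, embed_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim), e.g. frozen self-supervised features
        out, _ = self.rnn(frames)
        pooled = out.mean(dim=1)                       # mean-pool over time -> fixed length
        return F.normalize(self.proj(pooled), dim=-1)  # unit-norm acoustic word embedding


def triplet_awe_loss(anchor, positive, negative, margin: float = 0.4):
    """Metric learning objective: same word types are pulled together,
    different word types pushed apart by at least `margin` in cosine distance."""
    pos_dist = 1.0 - (anchor * positive).sum(dim=-1)
    neg_dist = 1.0 - (anchor * negative).sum(dim=-1)
    return F.relu(pos_dist - neg_dist + margin).mean()


if __name__ == "__main__":
    enc = AcousticWordEncoder()
    # Random stand-ins for batches of word segments, padded to 50 frames.
    anchor = enc(torch.randn(4, 50, 768))
    positive = enc(torch.randn(4, 50, 768))   # other utterances of the same words
    negative = enc(torch.randn(4, 50, 768))   # utterances of different words
    print(triplet_awe_loss(anchor, positive, negative).item())
```

At retrieval time, embeddings produced this way can be compared directly with cosine similarity, which is what makes fixed-length AWEs convenient for tasks such as keyword spotting or query-by-example search.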