Audio Embeddings
Audio embeddings are numerical representations of sound that capture both acoustic and semantic information for applications such as sound classification and retrieval. Current research focuses on robust, efficient embedding models, typically built on deep neural networks such as transformers and convolutional neural networks, and on techniques like contrastive learning and knowledge distillation to improve performance and generalization across diverse audio datasets. The field matters because more accurate and efficient audio analysis benefits a wide range of applications, from speech recognition and music information retrieval to mental health assessment.
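To make the contrastive-learning idea concrete, here is a minimal sketch of the InfoNCE objective commonly used to train audio embedding models: each embedding is pulled toward the embedding of an augmented view of the same clip and pushed away from the other clips in the batch. This is an illustrative NumPy implementation, not code from any of the papers listed below; the batch size, embedding dimension, and temperature are arbitrary example values.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss over a batch of embedding pairs.

    anchors, positives: (batch, dim) arrays. Row i of `positives` is the
    embedding of an augmented view of clip i; all other rows in the batch
    act as negatives for clip i.
    """
    # L2-normalize so dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                   # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The "correct" match for row i is column i (its own augmented view).
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 128))
# A trained encoder maps augmentations close to their anchor; here we
# simulate that by adding small noise to the anchor embeddings.
aug = emb + 0.05 * rng.standard_normal((8, 128))
loss = info_nce_loss(emb, aug)
```

When the positive pairs are close (as simulated above), the loss is near zero; for unrelated pairs it approaches log(batch_size). Papers on the effect of data augmentation in contrastive learning of music audio representations study precisely how the choice of augmentations shapes which properties these embeddings preserve.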
Papers
Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search
Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies
Tempo estimation as fully self-supervised binary classification
Florian Henkel, Jaehun Kim, Matthew C. McCallum, Samuel E. Sandberg, Matthew E. P. Davies
On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations
Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, Jaehun Kim, Samuel E. Sandberg