Paper ID: 2402.10547

Learning Disentangled Audio Representations through Controlled Synthesis

Yusuf Brima, Ulf Krumnack, Simone Pika, Gunther Heidemann

This paper addresses the scarcity of benchmark data for disentangled auditory representation learning. We introduce SynTone, a synthetic dataset with explicit ground-truth explanatory factors for evaluating disentanglement techniques. Benchmarking state-of-the-art methods on SynTone demonstrates its utility for method evaluation. Our results highlight both strengths and limitations in audio disentanglement, motivating future research.
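
To make the idea of controlled synthesis concrete, the following is a minimal sketch of how audio with known generative factors could be produced. It is not the SynTone generation procedure: the factor set (waveform shape, frequency, amplitude), the value ranges, the sample rate, and the function names `synth_tone` and `make_dataset` are all assumptions chosen for illustration.

```python
import numpy as np


def synth_tone(waveform, freq_hz, amplitude, sr=16000, duration_s=1.0):
    """Render one tone from three explicit factors (hypothetical factor set)."""
    t = np.arange(int(sr * duration_s)) / sr
    phase = 2.0 * np.pi * freq_hz * t
    if waveform == "sine":
        x = np.sin(phase)
    elif waveform == "square":
        x = np.sign(np.sin(phase))
    elif waveform == "sawtooth":
        # Ramp from -1 to 1 once per period.
        x = 2.0 * ((freq_hz * t) % 1.0) - 1.0
    else:
        raise ValueError(f"unknown waveform: {waveform}")
    return (amplitude * x).astype(np.float32)


def make_dataset(n, seed=0):
    """Sample n tones and keep the factors that generated each one."""
    rng = np.random.default_rng(seed)
    waveforms = ["sine", "square", "sawtooth"]
    audio, factors = [], []
    for _ in range(n):
        w = str(rng.choice(waveforms))
        f = float(rng.uniform(100.0, 1000.0))  # assumed frequency range in Hz
        a = float(rng.uniform(0.1, 1.0))       # assumed amplitude range
        audio.append(synth_tone(w, f, a))
        factors.append((w, f, a))
    return np.stack(audio), factors


audio, factors = make_dataset(8)
print(audio.shape, factors[0])
```

Because every sample is stored together with the factors that produced it, a learned representation can be scored against those labels, which is what makes a synthetic benchmark of this kind useful for evaluating disentanglement.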

Submitted: Feb 16, 2024