Paper ID: 2403.10380 • Published Mar 15, 2024
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Deep learning (DL) has greatly advanced audio classification, yet the field
still lacks the large-scale benchmark datasets that have propelled progress in
other domains. While AudioSet is a pivotal step toward bridging this gap as a
universal-domain dataset, its restricted accessibility and limited range of
evaluation use cases challenge its role as the sole resource.
Therefore, we introduce BirdSet, a large-scale benchmark dataset for
audio classification focusing on avian bioacoustics. BirdSet surpasses
AudioSet with over 6,800 recording hours (↑17%) from nearly 10,000
classes (↑18×) for training and more than 400
hours (↑7×) across eight strongly labeled evaluation datasets.
It serves as a versatile resource for use cases such as multi-label
classification, covariate shift or self-supervised learning. We benchmark six
well-known DL models in multi-label classification across three distinct
training scenarios and outline further evaluation use cases in audio
classification. We host our dataset on Hugging Face for easy accessibility and
offer an extensive codebase to reproduce our results.
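
To make the multi-label setting mentioned above concrete, the following is a minimal sketch of how the species present in one recording could be encoded as a multi-hot target vector. It is not taken from the BirdSet codebase: the helper names, the species codes, and the example annotations are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def build_label_index(species: Sequence[str]) -> Dict[str, int]:
    """Map each species code to a fixed column index (sorted for determinism)."""
    return {name: i for i, name in enumerate(sorted(set(species)))}


def multi_hot(labels: Sequence[str], index: Dict[str, int]) -> List[int]:
    """Encode the set of species heard in one clip as a 0/1 vector."""
    vec = [0] * len(index)
    for name in labels:
        vec[index[name]] = 1
    return vec


# Hypothetical species codes and clip annotations, for illustration only.
SPECIES = ["amerob", "norcar", "blujay"]
INDEX = build_label_index(SPECIES)

clip_with_two_birds = multi_hot(["norcar", "amerob"], INDEX)
clip_with_no_birds = multi_hot([], INDEX)
```

Unlike single-label classification, a clip may activate several classes at once or none at all, which is why each class gets its own independent 0/1 entry rather than the clip receiving one class index.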