Dataset Mention

Dataset mention research covers the identification and analysis of references to datasets in scientific literature and other text, with the aim of improving data discoverability and reproducibility. Current work focuses on automated extraction of dataset mentions, using techniques such as Bi-LSTM-CRF neural networks and large language models (LLMs), to enrich metadata and link data across sources. This work is important for tracking dataset usage and for assessing data quality (including label design and class balance), and it can help accelerate scientific progress by supporting better data management and reuse.
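Taggers such as Bi-LSTM-CRF are typically applied as sequence labelers, assigning one tag per token. As a rough illustration of what the extraction step produces, the sketch below decodes BIO-style tags into dataset mention spans. The tag names (`B-DATASET`, `I-DATASET`, `O`), the function name `decode_bio`, and the example sentence are assumptions made for illustration; they are not taken from any specific paper or system.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, mention text)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Turn per-token BIO tags into dataset mention spans.

    Assumes a hypothetical tag set: B-DATASET, I-DATASET, O.
    An I-DATASET tag that does not follow an open mention is ignored.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Span] = []
    start = None  # index where the currently open mention began, if any

    for i, tag in enumerate(tags):
        if tag == "I-DATASET" and start is not None:
            continue  # the open mention keeps growing
        if start is not None:
            # Any other tag closes the open mention.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B-DATASET":
            start = i  # a new mention begins here

    if start is not None:
        # Close a mention that runs to the end of the sentence.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Illustrative input; in practice the tags would come from a trained tagger.
tokens = ["We", "evaluate", "on", "the", "Penn", "Treebank", "and", "ImageNet", "."]
tags = ["O", "O", "O", "O", "B-DATASET", "I-DATASET", "O", "B-DATASET", "O"]
mentions = decode_bio(tokens, tags)
```

The decoded spans could then be matched against a dataset registry to enrich metadata or link mentions across sources, which is the downstream use described above.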

Papers