Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
A Dataset of Relighted 3D Interacting Hands
Gyeongsik Moon, Shunsuke Saito, Weipeng Xu, Rohan Joshi, Julia Buffalini, Harley Bellan, Nicholas Rosen, Jesse Richardson, Mallorie Mize, Philippe de Bree, Tomas Simon, Bo Peng, Shubham Garg, Kevyn McPhail, Takaaki Shiratori
IndustReal: A Dataset for Procedure Step Recognition Handling Execution Errors in Egocentric Videos in an Industrial-Like Setting
Tim J. Schoonbeek, Tim Houben, Hans Onvlee, Peter H. N. de With, Fons van der Sommen
MLFMF: Data Sets for Machine Learning for Mathematical Formalization
Andrej Bauer, Matej Petković, Ljupčo Todorovski
NoteChat: A Dataset of Synthetic Doctor-Patient Conversations Conditioned on Clinical Notes
Junda Wang, Zonghai Yao, Zhichao Yang, Huixue Zhou, Rumeng Li, Xun Wang, Yucheng Xu, Hong Yu
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models
Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau
VECHR: A Dataset for Explainable and Robust Classification of Vulnerability Type in the European Court of Human Rights
Shanshan Xu, Leon Staufer, T. Y. S. S Santosh, Oana Ichim, Corina Heri, Matthias Grabmair
A State-Vector Framework for Dataset Effects
Esmat Sahak, Zining Zhu, Frank Rudzicz
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yangyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan Kankanhalli
Intent Detection and Slot Filling for Home Assistants: Dataset and Analysis for Bangla and Sylheti
Fardin Ahsan Sakib, A H M Rezaul Karim, Saadat Hasan Khan, Md Mushfiqur Rahman