Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
DIOR: Dataset for Indoor-Outdoor Reidentification -- Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods
Yuyang Chen, Praveen Raj Masilamani, Bhavin Jawade, Srirangaraj Setlur, Karthik Dantu
The Cambridge Law Corpus: A Dataset for Legal AI Research
Andreas Östling, Holli Sargeant, Huiyuan Xie, Ludwig Bull, Alexander Terenin, Leif Jonsson, Måns Magnusson, Felix Steffek
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta, Kush Attal, Dina Demner-Fushman
Choice-75: A Dataset on Decision Branching in Script Learning
Zhaoyi Joey Hou, Li Zhang, Chris Callison-Burch
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset
Iya Chivileva, Philip Lynch, Tomas E. Ward, Alan F. Smeaton
M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations
Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi, Luisa Verdoliva
Dhan-Shomadhan: A Dataset of Rice Leaf Disease Classification for Bangladeshi Local Rice
Md. Fahad Hossain
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li, Mykel Kochenderfer, Chiho Choi, Behzad Dariush
Flows for Flows: Morphing one Dataset into another with Maximum Likelihood Estimation
Tobias Golling, Samuel Klein, Radha Mastandrea, Benjamin Nachman, John Andrew Raine