Data Set
Datasets are central to training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and applying a range of model architectures (e.g., transformers, convolutional neural networks) to analyze the data and create benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn supports both scientific research and practical applications across many fields.
Papers
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Patawari Jain, Daniele Grandi, Allin Groom, Brandon Cramer, Christopher McComb
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan
WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation
Robin Schön, Daniel Kienzle, Rainer Lienhart
WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving
Jannik Zürn, Paul Gladkov, Sofía Dudas, Fergal Cotter, Sofi Toteva, Jamie Shotton, Vasiliki Simaiaki, Nikhil Mohan
The AEIF Data Collection: A Dataset for Infrastructure-Supported Perception Research with Focus on Public Transportation
Marcel Vosshans, Alexander Baumann, Matthias Drueppel, Omar Ait-Aider, Ralf Woerner, Youcef Mezouar, Thao Dang, Markus Enzweiler
ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction
Abigail E. Whitlock, Gondy Leroy, Fariba M. Donovan, John N. Galgiani
FUNAvg: Federated Uncertainty Weighted Averaging for Datasets with Diverse Labels
Malte Tölle, Fernando Navarro, Sebastian Eble, Ivo Wolf, Bjoern Menze, Sandy Engelhardt
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs
Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui
SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry
Hafiz Mughees Ahmad, Afshin Rahimi
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech
Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim
BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement
Ruirui Lin, Nantheera Anantrasirichai, Guoxi Huang, Joanne Lin, Qi Sun, Alexandra Malyugina, David R Bull
Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach
Pronay Debnath, Usafa Akther Rifa, Busra Kamal Rafa, Ali Haider Talukder Akib, Md. Aminur Rahman