Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors
Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li, Zhi Gao, Bofei Zhang, Tao Yuan, Yuwei Wu, Mehrtash Harandi, Yunde Jia, Song-Chun Zhu, Qing Li
XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More
Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang
MSEval: A Dataset for Material Selection in Conceptual Design to Evaluate Algorithmic Models
Yash Patawari Jain, Daniele Grandi, Allin Groom, Brandon Cramer, Christopher McComb
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan
WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation
Robin Schön, Daniel Kienzle, Rainer Lienhart
WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving
Jannik Zürn, Paul Gladkov, Sofía Dudas, Fergal Cotter, Sofi Toteva, Jamie Shotton, Vasiliki Simaiaki, Nikhil Mohan
The AEIF Data Collection: A Dataset for Infrastructure-Supported Perception Research with Focus on Public Transportation
Marcel Vosshans, Alexander Baumann, Matthias Drueppel, Omar Ait-Aider, Ralf Woerner, Youcef Mezouar, Thao Dang, Markus Enzweiler
ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction
Abigail E. Whitlock, Gondy Leroy, Fariba M. Donovan, John N. Galgiani
FUNAvg: Federated Uncertainty Weighted Averaging for Datasets with Diverse Labels
Malte Tölle, Fernando Navarro, Sebastian Eble, Ivo Wolf, Bjoern Menze, Sandy Engelhardt
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs
Mihir Parmar, Hanieh Deilamsalehy, Franck Dernoncourt, Seunghyun Yoon, Ryan A. Rossi, Trung Bui
SH17: A Dataset for Human Safety and Personal Protective Equipment Detection in Manufacturing Industry
Hafiz Mughees Ahmad, Afshin Rahimi
LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech
Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim