Real-World Dataset
Real-world datasets are crucial for training and evaluating machine learning models, because idealized datasets often fail to capture the complexity and variability of real-world scenarios. Current research focuses on creating diverse datasets for applications such as anomaly detection, robotic manipulation, and medical image analysis, and frequently pairs them with techniques like data augmentation and active learning to improve model robustness and generalization. High-quality, realistic datasets are essential for advancing the field and for deploying machine learning models reliably in practical settings.
Papers
PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation
Jan Obrist, Miguel Zamora, Hehui Zheng, Juan Zarate, Robert K. Katzschmann, Stelian Coros
Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)
Yuchen Li, Haoyi Xiong, Linghe Kong, Jiang Bian, Shuaiqiang Wang, Guihai Chen, Dawei Yin
Revisiting the Performance of Deep Learning-Based Vulnerability Detection on Realistic Datasets
Partha Chakraborty, Krishna Kanth Arumugam, Mahmoud Alfadel, Meiyappan Nagappan, Shane McIntosh
FairJob: A Real-World Dataset for Fairness in Online Systems
Mariia Vladimirova, Federico Pavone, Eustache Diemert