Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This work involves developing novel annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
The Interstate-24 3D Dataset: a new benchmark for 3D multi-camera vehicle tracking
Derek Gloudemans, Yanbing Wang, Gracie Gumm, William Barbour, Daniel B. Work
Generating tabular datasets under differential privacy
Gianluca Truda
SalesBot 2.0: A Human-Like Intent-Guided Chit-Chat Dataset
Wen-Yu Chang, Yun-Nung Chen
BridgeData V2: A Dataset for Robot Learning at Scale
Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens
Imputing Brain Measurements Across Data Sets via Graph Neural Networks
Yixin Wang, Wei Peng, Susan F. Tapert, Qingyu Zhao, Kilian M. Pohl
DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets
Shubham Shrivastava, Xianling Zhang, Sushruth Nagesh, Armin Parchami
Breaking Language Barriers: A Question Answering Dataset for Hindi and Marathi
Maithili Sabane, Onkar Litake, Aman Chadha