Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition
Vikram V. Ramaswamy, Sing Yu Lin, Dora Zhao, Aaron B. Adcock, Laurens van der Maaten, Deepti Ghadiyaram, Olga Russakovsky
Impact, Attention, Influence: Early Assessment of Autonomous Driving Datasets
Daniel Bogdoll, Jonas Hendl, Felix Schreyer, Nishanth Gowda, Michael Färber, J. Marius Zöllner
EndoBoost: a plug-and-play module for false positive suppression during computer-aided polyp detection in real-world colonoscopy (with dataset)
Haoran Wang, Yan Zhu, Wenzheng Qin, Yizhe Zhang, Pinghong Zhou, Quanlin Li, Shuo Wang, Zhijian Song
CinPatent: Datasets for Patent Classification
Minh-Tien Nguyen, Nhung Bui, Manh Tran-Tien, Linh Le, Huy-The Vu
Graph Neural Networks in Computer Vision -- Architectures, Datasets and Common Approaches
Maciej Krzywda, Szymon Łukasik, Amir H. Gandomi
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages
Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
On The Relevance Of The Differences Between HRTF Measurement Setups For Machine Learning
Johan Pauwels, Lorenzo Picinali
Surround-view Fisheye BEV-Perception for Valet Parking: Dataset, Baseline and Distortion-insensitive Multi-task Framework
Zizhang Wu, Yuanzhu Gan, Xianzhi Li, Yunzhe Wu, Xiaoquan Wang, Tianhao Xu, Fan Wang