Data Set
Datasets are essential for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistency, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks for analysis and benchmarking. Well-designed datasets directly affect how robust and reliable the resulting models are, and so support both scientific understanding and practical applications across many fields.
Papers
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
Ali Athar, Xueqing Deng, Liang-Chieh Chen
Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning
Melanie Sclar, Jane Yu, Maryam Fazel-Zarandi, Yulia Tsvetkov, Yonatan Bisk, Yejin Choi, Asli Celikyilmaz
MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images
Lehan Wang, Chongchong Qi, Chubin Ou, Lin An, Mei Jin, Xiangbin Kong, Xiaomeng Li
eCARLA-scenes: A synthetically generated dataset for event-based optical flow prediction
Jad Mansour, Hayat Rajani, Rafael Garcia, Nuno Gracias
How Certain are Uncertainty Estimates? Three Novel Earth Observation Datasets for Benchmarking Uncertainty Quantification in Machine Learning
Yuanyuan Wang, Qian Song, Dawood Wasif, Muhammad Shahzad, Christoph Koller, Jonathan Bamber, Xiao Xiang Zhu
A Pipeline and NIR-Enhanced Dataset for Parking Lot Segmentation
Shirin Qiam, Saipraneeth Devunuri, Lewis J. Lehe
Multi-cam Multi-map Visual Inertial Localization: System, Validation and Dataset
Fuzhang Han, Yufei Wei, Yanmei Jiao, Zhuqing Zhang, Yiyuan Pan, Wenjun Huang, Li Tang, Huan Yin, Xiaqing Ding, Rong Xiong, Yue Wang
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan
SynFinTabs: A Dataset of Synthetic Financial Tables for Information and Table Extraction
Ethan Bradley, Muhammad Roman, Karen Rafferty, Barry Devereux
Hostility Detection in UK Politics: A Dataset on Online Abuse Targeting MPs
Mugdha Pandya, Mali Jin, Kalina Bontcheva, Diana Maynard
Measuring Bias of Web-filtered Text Datasets and Bias Propagation Through Training
Youssef Mansour, Reinhard Heckel
FLAME 3 Dataset: Unleashing the Power of Radiometric Thermal UAV Imagery for Wildfire Management
Bryce Hopkins, Leo ONeill, Michael Marinaccio, Eric Rowell, Russell Parsons, Sarah Flanary, Irtija Nazim, Carl Seielstad, Fatemeh Afghah
Hybrid-SQuAD: Hybrid Scholarly Question Answering Dataset
Tilahun Abedissa Taffa, Debayan Baneerje, Yaregal Assabie, Ricardo Usbeck
Interpretable Generalized Additive Models for Datasets with Missing Values
Hayden McTavish, Jon Donnelly, Margo Seltzer, Cynthia Rudin
Nemotron-CC: Transforming Common Crawl into a Refined Long-Horizon Pretraining Dataset
Dan Su, Kezhi Kong, Ying Lin, Joseph Jennings, Brandon Norick, Markus Kliegl, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
Patent-CR: A Dataset for Patent Claim Revision
Lekang Jiang, Pascal A Scherz, Stephan Goetz
A comprehensive review of datasets and deep learning techniques for vision in Unmanned Surface Vehicles
Linh Trinh, Siegfried Mercelis, Ali Anwar
Linear stimulus reconstruction works on the KU Leuven audiovisual, gaze-controlled auditory attention decoding dataset
Simon Geirnaert, Iustina Rotaru, Tom Francart, Alexander Bertrand