Data Set
Datasets are essential for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks for analysis and benchmark creation. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee E. Wong, Jose Javier Gonzalez Ortiz, John Guttag, Adrian V. Dalca
Automatic Spectral Calibration of Hyperspectral Images: Method, Dataset and Benchmark
Zhuoran Du, Shaodi You, Cheng Cheng, Shikui Wei
Cherry-Picking in Time Series Forecasting: How to Select Datasets to Make Your Model Shine
Luis Roque, Carlos Soares, Vitor Cerqueira, Luis Torgo
Label Errors in the Tobacco3482 Dataset
Gordon Lim, Stefan Larson, Kevin Leach
Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election
Roberto Mondini, Neema Kotonya, Robert L. Logan IV, Elizabeth M Olson, Angela Oduor Lungati, Daniel Duke Odongo, Tim Ombasa, Hemank Lamba, Aoife Cahill, Joel R. Tetreault, Alejandro Jaimes
Domain Generalization in Autonomous Driving: Evaluating YOLOv8s, RT-DETR, and YOLO-NAS with the ROAD-Almaty Dataset
Madiyar Alimov, Temirlan Meiramkhanov
Expanded Comprehensive Robotic Cholecystectomy Dataset (CRCD)
Ki-Hwan Oh, Leonardo Borgioli, Alberto Mangano, Valentina Valle, Marco Di Pangrazio, Francesco Toti, Gioia Pozza, Luciano Ambrosini, Alvaro Ducas, Miloš Žefran, Liaohai Chen, Pier Cristoforo Giulianotti
Beyond Dataset Creation: Critical View of Annotation Variation and Bias Probing of a Dataset for Online Radical Content Detection
Arij Riabi, Virginie Mouilleron, Menel Mahamdi, Wissam Antoun, Djamé Seddah
Oriented Tiny Object Detection: A Dataset, Benchmark, and Dynamic Unbiased Learning
Chang Xu, Ruixiang Zhang, Wen Yang, Haoran Zhu, Fang Xu, Jian Ding, Gui-Song Xia
TS-SatFire: A Multi-Task Satellite Image Time-Series Dataset for Wildfire Detection and Prediction
Yu Zhao, Sebastian Gerard, Yifang Ban
Copy-Move Detection in Optical Microscopy: A Segmentation Network and A Dataset
Hao-Chiang Shao, Yuan-Rong Liao, Tse-Yu Tseng, Yen-Liang Chuo, Fong-Yi Lin
Exploring the Frontiers of Animation Video Generation in the Sora Era: Method, Dataset and Benchmark
Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Jixuan Xu, Yue Zhang, Jinlong Hou, Huyang Sun
RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector
Zhensheng Wang, Wenmian Yang, Kun Zhou, Yiquan Zhang, Weijia Jia