Data Set
Datasets are essential for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research focuses on building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using a range of model architectures (e.g., transformers, convolutional neural networks) to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, and therefore how well they serve scientific research and practical applications.
Papers
TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks
Areeg Fahad Rasheed, M. Zarkoosh, Safa F. Abbas, Sana Sabah Al-Azzawi
CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis
Nishq Poorav Desai, Ali Etemad, Michael Greenspan
A Systematic Review of NLP for Dementia - Tasks, Datasets and Opportunities
Lotem Peled-Cohen, Roi Reichart
Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models
Xin Li, Weize Chen, Qizhi Chu, Haopeng Li, Zhaojun Sun, Ran Li, Chen Qian, Yiwei Wei, Zhiyuan Liu, Chuan Shi, Maosong Sun, Cheng Yang
LML: Language Model Learning a Dataset for Data-Augmented Prediction
Praneeth Vadlapati
Excavating in the Wild: The GOOSE-Ex Dataset for Semantic Segmentation
Raphael Hagmanns, Peter Mortimer, Miguel Granero, Thorsten Luettel, Janko Petereit
Relighting from a Single Image: Datasets and Deep Intrinsic-based Architecture
Yixiong Yang, Hassan Ahmed Sial, Ramon Baldrich, Maria Vanrell
Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
Brandon Victor, Mathilde Letard, Peter Naylor, Karim Douch, Nicolas Longépé, Zhen He, Patrick Ebel
MMDVS-LF: A Multi-Modal Dynamic-Vision-Sensor Line Following Dataset
Felix Resch, Mónika Farsang, Radu Grosu
Geospatial Road Cycling Race Results Data Set
Bram Janssens, Luca Pappalardo, Jelle De Bock, Matthias Bogaert, Steven Verstockt
Dataset Distillation-based Hybrid Federated Learning on Non-IID Data
Xiufang Shi, Wei Zhang, Mincheng Wu, Guangyi Liu, Zhenyu Wen, Shibo He, Tejal Shah, Rajiv Ranjan
AIM 2024 Sparse Neural Rendering Challenge: Dataset and Benchmark
Michal Nazarczuk, Thomas Tanay, Sibi Catley-Chandar, Richard Shaw, Radu Timofte, Eduardo Pérez-Pellitero
Designing Pre-training Datasets from Unlabeled Data for EEG Classification with Transformers
Tim Bary, Benoit Macq
MTP: A Dataset for Multi-Modal Turning Points in Casual Conversations
Gia-Bao Dinh Ho, Chang Wei Tan, Zahra Zamanzadeh Darban, Mahsa Salehi, Gholamreza Haffari, Wray Buntine