Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using a range of model architectures (e.g., transformers, convolutional neural networks) to analyze the data and build benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
MetaphorShare: A Dynamic Collaborative Repository of Open Metaphor Datasets
Joanne Boisson, Arif Mehmood, Jose Camacho-Collados
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinnyeong Kim, Seung-Hwan Baek
Graph Neural Network for Cerebral Blood Flow Prediction With Clinical Datasets
Seungyeon Kim, Wheesung Lee, Sung-Ho Ahn, Do-Eun Lee, Tae-Rin Lee
Pre-training for Action Recognition with Automatically Generated Fractal Datasets
Davyd Svyezhentsev, George Retsinas, Petros Maragos
in-Car Biometrics (iCarB) Datasets for Driver Recognition: Face, Fingerprint, and Voice
Vedrana Krivokuca Hahn, Jeremy Maceiras, Alain Komaty, Philip Abbet, Sebastien Marcel
DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams
Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Jun Liu
Event-based Spiking Neural Networks for Object Detection: A Review of Datasets, Architectures, Learning Rules, and Implementation
Craig Iaboni, Pramod Abichandani
Brain-like emergent properties in deep networks: impact of network architecture, datasets and training
Niranjan Rajesh, Georgin Jacob, SP Arun
Oriented histogram-based vector field embedding for characterizing 4D CT data sets in radiotherapy
Frederic Madesta, Lukas Wimmert, Tobias Gauer, René Werner, Thilo Sentker
DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing
Utsab Saha, Tanvir Muntakim Tonoy, Hafiz Imtiaz
A Dataset for Evaluating Online Anomaly Detection Approaches for Discrete Multivariate Time Series
Lucas Correia, Jan-Christoph Goos, Thomas Bäck, Anna V. Kononova
Dressing the Imagination: A Dataset for AI-Powered Translation of Text into Fashion Outfits and A Novel KAN Adapter for Enhanced Feature Adaptation
Gayatri Deshmukh, Somsubhra De, Chirag Sehgal, Jishu Sen Gupta, Sparsh Mittal