Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and using a range of model architectures (e.g., transformers, convolutional neural networks) to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model
Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger
3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement
Filipa Lino, Carlos Santiago, Manuel Marques
Rethinking Model Prototyping through the MedMNIST+ Dataset Collection
Sebastian Doerrich, Francesco Di Salvo, Julius Brockmann, Christian Ledig
Better Synthetic Data by Retrieving and Transforming Existing Datasets
Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, Graham Neubig
TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos
Atom Scott, Ikuma Uchida, Ning Ding, Rikuhei Umemoto, Rory Bunker, Ren Kobayashi, Takeshi Koyama, Masaki Onishi, Yoshinari Kameda, Keisuke Fujii
Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data
Oana Ignat, Gayathri Ganesh Lakshmy, Rada Mihalcea
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, Soyeon Caren Han
iTBLS: A Dataset of Interactive Conversations Over Tabular Information
Anirudh Sundar, Christopher Richardson, William Gay, Larry Heck