Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly support the development of robust and reliable models, advancing both scientific understanding and practical applications across many fields.
Papers
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen, Xingjian He, Longteng Guo, Xinxin Zhu, Weining Wang, Jinhui Tang, Jing Liu
LED: A Dataset for Life Event Extraction from Dialogs
Yi-Pei Chen, An-Zi Yen, Hen-Hsen Huang, Hideki Nakayama, Hsin-Hsi Chen
HandCT: hands-on computational dataset for X-Ray Computed Tomography and Machine-Learning
Emilien Valat, Loth Valat
Homogenizing Non-IID datasets via In-Distribution Knowledge Distillation for Decentralized Learning
Deepak Ravikumar, Gobinda Saha, Sai Aparna Aketi, Kaushik Roy
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain
Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, Béatrice Daille, Pierre-Antoine Gourraud
Multimodal Brain-Computer Interface for In-Vehicle Driver Cognitive Load Measurement: Dataset and Baselines
Prithila Angkan, Behnam Behinaein, Zunayed Mahmud, Anubhav Bhatti, Dirk Rodenburg, Paul Hungler, Ali Etemad
CILIATE: Towards Fairer Class-based Incremental Learning by Dataset and Training Refinement
Xuanqi Gao, Juan Zhai, Shiqing Ma, Chao Shen, Yufei Chen, Shiwei Wang
SCB-dataset: A Dataset for Detecting Student Classroom Behavior
Fan Yang
Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines
Yaochen Zhu, Xiangqing Shen, Rui Xia
LARD -- Landing Approach Runway Detection -- Dataset for Vision Based Landing
Mélanie Ducoffe, Maxime Carrere, Léo Féliers, Adrien Gauffriau, Vincent Mussot, Claire Pagetti, Thierry Sammour