Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
A Multi-loudspeaker Binaural Room Impulse Response Dataset with High-Resolution Translational and Rotational Head Coordinates in a Listening Room
Yue Qiao, Ryan Miguel Gonzales, Edgar Choueiri
The POLAR Traverse Dataset: A Dataset of Stereo Camera Images Simulating Traverses across Lunar Polar Terrain under Extreme Lighting Conditions
Margaret Hansen, Uland Wong, Terrence Fong
End-To-End Underwater Video Enhancement: Dataset and Model
Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands
Luis Felipe Casas, Ninad Khargonkar, Balakrishnan Prabhakaran, Yu Xiang
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models
Zhuoqun Li, Hongyu Lin, Yaojie Lu, Hao Xiang, Xianpei Han, Le Sun
The Full-scale Assembly Simulation Testbed (FAST) Dataset
Alec G. Moore, Tiffany D. Do, Nayan N. Chawla, Antonia Jimenez Iriarte, Ryan P. McMahan
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, Yu Li
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods
Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz
SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes
Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, Sunipa Dev
What is different between these datasets?
Varun Babbar, Zhicheng Guo, Cynthia Rudin
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang