Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
MultiGripperGrasp: A Dataset for Robotic Grasping from Parallel Jaw Grippers to Dexterous Hands
Luis Felipe Casas, Ninad Khargonkar, Balakrishnan Prabhakaran, Yu Xiang
Meta-Cognitive Analysis: Evaluating Declarative and Procedural Knowledge in Datasets and Large Language Models
Zhuoqun Li, Hongyu Lin, Yaojie Lu, Hao Xiang, Xianpei Han, Le Sun
The Full-scale Assembly Simulation Testbed (FAST) Dataset
Alec G. Moore, Tiffany D. Do, Nayan N. Chawla, Antonia Jimenez Iriarte, Ryan P. McMahan
MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension
Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, Yu Li
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods
Lorenzo Lupo, Paul Bose, Mahyar Habibi, Dirk Hovy, Carlo Schwarz
SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes
Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, Sunipa Dev
What is different between these datasets?
Varun Babbar, Zhicheng Guo, Cynthia Rudin
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang
A dataset of over one thousand computed tomography scans of battery cells
Amariah Condon, Bailey Buscarino, Eric Moch, William J. Sehnert, Owen Miles, Patrick K. Herring, Peter M. Attia
Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh
REAL-Colon: A dataset for developing real-world AI applications in colonoscopy
Carlo Biffi, Giulio Antonelli, Sebastian Bernhofer, Cesare Hassan, Daizen Hirata, Mineo Iwatate, Andreas Maieron, Pietro Salvagnini, Andrea Cherubini