Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This work involves developing novel annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Nailia Mirzakhmedova, Johannes Kiesel, Milad Alshomary, Maximilian Heinrich, Nicolas Handke, Xiaoni Cai, Barriere Valentin, Doratossadat Dastgheib, Omid Ghahroodi, Mohammad Ali Sadraei, Ehsaneddin Asgari, Lea Kawaletz, Henning Wachsmuth, Benno Stein
Fisheye traffic data set of point center markers
Chung-I Huang, Wei-Yu Chen, Wei Jan Ko, Jih-Sheng Chang, Chen-Kai Sun, Hui Hung Yu, Fang-Pang Lin
ACL-Fig: A Dataset for Scientific Figure Classification
Zeba Karishma, Shaurya Rohatgi, Kavya Shrinivas Puranik, Jian Wu, C. Lee Giles
Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset
Zhixuan Liu, Youeun Shin, Beverley-Claire Okogwu, Youngsik Yun, Lia Coleman, Peter Schaldenbrand, Jihie Kim, Jean Oh
Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition
Iurii Medvedev, Farhad Shadmand, Nuno Gonçalves
RxRx1: A Dataset for Evaluating Experimental Batch Correction Methods
Maciej Sypetkowski, Morteza Rezanejad, Saber Saberian, Oren Kraus, John Urbanik, James Taylor, Ben Mabey, Mason Victors, Jason Yosinski, Alborz Rezazadeh Sereshkeh, Imran Haque, Berton Earnshaw
Poses of People in Art: A Data Set for Human Pose Estimation in Digital Art History
Stefanie Schneider, Ricarda Vollmer
A Dataset of Kurdish (Sorani) Named Entities -- An Amendment to Kurdish-BLARK Named Entities
Sazan Salar, Hossein Hassani
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Ryota Tanaka, Kyosuke Nishida, Kosuke Nishida, Taku Hasegawa, Itsumi Saito, Kuniko Saito