Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra, Kathleen M. Lewis, Guha Balakrishnan
DSText V2: A Comprehensive Video Text Spotting Dataset for Dense and Small Text
Weijia Wu, Yiming Zhang, Yefei He, Luoming Zhang, Zhenyu Lou, Hong Zhou, Xiang Bai
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu, Hui Cheng, Tristan Braud, Sai-Kit Yeung
DUnE: Dataset for Unified Editing
Afra Feyza Akyürek, Eric Pan, Garry Kuwanto, Derry Wijaya
Tell2Design: A Dataset for Language-Guided Floor Plan Generation
Sicong Leng, Yang Zhou, Mohammed Haroon Dupty, Wee Sun Lee, Sam Conrad Joyce, Wei Lu
Fully Authentic Visual Question Answering Dataset from Online Communities
Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari
SniffyArt: The Dataset of Smelling Persons
Mathias Zinnen, Azhar Hussian, Hang Tran, Prathmesh Madhu, Andreas Maier, Vincent Christlein
Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks
Rami Haffar, David Sánchez, Josep Domingo-Ferrer
A Large-Scale Car Parts (LSCP) Dataset for Lightweight Fine-Grained Detection
Wang Jie, Zhong Yilin, Cao Qianqian