Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
Building Floorspace in China: A Dataset and Learning Pipeline
Peter Egger, Susie Xi Rao, Sebastiano Papini
Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies
Hannah Pinson, Joeri Lenaerts, Vincent Ginis
TRR360D: A dataset for 360 degree rotated rectangular box table detection
Wenxing Hu, Minglei Tong