Large-Scale Dataset
Large-scale datasets are crucial for training and evaluating advanced machine learning models across diverse scientific domains, driving progress in areas such as computer vision, natural language processing, and genomics. Current research focuses on building datasets for specific, challenging tasks. These include robust object detection in complex environments (e.g., underwater, cluttered, or low-light scenes), multimodal data integration (e.g., image-text and video-text data), and modeling long-range dependencies (e.g., in video grounding and temporal question answering). Such high-quality, large-scale datasets are essential for improving model performance and enabling new applications, from autonomous driving and medical diagnosis to climate change modeling and personalized recommendations.
Papers
STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery
Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov, Viacheslav Sinii, Vladislav Kurenkov, Sergey Kolesnikov
An Open and Large-Scale Dataset for Multi-Modal Climate Change-aware Crop Yield Predictions
Fudong Lin, Kaleb Guillot, Summer Crawford, Yihe Zhang, Xu Yuan, Nian-Feng Tzeng
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
Shijie Lian, Ziyi Zhang, Hua Li, Wenjie Li, Laurence Tianruo Yang, Sam Kwong, Runmin Cong