Data Set
Datasets are the foundation for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks for analysis and benchmark creation. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
Amelia: A Large Model and Dataset for Airport Surface Movement Forecasting
Ingrid Navarro, Pablo Ortega-Kral, Jay Patrikar, Haichuan Wang, Zelin Ye, Jong Hoon Park, Jean Oh, Sebastian Scherer
dopanim: A Dataset of Doppelganger Animals with Noisy Annotations from Multiple Humans
Marek Herde, Denis Huseljic, Lukas Rauch, Bernhard Sick
A Dataset for Crucial Object Recognition in Blind and Low-Vision Individuals' Navigation
Md Touhidul Islam, Imran Kabir, Elena Ariel Pearce, Md Alimoor Reza, Syed Masum Billah
AbdomenAtlas: A Large-Scale, Detailed-Annotated, & Multi-Center Dataset for Efficient Transfer Learning and Open Algorithmic Benchmarking
Wenxuan Li, Chongyu Qu, Xiaoxi Chen, Pedro R. A. S. Bassi, Yijia Shi, Yuxiang Lai, Qian Yu, Huimin Xue, Yixiong Chen, Xiaorui Lin, Yutong Tang, Yining Cao, Haoqi Han, Zheyuan Zhang, Jiawei Liu, Tiezheng Zhang, Yujiu Ma, Jincheng Wang, Guang Zhang, Alan Yuille, Zongwei Zhou
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai
OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context
Steffen Kleinle, Jakob Prange, Annemarie Friedrich
A Tale of Single-channel Electroencephalogram: Devices, Datasets, Signal Processing, Applications, and Future Directions
Yueyang Li, Weiming Zeng, Wenhao Dong, Di Han, Lei Chen, Hongyu Chen, Hongjie Yan, Wai Ting Siok, Nizhuan Wang
Economy Watchers Survey provides Datasets and Tasks for Japanese Financial Domain
Masahiro Suzuki, Hiroki Sakaji
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
Prasenjit Karmakar, Swadhin Pradhan, Sandip Chakraborty
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations
Decheng Liu, Zongqi Wang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao
Automatic Classification of News Subjects in Broadcast News: Application to a Gender Bias Representation Analysis
Valentin Pelloin, Lena Dodson, Émile Chapuis, Nicolas Hervé, David Doukhan
360VFI: A Dataset and Benchmark for Omnidirectional Video Frame Interpolation
Wenxuan Lu, Mengshun Hu, Yansheng Qiu, Liang Liao, Zheng Wang
PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks
Vishal Pallagani, Biplav Srivastava, Nitin Gupta
A Dataset and Benchmark for Shape Completion of Fruits for Agricultural Robotics
Federico Magistri, Thomas Läbe, Elias Marks, Sumanth Nagulavancha, Yue Pan, Claus Smitt, Lasse Klingbeil, Michael Halstead, Heiner Kuhlmann, Chris McCool, Jens Behley, Cyrill Stachniss