Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery
Adam J. Stewart, Nils Lehmann, Isaac A. Corley, Yi Wang, Yi-Chia Chang, Nassim Ait Ali Braham, Shradha Sehgal, Caleb Robinson, Arindam Banerjee
Datasets and Benchmarks for Offline Safe Reinforcement Learning
Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao
A Dataset for Deep Learning-based Bone Structure Analyses in Total Hip Arthroplasty
Kaidong Zhang, Ziyang Gan, Dong Liu, Xifu Shang
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang
FinRED: A Dataset for Relation Extraction in Financial Domain
Soumya Sharma, Tapas Nayak, Arusarka Bose, Ajay Kumar Meena, Koustuv Dasgupta, Niloy Ganguly, Pawan Goyal
Financial Numeric Extreme Labelling: A Dataset and Benchmarking for XBRL Tagging
Soumya Sharma, Subhendu Khatuya, Manjunath Hegde, Afreen Shaikh. Koustuv Dasgupta, Pawan Goyal, Niloy Ganguly
NLPositionality: Characterizing Design Biases of Datasets and Models
Sebastin Santy, Jenny T. Liang, Ronan Le Bras, Katharina Reinecke, Maarten Sap
Vital Videos: A dataset of face videos with PPG and blood pressure ground truths
Pieter-Jan Toye
A systematic literature review on the code smells datasets and validation mechanisms
Morteza Zakeri-Nasrabadi, Saeed Parsa, Ehsan Esmaili, Fabio Palomba