Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in areas like natural language processing, computer vision, and audio analysis. Current research emphasizes creating diverse and high-quality datasets addressing specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need for realistic representations of real-world scenarios. This involves developing novel annotation techniques, incorporating multiple data modalities (e.g., text, images, audio), and employing various model architectures (e.g., transformers, convolutional neural networks) for analysis and benchmark creation. The availability of well-designed datasets directly impacts the development of robust and reliable machine learning models, ultimately advancing scientific understanding and improving practical applications across numerous fields.
Papers
Larth: Dataset and Machine Translation for Etruscan
Gianluca Vico, Gerasimos Spanakis
InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations
Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller
Divide and Ensemble: Progressively Learning for the Unknown
Hu Zhang, Xin Shen, Heming Du, Huiqiang Chen, Chen Liu, Hongwei Sheng, Qingzheng Xu, MD Wahiduzzaman Khan, Qingtao Yu, Tianqing Zhu, Scott Chapman, Zi Huang, Xin Yu
MEMO: Dataset and Methods for Robust Multimodal Retinal Image Registration with Large or Small Vessel Density Differences
Chiao-Yi Wang, Faranguisse Kakhi Sadrieh, Yi-Ting Shen, Shih-En Chen, Sarah Kim, Victoria Chen, Achyut Raghavendra, Dongyi Wang, Osamah Saeedi, Yang Tao
Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition
Wei He, Kai Han, Ying Nie, Chengcheng Wang, Yunhe Wang