Data Set
Datasets are essential for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research focuses on building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn supports both scientific understanding and practical applications across many fields.
Papers
Fashionpedia-Taste: A Dataset towards Explaining Human Fashion Taste
Mengyun Shi, Serge Belongie, Claire Cardie
A Survey on Dataset Distillation: Approaches, Applications and Future Directions
Jiahui Geng, Zongxiong Chen, Yuandou Wang, Herbert Woisetschlaeger, Sonja Schimmler, Ruben Mayer, Zhiming Zhao, Chunming Rong
NorQuAD: Norwegian Question Answering Dataset
Sardana Ivanova, Fredrik Aas Andreassen, Matias Jentoft, Sondre Wold, Lilja Øvrelid
Fine Tuning with Abnormal Examples
Will Rieger
HeySQuAD: A Spoken Question Answering Dataset
Yijing Wu, SaiKrishna Rallabandi, Ravisutha Srinivasamurthy, Parag Pravin Dakle, Alolika Gon, Preethi Raghavan
DiffuseExpand: Expanding dataset for 2D medical image segmentation using diffusion models
Shitong Shao, Xiaohan Yuan, Zhen Huang, Ziming Qiu, Shuai Wang, Kevin Zhou
LoRaWAN-enabled Smart Campus: The Dataset and a People Counter Use Case
Eslam Eldeeb, Hirley Alves
ZRG: A Dataset for Multimodal 3D Residential Rooftop Understanding
Isaac Corley, Jonathan Lwowski, Peyman Najafirad
Introducing MBIB -- the first Media Bias Identification Benchmark Task and Dataset Collection
Martin Wessel, Tomáš Horych, Terry Ruas, Akiko Aizawa, Bela Gipp, Timo Spinde
Methods and datasets for segmentation of minimally invasive surgical instruments in endoscopic images and videos: A review of the state of the art
Tobias Rueckert, Daniel Rueckert, Christoph Palm
ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects
Marco Toschi, Riccardo De Matteo, Riccardo Spezialetti, Daniele De Gregorio, Luigi Di Stefano, Samuele Salti
NTIRE 2023 Challenge on Light Field Image Super-Resolution: Dataset, Methods and Results
Yingqian Wang, Longguang Wang, Zhengyu Liang, Jungang Yang, Radu Timofte, Yulan Guo