Data Set
Datasets are the basis for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research focuses on building diverse, high-quality datasets that address specific problems, such as data imbalance, cross-lingual inconsistencies, and the need for data that realistically represents real-world scenarios. The work includes developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn shapes both scientific understanding and practical applications across many fields.
Papers
Multilingual Text Style Transfer: Datasets & Models for Indian Languages
Sourabrata Mukherjee, Atul Kr. Ojha, Akanksha Bansal, Deepak Alok, John P. McCrae, Ondřej Dušek
Image captioning in different languages
Emiel van Miltenburg
FinGen: A Dataset for Argument Generation in Finance
Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, Yusuke Miyao
Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach
Muhammad Saif Ullah Khan, Dhavalkumar Limbachiya, Didier Stricker, Muhammad Zeshan Afzal
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan, Shihan Peng, Lin Zhu, Wei Zhang, Yi Chang, Sheng Zhong, Luxin Yan
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita
The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset
Jeffrey D. Rudie, Hui-Ming Lin, Robyn L. Ball, Sabeena Jalal, Luciano M. Prevedello, Savvas Nicolaou, Brett S. Marinelli, Adam E. Flanders, Kirti Magudia, George Shih, Melissa A. Davis, John Mongan, Peter D. Chang, Ferco H. Berger, Sebastiaan Hermans, Meng Law, Tyler Richards, Jan-Peter Grunz, Andreas Steven Kunz, Shobhit Mathur, Sandro Galea-Soler, Andrew D. Chung, Saif Afat, Chin-Chi Kuo, Layal Aweidah, Ana Villanueva Campos, Arjuna Somasundaram, Felipe Antonio Sanchez Tijmes, Attaporn Jantarangkoon, Leonardo Kayat Bittencourt, Michael Brassil, Ayoub El Hajjami, Hakan Dogan, Muris Becircic, Agrahara G. Bharatkumar, Eduardo Moreno Júdice de Mattos Farina, Dataset Curator Group, Dataset Contributor Group, Dataset Annotator Group, Errol Colak
Biclustering a dataset using photonic quantum computing
Ajinkya Borle, Ameya Bhave
PRFashion24: A Dataset for Sentiment Analysis of Fashion Products Reviews in Persian
Mehrimah Amirpour, Reza Azmi
Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
Yi-Pei Chen, Noriki Nishida, Hideki Nakayama, Yuji Matsumoto
What is a Goldilocks Face Verification Test Set?
Haiyu Wu, Sicong Tian, Aman Bhatta, Jacob Gutierrez, Grace Bezold, Genesis Argueta, Karl Ricanek, Michael C. King, Kevin W. Bowyer
Planted: a dataset for planted forest identification from multi-satellite time series
Luis Miguel Pazos-Outón, Cristina Nader Vasconcelos, Anton Raichuk, Anurag Arnab, Dan Morris, Maxim Neumann
Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets
Hoàng-Ân Lê, Minh-Tan Pham
Privacy-preserving recommender system using the data collaboration analysis for distributed datasets
Tomoya Yanagi, Shunnosuke Ikeda, Noriyoshi Sukegawa, Yuichi Takano
A Dataset for Research on Water Sustainability
Pranjol Sen Gupta, Md Rajib Hossen, Pengfei Li, Shaolei Ren, Mohammad A. Islam