Annotated Dataset
Annotated datasets are collections of data points labeled with task-specific information. They are crucial for training and evaluating machine learning models, particularly in complex domains such as medicine and robotics. Current research emphasizes creating high-quality annotations, often using AI-assisted methods to reduce manual effort, and addresses noisy or partially annotated data through techniques such as active learning, multi-task learning, and self-supervised learning. Such datasets enable more accurate and robust models for applications ranging from medical image analysis and natural language processing to robotics and e-commerce.
Papers
Med-EASi: Finely Annotated Dataset and Models for Controllable Simplification of Medical Texts
Chandrayee Basu, Rosni Vasu, Michihiro Yasunaga, Qian Yang
AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa'id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Steven Arthur
StatMix: Data augmentation method that relies on image statistics in federated learning
Dominik Lewy, Jacek Mańdziuk, Maria Ganzha, Marcin Paprzycki
ASL-Homework-RGBD Dataset: An annotated dataset of 45 fluent and non-fluent signers performing American Sign Language homeworks
Saad Hassan, Matthew Seita, Larwan Berke, Yingli Tian, Elaine Gale, Sooyeon Lee, Matt Huenerfauth