Data Set
Datasets are essential for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research focuses on building diverse, high-quality datasets that address specific challenges such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing novel annotation techniques, combining multiple data modalities (e.g., text, images, audio), and using model architectures such as transformers and convolutional neural networks to analyze the data and establish benchmarks. Well-designed datasets directly affect how robust and reliable the resulting models are, which in turn advances scientific understanding and improves practical applications across many fields.
Papers
HERDPhobia: A Dataset for Hate Speech against Fulani in Nigeria
Saminu Mohammad Aliyu, Gregory Maksha Wajiga, Muhammad Murtala, Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Ibrahim Said Ahmad
PlasmoID: A dataset for Indonesian malaria parasite detection and segmentation in thin blood smear
Hanung Adi Nugroho, Rizki Nurfauzi, E. Elsa Herdiana Murhandarwati, Purwono Purwono
The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts
Dorottya Demszky, Heather Hill
A Dataset for Greek Traditional and Folk Music: Lyra
Charilaos Papaioannou, Ioannis Valiantzas, Theodoros Giannakopoulos, Maximos Kaliakatsos-Papakostas, Alexandros Potamianos
Rooms with Text: A Dataset for Overlaying Text Detection
Oleg Smirnov, Aditya Tewari
Video Background Music Generation: Dataset, Method and Evaluation
Le Zhuo, Zhaokai Wang, Baisen Wang, Yue Liao, Chenxi Bao, Stanley Peng, Songhao Han, Aixi Zhang, Fei Fang, Si Liu
Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric
Chuanming Tang, Xiao Wang, Ju Huang, Bo Jiang, Lin Zhu, Jianlin Zhang, Yaowei Wang, Yonghong Tian
F2SD: A dataset for end-to-end group detection algorithms
Giang Hoang, Tuan Nguyen Dinh, Tung Cao Hoang, Son Le Duy, Keisuke Hihara, Yumeka Utada, Akihiko Torii, Naoki Izumi, Long Tran Quoc
aiMotive Dataset: A Multimodal Dataset for Robust Autonomous Driving with Long-Range Perception
Tamás Matuszka, Iván Barton, Ádám Butykai, Péter Hajas, Dávid Kiss, Domonkos Kovács, Sándor Kunsági-Máté, Péter Lengyel, Gábor Németh, Levente Pető, Dezső Ribli, Dávid Szeghy, Szabolcs Vajna, Bálint Varga
ComMU: Dataset for Combinatorial Music Generation
Lee Hyun, Taehyun Kim, Hyolim Kang, Minjoo Ki, Hyeonchan Hwang, Kwanho Park, Sharang Han, Seon Joo Kim