Data Set
Datasets are crucial for training and evaluating machine learning models, particularly in natural language processing, computer vision, and audio analysis. Current research emphasizes building diverse, high-quality datasets that address specific challenges, such as data imbalance, cross-lingual inconsistencies, and the need to represent real-world scenarios realistically. This work involves developing new annotation techniques, combining multiple data modalities (e.g., text, images, audio), and applying a range of model architectures (e.g., transformers, convolutional neural networks) to analyze the data and establish benchmarks. Because a model's robustness and reliability depend heavily on the data it is trained and evaluated on, well-designed datasets advance both scientific understanding and practical applications across many fields.
Papers
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models
Eri Onami, Shuhei Kurita, Taiki Miyanishi, Taro Watanabe
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu
TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
Jamshid Mozafari, Anubhav Jangra, Adam Jatowt
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
For those who don't know (how) to ask: Building a dataset of technology questions for digital newcomers
Evan Lucas, Kelly S. Steelman, Leo C. Ureel, Charles Wallace
QuakeSet: A Dataset and Low-Resource Models to Monitor Earthquakes through Sentinel-1
Daniele Rege Cambrin, Paolo Garza
DORE: A Dataset For Portuguese Definition Generation
Anna Beatriz Dimas Furtado, Tharindu Ranasinghe, Frédéric Blain, Ruslan Mitkov
ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
Bhawna Piryani, Jamshid Mozafari, Adam Jatowt
SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation
Dongqi Pu, Yifan Wang, Jia Loy, Vera Demberg
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions
Shun Inadumi, Seiya Kawano, Akishige Yuguchi, Yasutomo Kawanishi, Koichiro Yoshino
A Multi-loudspeaker Binaural Room Impulse Response Dataset with High-Resolution Translational and Rotational Head Coordinates in a Listening Room
Yue Qiao, Ryan Miguel Gonzales, Edgar Choueiri
The POLAR Traverse Dataset: A Dataset of Stereo Camera Images Simulating Traverses across Lunar Polar Terrain under Extreme Lighting Conditions
Margaret Hansen, Uland Wong, Terrence Fong