Benchmark Dataset
Benchmark datasets are curated collections of data used to evaluate algorithms and models on fixed, shared test conditions across scientific domains. Current research focuses on building datasets for multimodal analysis (e.g., combining image, text, and audio data), for challenging settings such as low-resource languages and complex biological images, and for probing model failures such as hallucination and bias. Because competing methods are tested on the same data, these datasets enable objective comparisons, expose the limitations of existing methods, and drive progress in machine learning and related fields, which in turn supports more robust and reliable applications.
Papers
UrbanSARFloods: Sentinel-1 SLC-Based Benchmark Dataset for Urban and Open-Area Flood Mapping
Jie Zhao, Zhitong Xiong, Xiao Xiang Zhu
From Tissue Plane to Organ World: A Benchmark Dataset for Multimodal Biomedical Image Registration using Deep Co-Attention Networks
Yifeng Wang, Weipeng Li, Thomas Pearce, Haohan Wang
WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting
Olly Styles, Sam Miller, Patricio Cerda-Mardini, Tanaya Guha, Victor Sanchez, Bertie Vidgen
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
Quy Hoang Nguyen, Minh-Van Truong Nguyen, Kiet Van Nguyen
A Survey on the Real Power of ChatGPT
Ming Liu, Ran Liu, Ye Zhu, Hua Wang, Youyang Qu, Rongsheng Li, Yongpan Sheng, Wray Buntine
A User-Centric Multi-Intent Benchmark for Evaluating Large Language Models
Jiayin Wang, Fengran Mo, Weizhi Ma, Peijie Sun, Min Zhang, Jian-Yun Nie
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng