Benchmark Platform
Benchmark platforms provide standardized evaluations of models and algorithms across scientific domains, enabling fair comparisons and driving research progress. Current research focuses on developing comprehensive benchmarks in areas including natural language processing, computer vision, robotics, and healthcare, often evaluating recent model types such as large language models and other deep learning systems. These platforms advance their fields by facilitating reproducible research and identifying the limitations of existing methods, which in turn supports more robust and reliable systems for real-world applications. The resulting insights inform the development of improved algorithms and contribute to a more rigorous and transparent scientific process.
Papers
Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking
Pranav Singh Chib, Pravendra Singh
Benchmark Data Repositories for Better Benchmarking
Rachel Longjohn, Markelle Kelly, Sameer Singh, Padhraic Smyth
DetectRL: Benchmarking LLM-Generated Text Detection in Real-World Scenarios
Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Yiruo Cheng, Kelong Mao, Ziliang Zhao, Guanting Dong, Hongjin Qian, Yongkang Wu, Tetsuya Sakai, Ji-Rong Wen, Zhicheng Dou
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning
Jingkun Ma, Runzhe Zhan, Derek F. Wong, Yang Li, Di Sun, Hou Pong Chan, Lidia S. Chao
InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models
Hao Li, Xiaogeng Liu, Chaowei Xiao
Analysis and Benchmarking of Extending Blind Face Image Restoration to Videos
Zhouxia Wang, Jiawei Zhang, Xintao Wang, Tianshui Chen, Ying Shan, Wenping Wang, Ping Luo
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers
Lorenzo Pacchiardi, Marko Tesic, Lucy G. Cheke, José Hernández-Orallo
EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
Chen Gao, Baining Zhao, Weichen Zhang, Jinzhu Mao, Jun Zhang, Zhiheng Zheng, Fanhang Man, Jianjie Fang, Zile Zhou, Jinqiang Cui, Xinlei Chen, Yong Li
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
T.Y.S.S. Santosh, Cornelius Weiss, Matthias Grabmair