Benchmark Study
Benchmark studies systematically evaluate the performance of machine learning models and algorithms across diverse datasets and tasks, aiming to identify strengths, weaknesses, and areas for improvement. Current research focuses on developing standardized benchmarks for domains such as natural language processing, computer vision, and time series analysis, often incorporating rigorous evaluation metrics and addressing issues such as reproducibility and uncertainty quantification. These studies advance the field by providing objective comparisons, exposing the limitations of existing methods, and guiding the development of more robust and effective models with practical applications across sectors.
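To make the idea of systematic, reproducible evaluation concrete, the sketch below shows a minimal benchmark harness in Python. It is only illustrative and not taken from any of the papers listed here; the choice of scikit-learn, the toy datasets, the two classifiers, and accuracy as the metric are all assumptions made for the example. Fixed random seeds address reproducibility, and the spread of scores across cross-validation folds serves as a rough uncertainty estimate.

```python
# Minimal sketch of a benchmark harness (illustrative only; assumes scikit-learn).
# Datasets, models, and metric are hypothetical choices, not from the listed papers.
from sklearn.datasets import load_iris, load_wine, load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

datasets = {
    "iris": load_iris(return_X_y=True),
    "wine": load_wine(return_X_y=True),
    "breast_cancer": load_breast_cancer(return_X_y=True),
}
models = {
    "logreg": LogisticRegression(max_iter=5000, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for data_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        # 5-fold cross-validation; fold-to-fold spread gives a rough
        # uncertainty estimate, and fixed seeds keep the run reproducible.
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{data_name:15s} {model_name:15s} "
              f"acc = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Real benchmark studies extend this pattern with many more datasets, task-appropriate metrics, and statistical comparisons across methods, but the structure (models x datasets x metrics, with controlled randomness) is the same.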
Papers
Evaluating the Evolution of YOLO (You Only Look Once) Models: A Comprehensive Benchmark Study of YOLO11 and Its Predecessors
Nidhal Jegham, Chan Young Koh, Marwan Abdelatti, Abdeltawab Hendawi
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Yifan Xu, Xiao Liu, Xueqiao Sun, Siyi Cheng, Hao Yu, Hanyu Lai, Shudan Zhang, Dan Zhang, Jie Tang, Yuxiao Dong
Quantum Kernel Methods under Scrutiny: A Benchmarking Study
Jan Schnabel, Marco Roth
Reassessing the Validity of Spurious Correlations Benchmarks
Samuel J. Bell, Diane Bouchacourt, Levent Sagun
Confidential Computing on NVIDIA Hopper GPUs: A Performance Benchmark Study
Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, Shunfan Zhou