Comprehensive Benchmark
Comprehensive benchmarks are crucial for evaluating the performance and limitations of machine learning models, particularly in specialized domains such as computer vision, natural language processing, and graph learning. Recent research focuses on developing standardized evaluation frameworks that span diverse model architectures and algorithms while addressing issues such as inconsistent experimental setups, limited task diversity, and the lack of robust metrics. These benchmarks enable fair comparisons, expose weaknesses in existing models, and ultimately accelerate progress by giving researchers common ground on which to evaluate and compare their work. The resulting insights are vital both for advancing fundamental understanding and for improving the reliability and trustworthiness of AI systems in real-world applications.
Papers
GAOKAO-Eval: Does high scores truly reflect strong capabilities in LLMs?
Zhikai Lei, Tianyi Liang, Hanglei Hu, Jin Zhang, Yunhua Zhou, Yunfan Shao, Linyang Li, Chenchui Li, Changbo Wang, Hang Yan, Qipeng Guo
FDM-Bench: A Comprehensive Benchmark for Evaluating Large Language Models in Additive Manufacturing Tasks
Ahmadreza Eslaminia, Adrian Jackson, Beitong Tian, Avi Stern, Hallie Gordon, Rajiv Malhotra, Klara Nahrstedt, Chenhui Shao
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song, Chuanhao Li, Zhaopan Xu, Yue Yang, Ziyao Guo, Hao Zhang, Yuqi Lin, Yefei He, Lirui Zhao, Shuo Liu, Tianhua Li, Yuxuan Xie, Xiaojun Chang, Yu Qiao, Wenqi Shao, Kaipeng Zhang
Optimizing Multispectral Object Detection: A Bag of Tricks and Comprehensive Benchmarks
Chen Zhou, Peng Cheng, Junfeng Fang, Yifan Zhang, Yibo Yan, Xiaojun Jia, Yanyan Xu, Kun Wang, Xiaochun Cao