Benchmark Platform
Benchmark platforms across scientific domains aim to provide standardized evaluations of models and algorithms, enabling fair comparisons and driving research progress. Current work develops comprehensive benchmarks in diverse areas, including natural language processing, computer vision, robotics, and healthcare, often targeting recent model classes such as large language models and other deep learning systems. These platforms facilitate reproducible research and expose the limitations of existing methods, leading to more robust and reliable systems with real-world applications. The resulting insights inform the design of improved algorithms and support a more rigorous and transparent scientific process.
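To make the core idea concrete, here is a minimal sketch of what "standardized evaluation" means in practice: every submitted model is scored on the same held-out split with the same metric, so scores are directly comparable. All names here (`evaluate`, `run_benchmark`, the toy task and models) are illustrative assumptions, not the API of any platform in the papers below.

```python
# Minimal sketch of a benchmark platform's evaluation core: every model is
# scored on the SAME fixed test split with the SAME metric, making the
# resulting leaderboard a fair comparison. All names are hypothetical.
from typing import Callable, Dict, List, Tuple

# A "model" is abstracted as any callable mapping an input to a prediction.
Model = Callable[[float], int]
Example = Tuple[float, int]  # (input, ground-truth label)

def evaluate(model: Model, test_set: List[Example]) -> float:
    """Score one model on the shared held-out split with a fixed metric (accuracy)."""
    correct = sum(1 for x, y in test_set if model(x) == y)
    return correct / len(test_set)

def run_benchmark(models: Dict[str, Model], test_set: List[Example]) -> List[Tuple[str, float]]:
    """Evaluate every registered model under the identical protocol and rank them."""
    scores = {name: evaluate(m, test_set) for name, m in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Toy task: classify whether a number is >= 0.5.
    test_set = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
    models = {
        "threshold-0.5": lambda x: int(x >= 0.5),
        "always-one": lambda x: 1,
    }
    for name, score in run_benchmark(models, test_set):
        print(f"{name}: accuracy={score:.2f}")
```

Real platforms layer much more on top of this loop (versioned datasets, submission servers, multiple metrics, significance testing), but the fixed-protocol comparison is the piece that makes results reproducible and rankings meaningful.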
Papers
Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective
Shengjia Chen, Gabriele Campanella, Abdulkadir Elmas, Aryeh Stock, Jennifer Zeng, Alexandros D. Polydorides, Adam J. Schoenfeld, Kuan-lin Huang, Jane Houldsworth, Chad Vanderbilt, Thomas J. Fuchs
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models
Jin Liu, Qingquan Li, Wenlong Du
Position: Benchmarking is Limited in Reinforcement Learning Research
Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Yuwei Zhang, Tong Xia, Jing Han, Yu Wu, Georgios Rizos, Yang Liu, Mohammed Mosuily, Jagmohan Chauhan, Cecilia Mascolo
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking
Daniel Dauner, Marcel Hallgarten, Tianyu Li, Xinshuo Weng, Zhiyu Huang, Zetong Yang, Hongyang Li, Igor Gilitschenski, Boris Ivanovic, Marco Pavone, Andreas Geiger, Kashyap Chitta
FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li