Benchmark Study
Benchmark studies systematically evaluate the performance of machine learning models and algorithms across diverse datasets and tasks, aiming to identify their strengths, weaknesses, and opportunities for improvement. Current research focuses on developing standardized benchmarks for domains such as natural language processing, computer vision, and time series analysis, often incorporating rigorous evaluation metrics and addressing issues such as reproducibility and uncertainty quantification. Such studies advance the field by providing objective comparisons, exposing the limitations of existing methods, and guiding the development of more robust and effective models.
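As a concrete illustration of the kind of uncertainty-quantification metric such benchmarks report, the sketch below computes Expected Calibration Error (ECE), a standard measure of how well a model's predicted confidences match its empirical accuracy. This is a minimal, self-contained example, not taken from either paper listed below; the function name, the binning scheme, and the toy data are all illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: the bin-weighted average gap between mean predicted
    confidence and empirical accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # Gap between how confident the model was and how often it was right.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in the bin
    return ece

# Toy usage: an overconfident model (high confidence, lower accuracy) yields high ECE.
rng = np.random.default_rng(0)                 # fixed seed, for reproducibility
conf = rng.uniform(0.7, 1.0, size=1000)        # hypothetical predicted confidences
correct = rng.uniform(size=1000) < 0.6         # hypothetical 60% actual accuracy
print(f"ECE = {expected_calibration_error(conf, correct):.3f}")
```

A well-calibrated model would produce an ECE near zero; reporting such metrics alongside task accuracy, under fixed random seeds, is one way benchmark studies address the reproducibility and calibration concerns noted above.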
Papers
On Uncertainty Calibration and Selective Generation in Probabilistic Neural Summarization: A Benchmark Study
Polina Zablotskaia, Du Phan, Joshua Maynez, Shashi Narayan, Jie Ren, Jeremiah Liu
OOD-CV-v2: An extended Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Bingchen Zhao, Jiahao Wang, Wufei Ma, Artur Jesslen, Siwei Yang, Shaozuo Yu, Oliver Zendel, Christian Theobalt, Alan Yuille, Adam Kortylewski