Model Comparison

Model comparison, the process of evaluating and ranking competing machine learning models, underpins both research progress and the reliable deployment of AI systems. Current research emphasizes standardized evaluation frameworks and benchmarks that look beyond a single aggregate score and instead identify where each model is strong or weak across tasks and capabilities. This includes settings with high-dimensional data and complex model classes such as large language models and Gaussian processes. Comparisons of this kind guide model development, make evaluation more transparent, and help practitioners choose models that are reliable in a given domain.
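The limitation of a single aggregate score can be illustrated with a minimal Python sketch. The model names, task names, and scores below are hypothetical values chosen for illustration. They are not results from any of the papers listed here. The unweighted mean over tasks stands in for "a simple aggregate score", and a per-task comparison stands in for the finer-grained analysis described above.

```python
from statistics import mean

# Hypothetical per-task accuracies for two models (illustrative values only,
# not taken from any of the listed papers).
scores = {
    "model_a": {"qa": 0.875, "summarization": 0.625, "translation": 0.75},
    "model_b": {"qa": 0.625, "summarization": 0.875, "translation": 0.75},
}


def aggregate(scores_by_task: dict[str, float]) -> float:
    """Collapse per-task scores into one number: the unweighted mean over tasks."""
    return mean(scores_by_task.values())


def per_task_winners(
    a: dict[str, float], b: dict[str, float], name_a: str, name_b: str
) -> dict[str, str]:
    """For each task both models were scored on, report the higher scorer or 'tie'."""
    winners = {}
    for task in sorted(a.keys() & b.keys()):
        if a[task] > b[task]:
            winners[task] = name_a
        elif b[task] > a[task]:
            winners[task] = name_b
        else:
            winners[task] = "tie"
    return winners


if __name__ == "__main__":
    # Both models have the same aggregate score (0.750)...
    for name, task_scores in scores.items():
        print(f"{name}: aggregate = {aggregate(task_scores):.3f}")
    # ...but the per-task view shows they are strong on different tasks.
    print(per_task_winners(scores["model_a"], scores["model_b"], "model_a", "model_b"))
```

Here both models average 0.750, so the aggregate cannot separate them, while the per-task view shows that `model_a` wins on `qa`, `model_b` wins on `summarization`, and they tie on `translation`. A real evaluation would also account for score uncertainty and task weighting, which this sketch deliberately omits.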

Papers

January 11, 2023

A prediction and behavioural analysis of machine learning methods for modelling travel mode choice
José Ángel Martín-Baos, Julio Alberto López-Gómez, Luis Rodriguez-Benitez, Tim Hillel, Ricardo García-Ródenas
Machine Learning Human Prediction Model Comparison Travel Mode Behavioural Analysis Mode Choice

October 17, 2022

Systematic Evaluation of Predictive Fairness
Xudong Han, Aili Shen, Trevor Cohn, Timothy Baldwin, Lea Frermann
Multi Class Classification Mitigating Bias Model Comparison Fairness Research

October 13, 2022

Meta-Uncertainty in Bayesian Model Comparison
Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner
Markov Chain Monte Carlo Simulation Based Inference Model Comparison Posterior Probability Likelihood Based Uncertainty Information

March 15, 2022

Model Comparison in Approximate Bayesian Computation
Jan Boelts
Computational Neuroscience Posterior Predictive Model Comparison Approximate Bayesian Computation

February 25, 2022

Model Comparison and Calibration Assessment: User Guide for Consistent Scoring Functions in Machine Learning and Actuarial Practice
Tobias Fissler, Christian Lorentzen, Michael Mayer
Machine Learning Model Comparison Scoring Function Target Prediction Actuarial Model

January 19, 2022

Learning-From-Disagreement: A Model Comparison and Visual Analytics Framework
Junpeng Wang, Liang Wang, Yan Zheng, Chin-Chia Michael Yeh, Shubham Jain, Wei Zhang
Simple Classifier Classification Model Binary Classifier Model Comparison Meta Feature Disagreement Analysis Framework Visual Analytics Framework

December 15, 2021

Dynamic Human Evaluation for Relative Model Comparisons
Thórhildur Thorleiksdóttir, Cedric Renggli, Nora Hollenstein, Ce Zhang
Human Annotation Human Evaluation Natural Language Generation Model Comparison Interactive Evaluation Crowdsourcing Study

December 10, 2021

Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets
F. Travi, G. Ruarte, G. Bujia, J. E. Kamienkowski
Natural Image Computational Model Visual Search Model Comparison Reference Dataset

December 2, 2021

How not to Lie with a Benchmark: Rearranging NLP Leaderboards
Tatiana Shavrina, Valentin Malykh
New Benchmark Model Comparison Scoring Method