Global Evaluation
Global evaluation in various scientific domains focuses on developing robust and reliable methods for assessing the performance of models and systems, often addressing challenges in data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes the development of comprehensive benchmarks and evaluation frameworks, often incorporating techniques like Item Response Theory and multi-faceted metrics beyond simple accuracy, and utilizing diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advancements are crucial for ensuring the trustworthiness and effectiveness of AI systems across diverse applications, from medical diagnosis to autonomous driving, and for fostering reproducible and comparable research within the scientific community.
Papers
A critical look at the evaluation of GNNs under heterophily: Are we really making progress?
Oleg Platonov, Denis Kuznedelev, Michael Diskin, Artem Babenko, Liudmila Prokhorenkova
Evaluation of Extra Pixel Interpolation with Mask Processing for Medical Image Segmentation with Deep Learning
Olivier Rukundo
Real-World Deployment and Evaluation of Kwame for Science, An AI Teaching Assistant for Science Education in West Africa
George Boateng, Samuel John, Samuel Boateng, Philemon Badu, Patrick Agyeman-Budu, Victor Kumbol
Does the evaluation stand up to evaluation? A first-principle approach to the evaluation of classifiers
K. Dyrland, A. S. Lundervold, P. G. L. Porta Mana
FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models
Hrishikesh Viswanath, Tianyi Zhang
Evaluation of Data Augmentation and Loss Functions in Semantic Image Segmentation for Drilling Tool Wear Detection
Elke Schlager, Andreas Windisch, Lukas Hanna, Thomas Klünsner, Elias Jan Hagendorfer, Tamara Teppernegg