Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, and that span model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
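To make the Item Response Theory (IRT) idea concrete, the sketch below shows one common way IRT is applied to model evaluation: a two-parameter logistic (2PL) model where each benchmark item has a calibrated difficulty and discrimination, and a model's latent ability is estimated by maximum likelihood from its right/wrong responses. This is a minimal illustrative sketch, not the method of any paper listed here; the item parameters and response patterns are hypothetical.

```python
# Minimal sketch of IRT-based (2PL) model evaluation. Assumes item
# parameters have already been calibrated; all numbers are illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, a, b):
    """2PL probability that a model with ability `theta` answers an item
    with discrimination `a` and difficulty `b` correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_ability(responses, a, b):
    """Maximum-likelihood estimate of latent ability from binary
    right/wrong responses on calibrated items."""
    def neg_log_likelihood(theta):
        p = np.clip(p_correct(theta, a, b), 1e-9, 1 - 1e-9)  # avoid log(0)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded").x

# Hypothetical calibration: five items of increasing difficulty.
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])    # discriminations
b = np.array([-1.5, -0.5, 0.0, 1.0, 2.0])  # difficulties

# Two models with identical raw accuracy (3/5) but different error patterns:
model_easy_wins = np.array([1, 1, 1, 0, 0])  # solves only the easier items
model_hard_wins = np.array([0, 0, 1, 1, 1])  # solves the harder items

print(estimate_ability(model_easy_wins, a, b))  # lower ability estimate
print(estimate_ability(model_hard_wins, a, b))  # higher ability estimate
```

The point of the example: two systems with identical accuracy can receive different ability estimates once item difficulty is taken into account, which is exactly the kind of distinction that accuracy-only leaderboards miss.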
Papers
Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, Dariusz Kajtoch
A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness
Oubaida Chouchane, Christoph Busch, Chiara Galdi, Nicholas Evans, Massimiliano Todisco
"ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses
Bruno Pereira Cipriano, Pedro Alves
Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations
Rémy Decoupes, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire, Sarah Valentin
Evaluations of Machine Learning Privacy Defenses are Misleading
Michael Aerni, Jie Zhang, Florian Tramèr
Misaka: Interactive Swarm Testbed for Smart Grid Distributed Algorithm Test and Evaluation
Tingliang Zhang, Haiwang Zhong, Zhenfei Tan, Xinfei Yan
Evaluation of Teleoperation Concepts to solve Automated Vehicle Disengagements
David Brecht, Nils Gehrke, Tobias Kerbl, Niklas Krauss, Domagoj Majstorovic, Florian Pfab, Maria-Magdalena Wolf, Frank Diermeyer
A Customer Level Fraudulent Activity Detection Benchmark for Enhancing Machine Learning Model Research and Evaluation
Phoebe Jing, Yijing Gao, Xianlong Zeng