Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
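As a concrete illustration of the Item Response Theory approach mentioned above, the sketch below fits a 1PL (Rasch) model to a synthetic binary response matrix, jointly estimating a latent "ability" per model and a latent difficulty per benchmark item. This is a minimal sketch under stated assumptions: the data, variable names, and fitting procedure are illustrative and not drawn from any paper listed below.

```python
# Minimal 1PL (Rasch) IRT sketch for model evaluation.
# Assumption: responses[m, i] = 1 if model m answered benchmark item i correctly.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary response matrix: rows = models, columns = benchmark items.
responses = rng.integers(0, 2, size=(5, 20)).astype(float)

n_models, n_items = responses.shape
ability = np.zeros(n_models)     # theta_m: latent skill of each model
difficulty = np.zeros(n_items)   # b_i: latent difficulty of each item

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Joint maximum-likelihood fit by gradient ascent on the Bernoulli log-likelihood.
lr = 0.1
for _ in range(500):
    # Rasch model: P(correct) = sigma(theta_m - b_i)
    p = sigmoid(ability[:, None] - difficulty[None, :])
    grad = responses - p                  # d log-likelihood / d (theta_m - b_i)
    ability += lr * grad.sum(axis=1)      # ascend in theta (sum over items)
    difficulty -= lr * grad.sum(axis=0)   # ascend in b (sign flips for b_i)
    difficulty -= difficulty.mean()       # fix the scale: mean difficulty = 0

print("model abilities  :", np.round(ability, 2))
print("item difficulties:", np.round(difficulty, 2))
```

Ranking models by the fitted abilities, rather than raw accuracy, weights hard items more heavily than easy ones; this is one way such frameworks move beyond simple accuracy as a single score.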
Papers
Evaluation of Synthetic Datasets for Conversational Recommender Systems
Harsh Lara, Manoj Tiwari
Evaluation of RGB-D SLAM in Large Indoor Environments
Kirill Muravyev, Konstantin Yakovlev
Evaluation and Improvement of Interpretability for Self-Explainable Part-Prototype Networks
Qihan Huang, Mengqi Xue, Wenqi Huang, Haofei Zhang, Jie Song, Yongcheng Jing, Mingli Song
Shortcomings of Top-Down Randomization-Based Sanity Checks for Evaluations of Deep Neural Network Explanations
Alexander Binder, Leander Weber, Sebastian Lapuschkin, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek
Ontology-aware Learning and Evaluation for Audio Tagging
Haohe Liu, Qiuqiang Kong, Xubo Liu, Xinhao Mei, Wenwu Wang, Mark D. Plumbley
Evaluation of MPC-based Imitation Learning for Human-like Autonomous Driving
Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son