Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks, incorporating techniques such as Item Response Theory and multi-faceted metrics that go beyond simple accuracy, and spans model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
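To make the Item Response Theory idea concrete, the sketch below fits a Rasch (1PL) model to a binary response matrix of models versus benchmark items, estimating a latent ability per model and a difficulty per item rather than a single accuracy score. This is a minimal illustration under assumed simplifications, not a method from any of the papers listed here; the function names, hyperparameters, and toy data are all hypothetical.

```python
# Minimal sketch: Rasch (1PL) Item Response Theory for benchmark evaluation.
# Assumption: P(model i answers item j correctly) = sigmoid(ability_i - difficulty_j).
# The toy response matrix and all names below are illustrative, not from the papers above.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_rasch(responses, lr=0.1, n_iters=500):
    """Fit model abilities and item difficulties by gradient ascent on the
    Bernoulli log-likelihood of a binary response matrix
    (rows = models, columns = benchmark items)."""
    n_models, n_items = responses.shape
    ability = np.zeros(n_models)
    difficulty = np.zeros(n_items)
    for _ in range(n_iters):
        p = sigmoid(ability[:, None] - difficulty[None, :])
        grad = responses - p                 # d(log-likelihood) / d(logit)
        ability += lr * grad.sum(axis=1) / n_items
        difficulty -= lr * grad.sum(axis=0) / n_models
        difficulty -= difficulty.mean()      # center difficulties to fix the scale
    return ability, difficulty

# Toy data: 4 models evaluated on 6 benchmark items (1 = correct, 0 = incorrect).
responses = np.array([
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
])
ability, difficulty = fit_rasch(responses)
print("model abilities:  ", np.round(ability, 2))
print("item difficulties:", np.round(difficulty, 2))
```

Unlike raw accuracy, this kind of fit separates how capable a model is from how hard each benchmark item is, which is one reason IRT-style analyses are attractive when comparing results across benchmarks with different difficulty mixes.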
Papers
How I learned to stop worrying and love the curse of dimensionality: an appraisal of cluster validation in high-dimensional spaces
Brian A. Powell
Fish sounds: towards the evaluation of marine acoustic biodiversity through data-driven audio source separation
Michele Mancusi, Nicola Zonca, Emanuele Rodolà, Silvia Zuffi
Evaluation of Four Black-box Adversarial Attacks and Some Query-efficient Improvement Analysis
Rui Wang
FIFA ranking: Evaluation and path forward
Leszek Szczecinski, Iris-Ioana Roatis
Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models
Karn N. Watcharasupat, Junyoung Lee, Alexander Lerch
Evaluation and Comparison of Deep Learning Methods for Pavement Crack Identification with Visual Images
Kai-Liang Lu