Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks, incorporating techniques like Item Response Theory and multi-faceted metrics that go beyond simple accuracy, and spanning diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advancements are crucial for ensuring the trustworthiness and effectiveness of AI systems across applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
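As a concrete illustration of one technique named above, the sketch below shows how an Item Response Theory (2PL) model can be fit to evaluation results to estimate per-system ability and per-item difficulty rather than reporting raw accuracy alone. It is a minimal example under stated assumptions: the binary correctness matrix, system count, and item count are synthetic and purely illustrative, not taken from any of the papers listed here.

```python
# Minimal 2PL Item Response Theory sketch for benchmark evaluation.
# Assumes a binary correctness matrix: rows = systems under evaluation,
# columns = benchmark items. All data and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 systems x 40 items; 1 = item answered correctly.
responses = (rng.random((5, 40)) < np.linspace(0.3, 0.8, 5)[:, None]).astype(float)

n_sys, n_items = responses.shape
theta = np.zeros(n_sys)   # latent "ability" of each system
a = np.ones(n_items)      # item discrimination
b = np.zeros(n_items)     # item difficulty
lr = 0.05

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Joint maximum-likelihood fit by gradient ascent on the 2PL log-likelihood:
#   P(correct) = sigmoid(a_j * (theta_i - b_j))
for _ in range(2000):
    logits = a[None, :] * (theta[:, None] - b[None, :])
    p = sigmoid(logits)
    err = responses - p  # gradient of the Bernoulli log-likelihood wrt logits
    theta += lr * (err * a[None, :]).sum(axis=1) / n_items
    a += lr * (err * (theta[:, None] - b[None, :])).sum(axis=0) / n_sys
    b += lr * (-err * a[None, :]).sum(axis=0) / n_sys
    theta -= theta.mean()  # pin the latent scale (translation invariance)

print("estimated system abilities:", np.round(theta, 2))
print("hardest items (highest difficulty):", np.argsort(b)[-3:])
```

In practice, the resulting ability estimates can be compared across systems while the item parameters flag uninformative or miscalibrated benchmark items; dedicated libraries offer more robust estimation than this gradient-ascent sketch.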
Papers
Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques
Michela Lorandi, Anya Belz
Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Ricardo Knauer, Erik Rodner
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu
Evaluating Students' Open-ended Written Responses with LLMs: Using the RAG Framework for GPT-3.5, GPT-4, Claude-3, and Mistral-Large
Jussi S. Jauhiainen, Agustín Garagorry Guerra
Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations
Xuyang Zhong, Yixiao Huang, Chen Liu
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Aktı, Hazım Kemal Ekenel, Alexander Waibel
A General Model for Detecting Learner Engagement: Implementation and Evaluation
Somayeh Malekshahi, Javad M. Kheyridoost, Omid Fatemi
Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience
Nhi Nguyen, Le Nguyen, Honghan Li, Miguel Bordallo López, Constantino Álvarez Casado
On the Evaluation of Machine-Generated Reports
James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Towards Scenario- and Capability-Driven Dataset Development and Evaluation: An Approach in the Context of Mapless Automated Driving
Felix Grün, Marcus Nolte, Markus Maurer
Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking
Bart van Marum, Aayam Shrestha, Helei Duan, Pranay Dugar, Jeremy Dao, Alan Fern
Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
Tsimur Hadeliya, Dariusz Kajtoch
A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness
Oubaida Chouchane, Christoph Busch, Chiara Galdi, Nicholas Evans, Massimiliano Todisco
"ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses
Bruno Pereira Cipriano, Pedro Alves
Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations
Rémy Decoupes, Roberto Interdonato, Mathieu Roche, Maguelonne Teisseire, Sarah Valentin