Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques like Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
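As a concrete illustration of the Item Response Theory approach mentioned above, here is a minimal sketch of how a benchmark item's difficulty and a model's latent ability interact under the two-parameter logistic (2PL) model. This is not drawn from any of the listed papers; the function names, item parameters, and grid-search estimator are illustrative assumptions.

```python
import math

def irt_2pl_prob(theta, a, b):
    """Probability that a model with latent ability `theta` answers an
    item correctly under the 2PL IRT model, where `a` is the item's
    discrimination and `b` is its difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, grid=None):
    """Grid-search maximum-likelihood estimate of `theta`, given binary
    responses (1 = correct) to items with known (a, b) parameters.
    A coarse grid stands in for a proper optimizer in this sketch."""
    if grid is None:
        grid = [i / 10.0 for i in range(-40, 41)]  # theta in [-4, 4]

    def log_lik(theta):
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = irt_2pl_prob(theta, a, b)
            ll += math.log(p) if r else math.log(1.0 - p)
        return ll

    return max(grid, key=log_lik)

# Hypothetical example: three benchmark items of increasing difficulty;
# the model solves the two easier ones and misses the hardest.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]
responses = [1, 1, 0]
print(estimate_ability(responses, items))
```

The point of such item-level modeling, as opposed to plain accuracy, is that two models with the same raw score can receive different ability estimates depending on which items they solved, which is one way evaluation frameworks move beyond simple accuracy.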
Papers
Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges
Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang
Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models
Rudolf Herdt, Louisa Kinzel, Johann Georg Maaß, Marvin Walther, Henning Fröhlich, Tim Schubert, Peter Maass, Christian Patrick Schaaf
Guidelines for evaluation of complex multi agent test scenarios
Ana Isabel Garcia Guerra, Teng Sung Shiuan
Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques
Michela Lorandi, Anya Belz
Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications
Ricardo Knauer, Erik Rodner
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu, Aoran Gan, Kai Zhang, Shiwei Tong, Qi Liu, Zhaofeng Liu
Towards Geographic Inclusion in the Evaluation of Text-to-Image Models
Melissa Hall, Samuel J. Bell, Candace Ross, Adina Williams, Michal Drozdzal, Adriana Romero Soriano
Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation
Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Aktı, Hazım Kemal Ekenel, Alexander Waibel
A General Model for Detecting Learner Engagement: Implementation and Evaluation
Somayeh Malekshahi, Javad M. Kheyridoost, Omid Fatemi
Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience
Nhi Nguyen, Le Nguyen, Honghan Li, Miguel Bordallo López, Constantino Álvarez Casado
On the Evaluation of Machine-Generated Reports
James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Towards Scenario- and Capability-Driven Dataset Development and Evaluation: An Approach in the Context of Mapless Automated Driving
Felix Grün, Marcus Nolte, Markus Maurer
Revisiting Reward Design and Evaluation for Robust Humanoid Standing and Walking
Bart van Marum, Aayam Shrestha, Helei Duan, Pranay Dugar, Jeremy Dao, Alan Fern