Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks, often incorporating techniques like Item Response Theory and multi-faceted metrics that go beyond simple accuracy, and spanning model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
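As a concrete illustration of "metrics beyond simple accuracy", the sketch below computes macro-averaged precision, recall, and F1 alongside plain accuracy for a toy multi-class task. This is a minimal example, assuming scikit-learn is available; the label arrays are hypothetical placeholders for a benchmark's gold annotations and a model's predictions, not data from any of the papers listed here.

```python
# Minimal sketch: multi-faceted evaluation beyond raw accuracy.
# Assumes scikit-learn is installed; y_true / y_pred are illustrative only.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 1, 1, 2, 2, 2, 0, 1]   # hypothetical gold labels
y_pred = [0, 1, 2, 2, 2, 1, 0, 1]   # hypothetical model predictions

report = {
    "accuracy": accuracy_score(y_true, y_pred),
    # Macro averaging weights every class equally, exposing weaknesses
    # on rare classes that overall accuracy can hide.
    "macro_precision": precision_score(y_true, y_pred, average="macro"),
    "macro_recall": recall_score(y_true, y_pred, average="macro"),
    "macro_f1": f1_score(y_true, y_pred, average="macro"),
}

for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Reporting several complementary metrics in this way is one simple step toward the kind of multi-faceted, comparable evaluation the benchmarks above aim for; Item Response Theory-based approaches go further by additionally modeling per-item difficulty.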
Papers
About Evaluation of F1 Score for RECENT Relation Extraction System
Michał Olek
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen
Consumer-side Fairness in Recommender Systems: A Systematic Survey of Methods and Evaluation
Bjørnar Vassøy, Helge Langseth
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem, Rubén Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny, Matthew Blaschko, Sien Moens, Tomasz Stanisławek
EMBRACE: Evaluation and Modifications for Boosting RACE
Mariia Zyrianova, Dmytro Kalpakchi, Johan Boye
Design, Development, and Evaluation of an Interactive Personalized Social Robot to Monitor and Coach Post-Stroke Rehabilitation Exercises
Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, Sergi Bermúdez i Badia
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results
Lisa Kühnel, Julian Schneider, Ines Perrar, Tim Adams, Fabian Prasser, Ute Nöthlings, Holger Fröhlich, Juliane Fluck
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
Debadutta Dash, Rahul Thapa, Juan M. Banda, Akshay Swaminathan, Morgan Cheatham, Mehr Kashyap, Nikesh Kotecha, Jonathan H. Chen, Saurabh Gombar, Lance Downing, Rachel Pedreira, Ethan Goh, Angel Arnaout, Garret Kenn Morris, Honor Magon, Matthew P Lungren, Eric Horvitz, Nigam H. Shah
Evaluation of Regularization-based Continual Learning Approaches: Application to HAR
Bonpagna Kann, Sandra Castellanos-Paez, Philippe Lalanda