Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied across diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
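To make the Item Response Theory reference concrete, below is a minimal sketch of the two-parameter logistic (2PL) IRT model sometimes used to calibrate benchmark items and score models by latent ability rather than raw accuracy. The function names, item parameters, and grid-search estimator are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def irt_2pl_prob(theta, a, b):
    """2PL IRT model: probability that a model ("examinee") with latent
    ability `theta` answers an item with discrimination `a` and
    difficulty `b` correctly."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_ability(responses, a, b, grid=np.linspace(-4, 4, 801)):
    """Maximum-likelihood ability estimate over a coarse grid.
    `responses` is a 0/1 vector of per-item correctness; `a` and `b`
    are per-item parameters (assumed already calibrated)."""
    # Evaluate every candidate ability against every item at once.
    p = irt_2pl_prob(grid[:, None], a[None, :], b[None, :])
    ll = (responses * np.log(p) + (1 - responses) * np.log(1 - p)).sum(axis=1)
    return grid[np.argmax(ll)]

# Toy example: three items of increasing difficulty (hypothetical values).
a = np.array([1.2, 0.8, 1.5])    # discrimination
b = np.array([-1.0, 0.0, 1.5])   # difficulty
responses = np.array([1, 1, 0])  # correct on the two easier items only
print(estimate_ability(responses, a, b))
```

Under this view, two systems with the same accuracy can receive different ability scores if one succeeds on harder (higher `b`) items, which is one motivation for IRT-based benchmark analysis.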
Papers
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method
Yiming Wang, Zhuosheng Zhang, Rui Wang
A study of conceptual language similarity: comparison and evaluation
Haotian Ye, Yihong Liu, Hinrich Schütze
Efficient Large-Scale Visual Representation Learning And Evaluation
Eden Dolev, Alaa Awad, Denisa Roberts, Zahra Ebrahimzadeh, Marcin Mejran, Vaibhav Malpani, Mahir Yavuz
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen
About Evaluation of F1 Score for RECENT Relation Extraction System
Michał Olek
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen
Consumer-side Fairness in Recommender Systems: A Systematic Survey of Methods and Evaluation
Bjørnar Vassøy, Helge Langseth
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem, Rubén Tito, Łukasz Borchmann, Michał Pietruszka, Paweł Józiak, Rafał Powalski, Dawid Jurkiewicz, Mickaël Coustaty, Bertrand Ackaert, Ernest Valveny, Matthew Blaschko, Sien Moens, Tomasz Stanisławek
EMBRACE: Evaluation and Modifications for Boosting RACE
Mariia Zyrianova, Dmytro Kalpakchi, Johan Boye
Design, Development, and Evaluation of an Interactive Personalized Social Robot to Monitor and Coach Post-Stroke Rehabilitation Exercises
Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, Sergi Bermúdez i Badia
Synthetic data generation for a longitudinal cohort study -- Evaluation, method extension and reproduction of published data analysis results
Lisa Kühnel, Julian Schneider, Ines Perrar, Tim Adams, Fabian Prasser, Ute Nöthlings, Holger Fröhlich, Juliane Fluck