Global Evaluation
Global evaluation, across scientific domains, concerns robust and reliable methods for assessing the performance of models and systems. It addresses challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to a range of model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible and comparable research within the scientific community.
Papers
Challenges and Considerations in the Evaluation of Bayesian Causal Discovery
Amir Mohammad Karimi Mamaghan, Panagiotis Tigas, Karl Henrik Johansson, Yarin Gal, Yashas Annadani, Stefan Bauer
Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation
Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Tharindu Madusanka, Iqra Zahid, Jiayan Zeng, Xiaochi Wang, Xinran He, Yizhi Li, Goran Nenadic
Evaluation of data inconsistency for multi-modal sentiment analysis
Yufei Wang, Mengyue Wu
A Human-Annotated Video Dataset for Training and Evaluation of 360-Degree Video Summarization Methods
Ioannis Kontostathis, Evlampios Apostolidis, Vasileios Mezaris
Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation
Steven Landgraf, Markus Hillemann, Theodor Kapler, Markus Ulrich
Evaluation of Resource-Efficient Crater Detectors on Embedded Systems
Simon Vellas, Bill Psomas, Kalliopi Karadima, Dimitrios Danopoulos, Alexandros Paterakis, George Lentaris, Dimitrios Soudris, Konstantinos Karantzalos
Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"
MHD Anas Alsakkal, Runze Wang, Jayawan Wijekoon, Huajin Tang
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Jonas Becker, Jan Philip Wahle, Bela Gipp, Terry Ruas
Detection and Positive Reconstruction of Cognitive Distortion sentences: Mandarin Dataset and Evaluation
Shuya Lin, Yuxiong Wang, Jonathan Dong, Shiguang Ni