Global Evaluation
Global evaluation across scientific domains concerns the development of robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied across a range of architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
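To make the Item Response Theory idea mentioned above concrete, the sketch below fits a minimal 1PL (Rasch) model to a binary correctness matrix (models × benchmark items), jointly estimating a latent ability per model and a latent difficulty per item rather than reporting raw accuracy. This is a generic illustration under assumed inputs, not the method of any listed paper; `fit_rasch` and the toy data are hypothetical.

```python
# Minimal Item Response Theory sketch: 1PL (Rasch) model fit by gradient
# ascent on the Bernoulli log-likelihood. All names and data are illustrative.
import numpy as np

def fit_rasch(responses, lr=0.1, n_iters=500):
    """Estimate model abilities and item difficulties.

    responses: (n_models, n_items) array of 0/1 correctness scores.
    Returns (theta, b): per-model abilities and per-item difficulties.
    """
    n_models, n_items = responses.shape
    theta = np.zeros(n_models)   # latent ability of each model
    b = np.zeros(n_items)        # latent difficulty of each item
    for _ in range(n_iters):
        # P(correct) = sigmoid(ability - difficulty)
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = responses - p    # gradient of the log-likelihood w.r.t. logits
        theta += lr * resid.sum(axis=1) / n_items
        b -= lr * resid.sum(axis=0) / n_models
        b -= b.mean()            # pin the scale: the model is shift-invariant
    return theta, b

# Toy usage: three models scored on five benchmark items.
rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(3, 5)).astype(float)
theta, b = fit_rasch(scores)
print("model abilities:", np.round(theta, 2))
print("item difficulties:", np.round(b, 2))
```

The appeal for evaluation is that the recovered difficulties make scores comparable across benchmarks of uneven hardness, one motivation for moving beyond simple accuracy.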
Papers
Can we trust the evaluation on ChatGPT?
Rachith Aiyappa, Jisun An, Haewoon Kwak, Yong-Yeol Ahn
Evaluation of Sketch-Based and Semantic-Based Modalities for Mockup Generation
Tommaso Calò, Luigi De Russis
Autonomous Robotic Drilling System for Mice Cranial Window Creation: An Evaluation with an Egg Model
Enduo Zhao, Murilo M. Marinho, Kanako Harada
Importance of Aligning Training Strategy with Evaluation for Diffusion Models in 3D Multiclass Segmentation
Yunguan Fu, Yiwen Li, Shaheer U. Saeed, Matthew J. Clarkson, Yipeng Hu
Creation and evaluation of timelines for longitudinal user posts
Anthony Hills, Adam Tsakalidis, Federico Nanni, Ioannis Zachos, Maria Liakata
EVOLIN Benchmark: Evaluation of Line Detection and Association
Kirill Ivanov, Gonzalo Ferrer, Anastasiia Kornilova
Analysis and Evaluation of Explainable Artificial Intelligence on Suicide Risk Assessment
Hao Tang, Aref Miri Rekavandi, Dharjinder Rooprai, Girish Dwivedi, Frank Sanfilippo, Farid Boussaid, Mohammed Bennamoun