Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, shifting data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied to diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
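To make the Item Response Theory idea concrete, below is a minimal, illustrative sketch (not drawn from any of the listed papers): a two-parameter logistic (2PL) IRT model treats each benchmark item as having a discrimination and a difficulty, and estimates a model's latent ability from its right/wrong pattern. All function names and parameter values here are invented for illustration.

```python
import math

def irt_2pl(theta, a, b):
    """2PL item response function: probability that a system with latent
    ability `theta` answers an item with discrimination `a` and
    difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, grid=None):
    """Maximum-likelihood estimate of latent ability via grid search.

    responses: list of 0/1 outcomes, one per item
    items:     list of (a, b) item parameters, assumed pre-calibrated
    """
    if grid is None:
        grid = [i / 100.0 for i in range(-400, 401)]  # theta in [-4, 4]

    def log_lik(theta):
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = irt_2pl(theta, a, b)
            ll += math.log(p) if r else math.log(1.0 - p)
        return ll

    return max(grid, key=log_lik)

if __name__ == "__main__":
    # Hypothetical calibrated items: (discrimination, difficulty)
    items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.5)]
    # One system's right/wrong pattern on those items
    responses = [1, 1, 1, 0]
    print(f"estimated ability: {estimate_ability(responses, items):+.2f}")
```

A grid search is used here instead of Newton-Raphson purely to keep the sketch short and dependency-free; the point is that, unlike raw accuracy, the resulting ability score weights items by how informative they are.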
Papers
A Survey on Evaluation of Large Language Models
Yupeng Chang, Xu Wang, Jindong Wang, Yuan Wu, Linyi Yang, Kaijie Zhu, Hao Chen, Xiaoyuan Yi, Cunxiang Wang, Yidong Wang, Wei Ye, Yue Zhang, Yi Chang, Philip S. Yu, Qiang Yang, Xing Xie
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li, Teerth Patel, Xinya Du
Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching
Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava
Validation of the Practicability of Logical Assessment Formula for Evaluations with Inaccurate Ground-Truth Labels: An Application Study on Tumour Segmentation for Breast Cancer
Yongquan Yang, Hong Bu
SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
Vasilisa Bashlovkina, Riley Matthews, Zhaobin Kuang, Simon Baumgartner, Michael Bendersky
Evaluation of the Benefits of Zero Velocity Update in Decentralized EKF-Based Cooperative Localization Algorithms for GNSS-Denied Multi-Robot Systems
Cagri Kilic, Eduardo Gutierrez, Jason N. Gross
Evaluation of Virtual Acoustic Environments with Different Acoustic Level of Detail
Stefan Fichna, Steven van de Par, Stephan D. Ewert
Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications
Vladislav Li, Barbara Villarini, Jean-Christophe Nebel, Thomas Lagkas, Panagiotis Sarigiannidis, Vasileios Argyriou
Evaluation and Optimization of Rendering Techniques for Autonomous Driving Simulation
Chengyi Wang, Chunji Xu, Peilun Wu
Evaluation of machine learning architectures on the quantification of epistemic and aleatoric uncertainties in complex dynamical systems
Stephen Guth, Alireza Mojahed, Themistoklis P. Sapsis
Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation
William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter