Global Evaluation
Global evaluation across scientific domains focuses on developing robust, reliable methods for assessing the performance of models and systems, addressing challenges such as data diversity, evolving data distributions, and the need for human-centered metrics. Current research emphasizes comprehensive benchmarks and evaluation frameworks that incorporate techniques such as Item Response Theory and multi-faceted metrics beyond simple accuracy, applied across diverse model architectures including Large Language Models (LLMs), Convolutional Neural Networks (CNNs), and Graph Neural Networks (GNNs). These advances are crucial for ensuring the trustworthiness and effectiveness of AI systems in applications ranging from medical diagnosis to autonomous driving, and for fostering reproducible, comparable research within the scientific community.
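Item Response Theory, mentioned above, goes beyond simple accuracy by jointly modeling item difficulty and system ability, so that success on a hard item counts for more than success on an easy one. A minimal sketch of the standard two-parameter logistic (2PL) IRT model in Python (the function name and parameter values are illustrative, not taken from any specific benchmark):

```python
import math

def irt_2pl(ability, difficulty, discrimination):
    """Two-parameter logistic IRT model: probability that a system with the
    given ability answers an item of the given difficulty correctly.
    The discrimination parameter controls how sharply the item separates
    weak from strong systems."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# A stronger system (higher ability) gets a higher success probability
# on the same item; an item at the system's ability level yields p = 0.5.
p_weak = irt_2pl(ability=-0.5, difficulty=0.0, discrimination=1.2)
p_strong = irt_2pl(ability=1.5, difficulty=0.0, discrimination=1.2)
```

In benchmark evaluation, the ability, difficulty, and discrimination parameters are typically fitted jointly from a matrix of system-by-item correctness scores, yielding system rankings that are more robust to easy or poorly discriminating items than raw accuracy.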
Papers
Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
ECLeKTic: a Novel Challenge Set for Evaluation of Cross-Lingual Knowledge Transfer
A database to support the evaluation of gender biases in GPT-4o output
Can LLM Assist in the Evaluation of the Quality of Machine Learning Explanations?
Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization
Judge as A Judge: Improving the Evaluation of Retrieval-Augmented Generation through the Judge-Consistency of Large Language Models
Evaluation of Missing Data Imputation for Time Series Without Ground Truth