Evaluation Method

Evaluating the performance of increasingly complex AI models, particularly large language models (LLMs) and other generative AI systems, is a critical and evolving field of research. Current efforts focus on developing more robust and comprehensive evaluation methods that move beyond simple accuracy metrics, incorporating human judgment, system-centric and user-centric factors, and addressing biases and limitations in existing benchmarks. These improved evaluation techniques are essential for ensuring the reliability, fairness, and responsible deployment of AI systems across diverse applications, ultimately shaping the future of AI development and its societal impact.

Papers

March 23, 2022

Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods
Mélodie Boillet, Christopher Kermorvant, Thierry Paquet
LeArning Abstract Text Recognition Evaluation Method Historical Document Text Line Line Segmentation

March 22, 2022

A Text-to-Speech Pipeline, Evaluation Methodology, and Initial Fine-Tuning Results for Child Speech Synthesis
Rishabh Jain, Mariam Yiwere, Dan Bigioi, Peter Corcoran, Horia Cucu
Speech Synthesis Evaluation Method Child Speech

March 11, 2022

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Tianbo Ji, Yvette Graham, Gareth J. F. Jones, Chenyang Lyu, Qun Liu
Evaluation Method Open Domain Dialogue System Dialogue Evaluation

March 7, 2022

ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
Neeraj Varshney, Swaroop Mishra, Chitta Baral
Evaluation Method Evaluation Datasets Instance Hardness Evaluation Data

February 14, 2022

Measuring "Why" in Recommender Systems: a Comprehensive Survey on the Evaluation of Explainable Recommendation
Xu Chen, Yongfeng Zhang, Ji-Rong Wen
Global Evaluation Recommender System Line by Line Explanation Comprehensive Survey Evaluation Method Explainable Recommendation Persuasive Argument Recommendation Explanation

February 7, 2022

Evaluation Methods and Measures for Causal Learning Algorithms
Lu Cheng, Ruocheng Guo, Raha Moraffah, Paras Sheth, K. Selcuk Candan, Huan Liu
Evaluation Method Causal Learning New Measure Causal Machine Learning Causality Based Learning Causal Learning Algorithm

January 19, 2022

Data-to-Value: An Evaluation-First Methodology for Natural Language Projects
Jochen L. Leidner
Data Science Data Mining Big Data Evaluation Method Text Analytics

November 30, 2021

Emotions as abstract evaluation criteria in biological and artificial intelligences
Claudius Gros
Artificial Intelligence Experienced Emotion Evaluation Method Emotional State Continuous Emotion