Evaluation Practice
Evaluation practices in fields ranging from natural language processing to medical imaging are under growing scrutiny due to limitations in existing metrics and biases in established protocols. Current research focuses on developing more robust and informative evaluation methods, often tailored to specific tasks and addressing issues such as dataset bias, the propagation of errors through modular systems, and the limitations of relying solely on automatic metrics. Improved evaluation practices are crucial for the reliability and trustworthiness of AI systems across diverse applications, ensuring that reported performance accurately reflects real-world capabilities and that deployment avoids unintended consequences.
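Concretely, one widely used alternative to reporting a single automatic-metric score is paired bootstrap resampling, which estimates how often one system outperforms another across resampled test sets and attaches a confidence interval to the metric gap. The sketch below is a minimal illustration in Python; the metric, the system outputs, and all names (paired_bootstrap, score_fn) are assumptions for exposition, not drawn from any specific paper.

import random

def paired_bootstrap(score_fn, sys_a, sys_b, refs, n_resamples=1000, seed=0):
    """Estimate how often system A beats system B on resampled test sets.

    score_fn: corpus-level metric taking (hypotheses, references).
    sys_a, sys_b: lists of system outputs, aligned item-by-item with refs.
    Returns the win rate of A over B and a 95% interval on the metric gap.
    """
    rng = random.Random(seed)
    n = len(refs)
    wins = 0
    deltas = []
    for _ in range(n_resamples):
        # Resample test items with replacement, scoring both systems
        # on the same resampled set (hence "paired").
        idx = [rng.randrange(n) for _ in range(n)]
        a = score_fn([sys_a[i] for i in idx], [refs[i] for i in idx])
        b = score_fn([sys_b[i] for i in idx], [refs[i] for i in idx])
        deltas.append(a - b)
        wins += a > b
    deltas.sort()
    lo = deltas[int(0.025 * n_resamples)]
    hi = deltas[int(0.975 * n_resamples)]
    return wins / n_resamples, (lo, hi)

# Toy usage with exact-match accuracy as the automatic metric:
acc = lambda hyps, refs: sum(h == r for h, r in zip(hyps, refs)) / len(refs)
win_rate, ci = paired_bootstrap(acc,
                                ["a", "b", "c", "d"],   # hypothetical system A outputs
                                ["a", "x", "c", "y"],   # hypothetical system B outputs
                                ["a", "b", "c", "d"])   # references
print(f"P(A > B) = {win_rate:.2f}, 95% interval on accuracy gap: {ci}")

Reporting the win rate and interval, rather than a lone point score, makes a comparison robust to the particular composition of the test set, which is one of the evaluation weaknesses this line of work targets.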