Human Judgment
Human judgment, a cornerstone of cognitive science, is increasingly studied by comparing it with the outputs of ever more capable artificial intelligence models, particularly large language models (LLMs). Current research focuses on understanding and mitigating biases in human evaluations of AI-generated content, analyzing how well human and AI judgments align across diverse tasks (e.g., text generation, image captioning, question answering), and developing new metrics that better capture the nuances of human perception. These studies are crucial for improving the reliability and trustworthiness of AI systems and for fostering more effective human-AI collaboration across fields.
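To make the notion of human-AI judgment alignment concrete, the short Python sketch below shows two standard ways such agreement is commonly quantified: rank correlation for graded ratings and chance-corrected agreement for categorical verdicts. It is a generic illustration rather than the method of any paper listed here; the ratings and labels are hypothetical, and it relies only on the standard scipy.stats.spearmanr and sklearn.metrics.cohen_kappa_score functions.

```python
# Minimal sketch of measuring human-LLM judgment alignment.
# The example scores below are hypothetical, not taken from any listed paper.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

# Hypothetical 1-5 quality ratings assigned to the same ten generated texts.
human_ratings = [5, 4, 2, 3, 5, 1, 4, 2, 3, 4]
llm_ratings   = [4, 4, 2, 3, 5, 2, 5, 2, 3, 3]

# Rank correlation: do the two judges order the outputs similarly?
rho, p_value = spearmanr(human_ratings, llm_ratings)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

# Hypothetical binary accept/reject verdicts on the same ten items.
human_labels = ["accept", "accept", "reject", "reject", "accept",
                "reject", "accept", "reject", "accept", "accept"]
llm_labels   = ["accept", "reject", "reject", "reject", "accept",
                "reject", "accept", "accept", "accept", "accept"]

# Cohen's kappa corrects raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Cohen's kappa = {kappa:.2f}")
```

High correlation or kappa is usually read as evidence that an LLM judge can stand in for human annotators on that task, while low values flag tasks where human evaluation remains necessary.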
Papers
Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework
Rohan Alur, Loren Laine, Darrick K. Li, Dennis Shung, Manish Raghavan, Devavrat Shah
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
Anvesh Rao Vijjini, Rakesh R. Menon, Jiayi Fu, Shashank Srivastava, Snigdha Chaturvedi
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni
Evaluating Quality of Answers for Retrieval-Augmented Generation: A Strong LLM Is All You Need
Yang Wang, Alberto Garcia Hernandez, Roman Kyslyi, Nicholas Kersting