LLM as a Judge
Large language models (LLMs) are increasingly used as automated evaluators ("LLM-as-a-Judge") for a wide range of tasks, aiming to replace or supplement human judgment in assessing the quality of other LLMs' outputs. Current research focuses on improving the reliability of these LLM judges and reducing their biases, often employing techniques such as Minimum Bayes Risk decoding and response-adapted references to enhance accuracy and alignment with human preferences. This approach offers a cost-effective and scalable alternative to human evaluation, with significant implications for benchmarking, model training (e.g., reinforcement learning from human feedback), and the development of more aligned and robust AI systems.
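As a rough illustration of the Minimum Bayes Risk idea mentioned above, the sketch below selects the candidate output whose average utility against the other candidates (treated as pseudo-references) is highest. All names here are illustrative assumptions rather than any particular paper's method, and the `overlap_utility` stand-in would in practice be replaced by a judge LLM scoring each hypothesis-reference pair.

```python
from typing import Callable, List

def mbr_select(candidates: List[str],
               utility: Callable[[str, str], float]) -> str:
    """Return the candidate with the highest expected utility when every
    other candidate is treated as a pseudo-reference (MBR decoding)."""
    def expected_utility(hyp: str) -> float:
        others = [ref for ref in candidates if ref is not hyp]
        return sum(utility(hyp, ref) for ref in others) / max(len(others), 1)
    return max(candidates, key=expected_utility)

def overlap_utility(hypothesis: str, reference: str) -> float:
    """Stand-in utility based on token overlap; a judge-LLM call that
    rates the hypothesis against the reference would replace this."""
    h, r = set(hypothesis.split()), set(reference.split())
    return len(h & r) / max(len(h | r), 1)

if __name__ == "__main__":
    outputs = [
        "Paris is the capital of France.",
        "The capital of France is Paris.",
        "France's capital city is Lyon.",
    ]
    # Picks the response most consistent with the other sampled responses.
    print(mbr_select(outputs, overlap_utility))
```

Swapping `overlap_utility` for a call to a judge model turns this into judge-based MBR reranking of candidate responses, which is one way such judges are used beyond direct pairwise evaluation.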
24 papers (October 2024 to February 2025)