Dynamic Benchmark
Dynamic benchmarking addresses the limitations of static datasets in evaluating machine learning models, particularly large language models (LLMs), by replacing fixed test sets with continuously updated, evolving evaluation sets. Current research develops dynamic benchmarks for tasks including forecasting, safety assessment (e.g., jailbreaking), mathematical reasoning, and agent control in simulated environments, often employing techniques such as bandit algorithms and automated data generation. This approach aims to improve model robustness, uncover hidden biases, and ultimately lead to more reliable and generalizable AI systems across diverse applications.
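To make the bandit idea concrete, below is a minimal illustrative sketch (not taken from any specific paper in this area) of how a UCB-style bandit could steer a fixed evaluation budget toward the task categories where a model still looks weakest. The category names, the `evaluate_once` stub, and all parameters are assumptions for illustration; a real dynamic benchmark would draw each item from an automated data generator rather than a random stub.

```python
import math
import random

# Hypothetical sketch: a UCB-style bandit that allocates an evaluation budget
# across benchmark categories, spending more queries on categories where the
# model under test shows a high estimated failure rate.

CATEGORIES = ["forecasting", "safety", "math_reasoning", "agent_control"]


def evaluate_once(model, category):
    # Placeholder: run the model on one freshly generated item from `category`
    # and return 1.0 on failure, 0.0 on success. In a real dynamic benchmark
    # this item would come from an automated data-generation pipeline.
    return float(random.random() < 0.3)


def ucb_benchmark(model, budget=200, c=1.0):
    counts = {cat: 0 for cat in CATEGORIES}
    failures = {cat: 0.0 for cat in CATEGORIES}

    # Query each category once so every estimate is defined.
    for cat in CATEGORIES:
        failures[cat] += evaluate_once(model, cat)
        counts[cat] += 1

    for t in range(len(CATEGORIES), budget):
        # UCB score: estimated failure rate plus an exploration bonus that
        # shrinks as a category accumulates more evaluations.
        def score(cat):
            mean = failures[cat] / counts[cat]
            bonus = c * math.sqrt(math.log(t + 1) / counts[cat])
            return mean + bonus

        chosen = max(CATEGORIES, key=score)
        failures[chosen] += evaluate_once(model, chosen)
        counts[chosen] += 1

    # Return the per-category failure-rate estimates.
    return {cat: failures[cat] / counts[cat] for cat in CATEGORIES}


if __name__ == "__main__":
    print(ucb_benchmark(model=None))
```

In practice the exploration constant `c` and the budget would be tuned to the cost of generating and grading items, and the failure signal would come from an automated or human grader rather than the random stub above.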