Safety Benchmark
AI safety benchmarks are tools designed to evaluate the safety of artificial intelligence systems, particularly large language models (LLMs), by systematically testing their responses to various prompts and scenarios. Current research focuses on developing more comprehensive benchmarks that cover a wider range of safety hazards, including those defined by regulations and policies, across multiple languages and model architectures, and that better distinguish genuine safety improvements from mere increases in general capabilities. These efforts are crucial for fostering the responsible development and deployment of AI, mitigating potential risks, and establishing a more rigorous scientific foundation for evaluating AI safety progress.
Papers
October 24, 2024
October 22, 2024
October 18, 2024
September 30, 2024
August 7, 2024
July 31, 2024
July 11, 2024
June 14, 2024
April 18, 2024
March 20, 2024
February 7, 2024
October 2, 2023
September 20, 2023
April 20, 2023