Safety Evaluation
Safety evaluation of AI systems, particularly large language models (LLMs) and autonomous driving systems (ADS), focuses on developing robust methods to assess and mitigate potential risks. Current research emphasizes creating comprehensive benchmark datasets and toolkits, employing techniques like red-teaming exercises, scenario-based analysis, and uncertainty quantification to evaluate various safety dimensions (e.g., bias, toxicity, adversarial attacks). These advancements are crucial for building trust and ensuring the responsible deployment of AI technologies across diverse applications, impacting both the scientific understanding of AI safety and the development of safer, more reliable systems.
Papers
RAVE Checklist: Recommendations for Overcoming Challenges in Retrospective Safety Studies of Automated Driving Systems
John M. Scanlon, Eric R. Teoh, David G. Kidd, Kristofer D. Kusano, Jonas Bärgman, Geoffrey Chi-Johnston, Luigi Di Lillo, Francesca Favaro, Carol Flannagan, Henrik Liers, Bonnie Lin, Magdalena Lindman, Shane McLaughlin, Miguel Perez, Trent Victor
SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi