Red Teaming
Red teaming, in the context of artificial intelligence, involves adversarial testing of AI models, particularly large language models (LLMs) and increasingly multimodal models, to identify vulnerabilities and biases. Current research focuses on automating this process using techniques like reinforcement learning, generative adversarial networks, and novel scoring functions to create diverse and effective adversarial prompts or inputs that expose model weaknesses. This rigorous evaluation is crucial for improving the safety, robustness, and ethical implications of AI systems, informing both model development and deployment strategies across various applications.
Papers
October 22, 2023
October 19, 2023
October 17, 2023
October 14, 2023
October 2, 2023
September 30, 2023
September 17, 2023
August 18, 2023
August 9, 2023
August 8, 2023
June 15, 2023
May 31, 2023
May 27, 2023
February 8, 2023
September 5, 2022
August 23, 2022
August 16, 2022