Dialogue Safety
Dialogue safety research focuses on mitigating the generation of unsafe, toxic, or biased content by conversational AI systems. Current efforts concentrate on training methods, such as adversarial preference optimization and contrastive learning, that leverage both safe and unsafe data to steer model behavior, often incorporating commonsense social rules or guidelines into response generation. This work is central to responsible AI development, bearing directly on the safety and ethics of deploying conversational agents in applications ranging from mental health support to general-purpose chatbots.
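To make the contrastive/preference idea concrete, the sketch below shows a minimal pairwise preference-style loss that rewards a dialogue model for scoring a safe response above an unsafe one for the same context. It is an illustrative sketch only, not the method of any specific paper listed here; the function names (score_response, safety_preference_loss) and the toy data are hypothetical, and PyTorch is assumed.

```python
# Illustrative sketch (hypothetical names, not a specific paper's method):
# a pairwise preference loss over paired safe/unsafe responses to the same context.
import torch
import torch.nn.functional as F


def score_response(logits: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
    """Sum of token log-probabilities the model assigns to a response.

    logits: (seq_len, vocab_size) model outputs at the response positions.
    response_ids: (seq_len,) token ids of the response.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum()


def safety_preference_loss(safe_score: torch.Tensor,
                           unsafe_score: torch.Tensor,
                           beta: float = 1.0) -> torch.Tensor:
    """Logistic pairwise loss: minimized when the safe response outscores
    the unsafe one by a wide margin (a common preference-optimization form)."""
    return -F.logsigmoid(beta * (safe_score - unsafe_score))


if __name__ == "__main__":
    # Toy example with random tensors standing in for a real dialogue model's outputs.
    vocab_size, seq_len = 100, 8
    safe_logits = torch.randn(seq_len, vocab_size, requires_grad=True)
    unsafe_logits = torch.randn(seq_len, vocab_size, requires_grad=True)
    safe_ids = torch.randint(0, vocab_size, (seq_len,))
    unsafe_ids = torch.randint(0, vocab_size, (seq_len,))

    loss = safety_preference_loss(
        score_response(safe_logits, safe_ids),
        score_response(unsafe_logits, unsafe_ids),
    )
    loss.backward()  # gradients push the model toward preferring the safe response
    print(f"preference loss: {loss.item():.4f}")
```

In practice the two scores would come from the same model conditioned on a shared dialogue context, and the pairs of safe/unsafe responses would be drawn from curated or adversarially collected safety data.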