Dialogue Safety

Dialogue safety research focuses on preventing conversational AI systems from generating unsafe, toxic, or biased content. Current efforts concentrate on training methods, such as adversarial preference optimization and contrastive learning, that leverage paired safe and unsafe data to steer model behavior, often incorporating commonsense social rules or explicit guidelines into response generation. This work underpins the responsible deployment of conversational agents across applications ranging from mental health support to general-purpose chatbots.
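
To make the preference-based idea concrete, the sketch below shows a minimal DPO-style loss computed over paired safe ("chosen") and unsafe ("rejected") responses to the same prompt. The function name, tensor shapes, and the `beta` value are illustrative assumptions for this summary, not the formulation of any specific paper listed here.

```python
# Minimal sketch: preference optimization for dialogue safety, assuming each
# training example pairs one safe and one unsafe response to the same prompt.
# Names (dpo_safety_loss, beta, *_logps) are hypothetical, for illustration only.
import torch
import torch.nn.functional as F


def dpo_safety_loss(
    policy_safe_logps: torch.Tensor,    # log pi_theta(y_safe | x), shape (batch,)
    policy_unsafe_logps: torch.Tensor,  # log pi_theta(y_unsafe | x), shape (batch,)
    ref_safe_logps: torch.Tensor,       # log pi_ref(y_safe | x), frozen reference model
    ref_unsafe_logps: torch.Tensor,     # log pi_ref(y_unsafe | x)
    beta: float = 0.1,                  # scale of the implicit reward margin
) -> torch.Tensor:
    # Implicit reward of each response: how much more likely the policy makes it
    # compared with the frozen reference model.
    safe_reward = policy_safe_logps - ref_safe_logps
    unsafe_reward = policy_unsafe_logps - ref_unsafe_logps
    # Maximizing the margin pushes probability mass toward the safe responses.
    margin = beta * (safe_reward - unsafe_reward)
    return -F.logsigmoid(margin).mean()


# Toy usage with random sequence-level log-probabilities.
if __name__ == "__main__":
    torch.manual_seed(0)
    batch = 4
    loss = dpo_safety_loss(
        policy_safe_logps=torch.randn(batch),
        policy_unsafe_logps=torch.randn(batch),
        ref_safe_logps=torch.randn(batch),
        ref_unsafe_logps=torch.randn(batch),
    )
    print(f"safety preference loss: {loss.item():.4f}")
```

In practice the per-response log-probabilities would be summed token log-likelihoods from the policy and a frozen reference model; contrastive variants replace the sigmoid margin with a contrastive objective over safe versus unsafe response embeddings.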

Papers