Deceptive Diffusion

Deceptive diffusion research explores how large language models (LLMs) and other AI systems generate misleading or deceptive outputs, encompassing both unintentional errors and intentional manipulation. Current research focuses on identifying and mitigating these behaviors through techniques such as Bayesian decoding games and adversarial training, supported by multimodal deception datasets that incorporate personality and emotional factors; a sketch of the decoding-game idea follows below. This work is crucial for the safety and reliability of AI systems in applications ranging from healthcare and autonomous driving to online interactions, and for developing robust methods to detect and prevent AI-generated deception.
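
To make the game-theoretic idea behind Bayesian decoding games concrete, here is a minimal sketch in the style of a rational-speech-acts speaker-listener game: a pragmatic speaker is scored by how a Bayesian listener would decode its utterance, which penalizes strategically vague or misleading outputs. The toy lexicon, prior, and rationality parameter `alpha` are illustrative assumptions, not taken from any specific paper in this collection.

```python
import numpy as np

# Illustrative Bayesian decoding game in the rational-speech-acts style.
# The lexicon, prior, and alpha below are hypothetical toy values.
# lexicon[u, w] = 1 if utterance u is literally true of world state w.
lexicon = np.array([
    [1, 1, 0],  # u0: a vague claim, true in worlds 0 and 1
    [0, 1, 0],  # u1: a precise claim, true only in world 1
    [0, 0, 1],  # u2: a precise claim, true only in world 2
], dtype=float)

prior = np.ones(3) / 3  # uniform prior over world states
alpha = 4.0             # speaker rationality; higher values penalize vagueness more

def normalize(m, axis):
    """Normalize along `axis`, mapping all-zero slices to zero."""
    s = m.sum(axis=axis, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s > 0)

# Literal listener L0: P(w | u) proportional to truth(u, w) * prior(w).
literal_listener = normalize(lexicon * prior, axis=1)

# Pragmatic speaker S1: P(u | w) proportional to exp(alpha * log L0(w | u)).
with np.errstate(divide="ignore"):
    speaker = normalize(np.exp(alpha * np.log(literal_listener)), axis=0)

# Pragmatic listener L1: P(w | u) proportional to S1(u | w) * prior(w);
# this is the "decoded" belief after one round of the game.
pragmatic_listener = normalize(speaker * prior, axis=1)

print(pragmatic_listener.round(3))
# Row u0 (the vague claim) concentrates on world 0, since a truthful speaker
# in world 1 would have preferred the more precise u1.
```

In this toy run, the listener decodes the vague utterance as evidence for the world where nothing more precise could have been said, so a speaker gains little by being strategically vague; that equilibrium pressure is the core intuition behind using decoding games to discourage deceptive generations.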

Papers