Deceptive Diffusion
Deceptive diffusion research studies how large language models (LLMs) and other AI systems generate misleading or deceptive outputs, whether through unintentional error or intentional manipulation. Current work focuses on identifying and mitigating these behaviors through techniques such as Bayesian decoding games, adversarial training, and multimodal deception datasets that incorporate personality and emotional factors. This research is crucial for the safety and reliability of AI systems in applications ranging from healthcare and autonomous driving to online interaction, and for developing robust methods to detect and prevent AI-generated deception.