AI Deception

AI deception, the intentional creation of false beliefs by artificial intelligence systems, is a growing research area concerned with its mechanisms, impact, and mitigation. Current work investigates deceptive capabilities in large language models (LLMs) and other AI architectures, analyzes techniques such as strategic deception and misinformation propagation, and explores detection methods such as analyzing response patterns and training specialized classifiers. Because AI deception poses significant risks in domains including elections, healthcare, and cybersecurity, robust detection and mitigation strategies are needed for responsible AI development and deployment.

Papers

December 23, 2024

Observation Interference in Partially Observable Assistance Games
Scott Emmons, Caspar Oesterheld, Vincent Conitzer, Stuart Russell
Single Agent Observable Stochastic Game AI Deception Action Pair

December 20, 2024

Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu, Michael Vaiana, Judd Rosenblatt, Cameron Berg, Diogo Schwerz de Lucena
Trustworthy Artificial Intelligence AI Safety AI Deception

October 29, 2024

Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot
Herman Lassche (1 and 2), Michiel Overeem (1), Ayushi Rastogi (2) ((1) AFAS Software, (2) University of Groningen)
Language Model Chatbot Response Customer Service AI Deception

October 17, 2024

A Simulation System Towards Solving Societal-Scale Manipulation
Maximilian Puelma Touzel, Sneheel Sarangi, Austin Welch, Gayatri Krishnakumar, Dan Zhao, Zachary Yang, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Andreea Musulan, Camille Thibault, Busra Tugce Gurbuz, Reihaneh Rabbany, Jean-François Godbout, Kellin Pelrine
Simulation Study Human Trust AI Deception Solving Non Rectangular Simulation Based Technology Tool

August 8, 2024

Deceptive uses of Artificial Intelligence in elections strengthen support for AI ban
Andreas Jungherr, Adrian Rauchfleisch, Alexander Wuttke
Artificial Intelligence Customer Service Election Result AI Deception AI Harm Election Manipulation

July 31, 2024

Deceptive AI systems that give explanations are more convincing than honest AI systems and can amplify belief in misinformation
Valdemar Danry, Pat Pataranutaporn, Matthew Groh, Ziv Epstein, Pattie Maes
Artificial Intelligence Line by Line Explanation Trustworthy Artificial Intelligence Misinformation Claim Belief State AI Deception

June 9, 2024

Deception Analysis with Artificial Intelligence: An Interdisciplinary Perspective
Stefan Sarkadi
Artificial Intelligence Multi Agent System AI Agent Interdisciplinary Perspective AI Deception Deception Cue

February 7, 2024

Unmasking the Shadows of AI: Investigating Deceptive Capabilities in Large Language Models
Linge Guo
Artificial Intelligence Quantum Shadow Deceptive Diffusion Strategic Deception AI Deception Deceptive Power

January 14, 2024

Killer Apps: Low-Speed, Large-Scale AI Weapons
Philip Feldman, Aaron Dant, James R. Foulds
Artificial Intelligence Military Application AI Deception

December 3, 2023

Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
Game Theory Truthful Space Symbolic AI AI Deception High Performing Policy Deceptive Power Structural Causal Game

November 27, 2023

Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies
Kevin Wang, Seth Akins, Abdallah Mohammed, Ramon Lawrence
Generative AI ChatGPT Generated Conversation Detection Method Concept Learning Generative AI System Artificial Intelligence Solution AI Deception AI Detection AI Generated Code Proficiency Vector

September 26, 2023

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner
Black Box Large Language Model Deception Detection AI Deception Lie Detection Relevant Question

September 14, 2023

M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations
Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi, Luisa Verdoliva
Generative Adversarial Network Data Set Medical Image 3D Medical Image AI Deception Synthetic Explanation

August 30, 2023

Strengthening the EU AI Act: Defining Key Terms on AI Manipulation
Matija Franklin, Philip Moreira Tomei, Rebecca Gorman
Artificial Intelligence AI Act Artificial Intelligence Act AI Harm Contentious Term AI Deception

August 28, 2023

AI Deception: A Survey of Examples, Risks, and Potential Solutions
Peter S. Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
Timely Survey AI System Risk Sensitive Fewer Example General Purpose Artificial Intelligence System AI Deception Feasible Solution

July 24, 2023

Regulating AI: Applying insights from behavioural economics and psychology to the application of article 5 of the EU AI Act
Huixin Zhong, Eamonn O'Neill, Janina A. Hoffmann
Artificial Intelligence DCU Insight AQ AI Act Artificial Intelligence Act Psychological Phenomenon Manipulation Strategy AI Deception Behavioral Economics

June 26, 2023

Experiments with Detecting and Mitigating AI Deception
Ismail Sahbane, Francis Rhys Ward, C Henrik Åslund
Data Detection Optical Experiment Trustworthy Artificial Intelligence Strategic Deception AI Deception

June 18, 2023

Deceptive AI Ecosystems: The Case of ChatGPT
Xiao Zhan, Yifan Xu, Stefan Sarkadi
ChatGPT Generated Conversation Case Relevance Online Conversation Chatbot Technology AI Deception

December 9, 2022

The Turing Deception
David Noever, Matt Ciolino
Text Generation Human Understanding Chatbot Response Machine Generated Turing Machine AI Deception

November 21, 2021

Modelling Direct Messaging Networks with Multiple Recipients for Cyber Deception
Kristen Moore, Cody J. Christopher, David Liebowitz, Surya Nepal, Renee Selvey
Multi Party Network Model AI Deception Cyber Deception