Unconventional Rabbit Hat Trick
"Unconventional Rabbit Hat Trick" refers to research on deceptive or obfuscatory techniques in AI systems, particularly large language models (LLMs) and autonomous agents. Current work develops and benchmarks these deceptive capabilities, often using reinforcement learning and adversarial game setups, and also analyzes the resulting vulnerabilities and builds defenses against such manipulation. This research matters for AI safety and trustworthiness, particularly in high-stakes applications where AI agents interact with humans or make critical decisions.