Unconventional Rabbit Hat Trick
"Unconventional Rabbit Hat Trick" refers to research on deceptive or obfuscatory techniques in AI systems, particularly large language models (LLMs) and autonomous agents. Current work develops and benchmarks these deceptive capabilities, often through reinforcement learning and adversarial game setups, and also analyzes vulnerabilities and develops defenses against such manipulation. This research matters for AI safety and trustworthiness, particularly in high-stakes applications where AI agents interact with humans or make critical decisions.
Papers
October 24, 2024
October 10, 2024
August 13, 2024
July 14, 2024
June 13, 2024
May 7, 2024
April 19, 2024
April 4, 2024
March 21, 2024
February 20, 2024
January 16, 2024
October 17, 2023
October 3, 2023
August 17, 2023
July 31, 2023
June 27, 2023
March 16, 2023
March 15, 2023