Adversarial Context
Adversarial context research explores how seemingly innocuous contextual information can be manipulated to negatively impact the performance or security of machine learning models, particularly large language models (LLMs). Current research focuses on developing both attack methods, such as crafting adversarial prompts or examples to induce undesired behavior, and defense mechanisms, including techniques like adversarial training and code-style instruction prompting. This field is crucial for ensuring the robustness and trustworthiness of AI systems, with implications for various applications ranging from autonomous driving to question-answering systems and impacting the broader discussion around AI safety and regulation.
Papers
August 6, 2024
June 24, 2024
March 18, 2024
February 26, 2024
February 20, 2024
December 27, 2023
December 4, 2022
October 27, 2022