Attack Method
Attack methods in machine learning exploit vulnerabilities in models to cause misbehavior or to leak information. Current research emphasizes attacks against large language models (LLMs), including jailbreaking (inducing harmful outputs), training-data extraction, and manipulation of model behavior through prompt engineering or adversarial examples. These studies draw on techniques such as reinforcement learning for generating sophisticated attacks, graph neural networks for analyzing attack paths, and adversarial training for improving model robustness. Understanding and mitigating these attacks is crucial to the safety and reliability of AI systems across diverse applications.
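To make the adversarial-example and adversarial-training techniques mentioned above concrete, here is a minimal PyTorch sketch of the fast gradient sign method (FGSM), a classic gradient-based attack, followed by one adversarial-training step. The toy classifier, input shapes, and epsilon value are illustrative assumptions, not drawn from any particular paper.

```python
# FGSM: perturb an input in the direction of the loss gradient's sign,
# bounded by epsilon in the L-infinity norm, to increase the model's loss.
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, epsilon, loss_fn=nn.CrossEntropyLoss()):
    """Return x plus an epsilon-bounded perturbation that raises the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Single signed-gradient step (the "fast" in FGSM).
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# --- usage on a toy classifier (all shapes and values are assumptions) ---
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20)          # batch of clean inputs
y = torch.randint(0, 2, (32,))   # ground-truth labels

x_adv = fgsm_perturb(model, x, y, epsilon=0.1)

# One adversarial-training step: fit the model on the perturbed batch
# so that it becomes more robust to this attack.
optimizer.zero_grad()
loss = loss_fn(model(x_adv), y)
loss.backward()
optimizer.step()
```

In practice, adversarial training alternates attack generation and parameter updates every batch, often with stronger multi-step attacks (e.g., PGD) in place of the single FGSM step sketched here.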