Attack Method
Attack methods in machine learning and artificial intelligence exploit vulnerabilities in models to cause misbehavior or information leakage. Current research emphasizes attacks against large language models (LLMs), including jailbreaking (inducing harmful outputs), training-data extraction, and manipulating model behavior through prompt engineering or adversarial examples. These studies employ techniques such as reinforcement learning for generating sophisticated attacks, graph neural networks for analyzing attack paths, and adversarial training for improving model robustness. Understanding and mitigating these attacks is crucial for ensuring the safety and reliability of AI systems across diverse applications.
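As a concrete illustration of the adversarial-example technique mentioned above, the sketch below implements the classic Fast Gradient Sign Method (FGSM) in PyTorch. It is a minimal example under stated assumptions: the `model`, the `epsilon` budget, and the [0, 1] input range are illustrative choices, not drawn from any specific paper on this page.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module,
                x: torch.Tensor,
                y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Craft adversarial examples with the Fast Gradient Sign Method.

    Each input is nudged one epsilon-sized step in the direction that
    increases the model's loss, which often flips the prediction while
    the perturbation stays small. Inputs are assumed to lie in [0, 1].
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb along the sign of the input gradient, then clamp back
    # to the valid input range so the result is still a legal input.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Adversarial training, also mentioned above, typically folds such perturbed inputs back into the training loss so the model learns to resist them.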