New Attack
Research on attacks against large language models (LLMs) and related AI systems is rapidly expanding, focusing on vulnerabilities exploited to elicit harmful outputs or extract sensitive information. Current efforts concentrate on developing and evaluating various attack methods, including jailbreaking, data poisoning, prompt injection, and membership inference attacks, often targeting specific model architectures like transformer-based LLMs and diffusion models. This research is crucial for understanding and mitigating the risks associated with increasingly powerful AI systems, informing the development of more robust and trustworthy AI applications across diverse sectors.
128papers
Papers
February 27, 2025
LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks
Joana C. Costa, Tiago Roxo, Hugo Proença, Pedro R. M. InácioInstituto de Telecomunicac ¸˜oes●Universidade da Beira InteriorAdaptive Attacks Break Defenses Against Indirect Prompt Injection Attacks on LLM Agents
Qiusi Zhan, Richard Fang, Henil Shalin Panchal, Daniel KangUniversity of Illinois Urbana-Champaign●Nirma University
February 25, 2025
Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
Junxiao Yang, Zhexin Zhang, Shiyao Cui, Hongning Wang, Minlie HuangTsinghua UniversityMM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks
Hyeonjeong Ha, Qiusi Zhan, Jeonghwan Kim, Dimitrios Bralios, Saikrishna Sanniboina, Nanyun Peng, Kai-Wei Chang, Daniel Kang, Heng JiUniversity of Illinois Urbana-Champaign●University of California Los Angeles
February 18, 2025
UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models
Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie ZhaoRochester Institute of Technology●Tufts University●University of Rochester●NVIDIAIron Sharpens Iron: Defending Against Attacks in Machine-Generated Text Detection with Adversarial Training
Yuanfan Li, Zhaohan Zhang, Chengzhengxu Li, Chao Shen, Xiaoming LiuXi’an Jiaotong University●Queen Mary University of LondonTowards Robust and Secure Embodied AI: A Survey on Vulnerabilities and Attacks
Wenpeng Xing, Minghao Li, Mohan Li, Meng HanZhejiang University●Heilongjiang University●Guangzhou University
February 14, 2025
February 12, 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li, Yin Zhou, Vethavikashini Chithrra Raghuram, Tom Goldstein, Micah GoldblumCompromising Honesty and Harmlessness in Language Models via Deception Attacks
Laurène Vaugrante, Francesca Carlon, Maluna Menke, Thilo Hagendorff
February 10, 2025
February 9, 2025
February 8, 2025
February 7, 2025
Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
Chongyu Fan, Jinghan Jia, Yihua Zhang, Anil Ramakrishna, Mingyi Hong, Sijia LiuFrom Counterfactuals to Trees: Competitive Analysis of Model Extraction Attacks
Awa Khouna, Julien Ferry, Thibaut Vidal