Step Attack

Step attacks, in both single-turn and multi-turn variants, exploit vulnerabilities in machine learning models, particularly large language models (LLMs) and deep neural networks (DNNs), by using carefully crafted inputs to elicit undesired outputs or behaviors. Current research develops more sophisticated attack strategies, such as crescendo attacks and context-based fusion attacks, alongside more robust defenses, including adversarial training with adaptive step sizes and contrastive decoding methods. Understanding these attacks and reducing their effectiveness is crucial for the safety and reliability of AI systems in applications ranging from conversational AI to medical image analysis and federated learning.

Papers