Malicious Instruction
Malicious instruction research examines how adversaries can manipulate large language models (LLMs) and code generation models into producing harmful outputs or injecting malicious code, often through subtle alterations to prompts or training data. Current work investigates attack vectors such as adversarial prompt engineering, backdoor attacks that exploit user behavior, and indirect manipulation via retrieval-augmented generation (RAG) systems, with techniques ranging from embedding similarity attacks to graph-based analysis of code structures (a sketch of a RAG embedding similarity attack follows this summary).