Malicious Query

Malicious queries are adversarial inputs crafted to exploit vulnerabilities in large language models (LLMs) and retrieval-augmented generation (RAG) systems, causing them to produce harmful or unintended outputs. Current research focuses on robust detection methods, including the use of smaller, more efficient models to identify harmful queries and mitigate their effects. It also examines defenses against attacks that obfuscate malicious intent, for example by embedding harmful requests within benign narratives or by decomposing them into innocuous sub-questions. Understanding and mitigating these vulnerabilities is crucial for the safe and responsible deployment of LLMs across applications, affecting both the security of AI systems and the trustworthiness of their outputs.
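
As a rough illustration of the smaller-model screening idea described above, the sketch below places a lightweight text classifier in front of an LLM or RAG pipeline and also re-checks decomposed sub-questions jointly, since a harmful request split into innocuous parts may only be detectable when recombined. The checkpoint name, label scheme ("harmful"/"benign"), and threshold are assumptions for illustration, not a specific system from the papers listed here.

```python
# Minimal sketch of a query-screening step in front of an LLM/RAG pipeline.
# Assumes a small classifier fine-tuned for harmful-query detection; the
# checkpoint name below is a placeholder, not a real published model.
from transformers import pipeline

# Hypothetical small classifier assumed to emit "harmful"/"benign" labels.
guard = pipeline("text-classification", model="your-org/small-harmful-query-classifier")

def screen_query(query: str, threshold: float = 0.5) -> bool:
    """Return True if the query should be blocked before reaching the LLM."""
    result = guard(query)[0]  # e.g. {"label": "harmful", "score": 0.97}
    return result["label"] == "harmful" and result["score"] >= threshold

def screen_decomposed(sub_questions: list[str], threshold: float = 0.5) -> bool:
    """Screen each sub-question and the recombined query, since decomposition
    attacks hide harmful intent in individually innocuous parts."""
    joined = " ".join(sub_questions)
    return any(screen_query(q, threshold) for q in sub_questions + [joined])
```

In practice the same screening call can be applied both to the raw user query and to any retrieved or rewritten intermediate queries in a RAG pipeline, at the cost of one extra small-model inference per checked string.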

Papers