Malicious Prompt

Malicious prompts exploit vulnerabilities in large language models (LLMs) and vision-language models (VLMs) by manipulating input text or images to elicit undesired or harmful outputs, such as generating phishing emails or revealing sensitive information. Current research focuses on developing robust detection methods, often employing techniques like fuzzing, embedding generation using models such as BERT, and analyzing early model outputs to identify malicious inputs before harmful generation occurs. This research is crucial for ensuring the safe and reliable deployment of LLMs and VLMs in various applications, mitigating risks associated with adversarial attacks and improving overall system security.

Papers