Malicious Prompt
Malicious prompts exploit vulnerabilities in large language models (LLMs) and vision-language models (VLMs) by manipulating input text or images to elicit undesired or harmful outputs, such as generating phishing emails or revealing sensitive information. Current research focuses on developing robust detection methods, often employing techniques like fuzzing, embedding generation using models such as BERT, and analyzing early model outputs to identify malicious inputs before harmful generation occurs. This research is crucial for ensuring the safe and reliable deployment of LLMs and VLMs in various applications, mitigating risks associated with adversarial attacks and improving overall system security.
Papers
November 10, 2024
October 29, 2024
October 9, 2024
October 1, 2024
September 23, 2024
September 20, 2024
September 1, 2024
August 21, 2024
August 9, 2024
July 12, 2024
June 20, 2024
April 19, 2024
February 25, 2024
January 19, 2024
October 29, 2023
October 11, 2023