Paper ID: 2406.11880
Knowledge Return Oriented Prompting (KROP)
Jason Martin, Kenneth Yeung
Many Large Language Models (LLMs) and LLM-powered apps deployed today use some form of prompt filter or alignment to protect their integrity. However, these measures are not foolproof. This paper introduces KROP, a prompt injection technique capable of obfuscating prompt injection attacks, rendering them virtually undetectable by most of these security measures.
Submitted: Jun 11, 2024