Cross-Lingual Backdoor Attacks
Cross-lingual backdoor attacks exploit vulnerabilities in multilingual large language models (LLMs): by poisoning training data in only one or a few languages, an attacker can implant a trigger that elicits malicious outputs across many languages. Current research focuses on understanding how these attacks transfer across languages, measuring the susceptibility of various LLMs (including mT5, GPT-4, and the Llama series), and developing defenses, often based on embedding manipulation or translation of inputs before inference. These findings highlight significant security risks in multilingual LLMs and underscore the need for robust safeguards before deploying these increasingly prevalent systems.