Paper ID: 2412.05892

BAMBA: A Bimodal Adversarial Multi-Round Black-Box Jailbreak Attacker for LVLMs

Ruoxi Cheng, Yizhong Ding, Shuirong Cao, Shaowei Yuan, Zhiqiang Wang, Xiaojun Jia

LVLMs are widely used but vulnerable to illegal or unethical responses under jailbreak attacks. To ensure their responsible deployment in real-world applications, it is essential to understand their vulnerabilities. There are four main issues in current work: single-round attack limitation, insufficient dual-modal synergy, poor transferability to black-box models, and reliance on prompt engineering. To address these limitations, we propose BAMBA, a bimodal adversarial multi-round black-box jailbreak attacker for LVLMs. We first use an image optimizer to learn malicious features from a harmful corpus, then deepen these features through a bimodal optimizer through text-image interaction, generating adversarial text and image for jailbreak. Experiments on various LVLMs and datasets demonstrate that BAMBA outperforms other baselines.

Submitted: Dec 8, 2024