Adversarial Fine-Tuning

Adversarial fine-tuning hardens pre-trained models, such as CLIP and various LLMs, against adversarial attacks: maliciously perturbed inputs crafted to mislead the model. The core idea is typically to continue training the pre-trained model on adversarial examples generated against the model itself, so that robustness carries over to downstream use. Current research focuses on improving resilience across diverse downstream tasks (e.g., image classification, semantic segmentation, and natural language processing) while maintaining, or even improving, performance on clean data, often via methods such as Siamese networks, multi-agent systems, and prompt engineering. This work is crucial for the reliability and security of AI systems in applications ranging from medical diagnosis to autonomous vehicles, where vulnerability to adversarial attacks could have serious consequences.
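As a concrete illustration, below is a minimal PyTorch sketch of one common recipe: projected gradient descent (PGD) adversarial training applied as a fine-tuning stage, with a weighted clean-plus-adversarial loss to help preserve clean-data performance. The function names, hyperparameters (eps, alpha, steps, beta), and training setup are illustrative assumptions, not the method of any particular paper listed below.

```python
import torch
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-inf-bounded adversarial examples via projected gradient descent."""
    # Random start inside the eps-ball, clipped to the valid pixel range [0, 1].
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed ascent step, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1).detach()
    return x_adv


def adversarial_finetune(model, loader, epochs=5, lr=1e-4, beta=1.0, device="cuda"):
    """Fine-tune a pre-trained classifier on a mix of clean and adversarial batches."""
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)  # inner maximization: worst-case inputs
            opt.zero_grad()
            # Outer minimization: the clean term helps retain clean accuracy,
            # the adversarial term (weighted by beta) buys robustness.
            loss = (F.cross_entropy(model(x), y)
                    + beta * F.cross_entropy(model(x_adv), y))
            loss.backward()
            opt.step()
    return model
```

In this framing, the attack budget eps and the clean/adversarial weight beta are the main knobs: larger values tend to buy robustness at the cost of clean accuracy, which is exactly the trade-off the works below aim to improve for pre-trained models.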

Papers