Adversarial Training

Adversarial training aims to enhance the robustness of machine learning models, particularly deep neural networks, against adversarial attacks: maliciously perturbed inputs designed to cause misclassification. Current research focuses on improving the efficiency and effectiveness of adversarial training methods, exploring techniques such as vector quantization for input transformation, null-space projection for gradient optimization, and module-wise adaptive training for end-to-end systems. These techniques are also being applied to a range of model architectures, including LLMs and Vision Transformers. This field is crucial for ensuring the reliability and security of AI systems in real-world applications, particularly in safety-critical domains where model robustness is paramount.
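
At its core, adversarial training replaces (or augments) clean training examples with adversarial ones generated on the fly, most commonly via projected gradient descent (PGD) within a small L-infinity ball around each input. The sketch below illustrates this standard recipe; it is a minimal, generic example rather than the method of any specific paper listed here, and the toy model, dummy data, and hyperparameters (`eps`, `alpha`, `steps`) are illustrative assumptions.

```python
# Minimal sketch of PGD-based adversarial training.
# Model, data, and hyperparameters are placeholders, not drawn from any paper below.
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft L-infinity-bounded adversarial examples with projected gradient descent."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start inside the eps-ball
    x_adv = x_adv.clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1).detach()
    return x_adv


def adversarial_training_step(model, optimizer, x, y):
    """One training step on adversarial examples instead of clean inputs."""
    model.eval()                     # freeze batch-norm statistics while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy classifier
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(8, 3, 32, 32)              # dummy CIFAR-10-sized batch
    y = torch.randint(0, 10, (8,))
    print(adversarial_training_step(model, optimizer, x, y))
```

The techniques surveyed above modify different parts of this loop: input transformations such as vector quantization act on `x` before the attack, gradient-space methods such as null-space projection alter how `grad` is used, and module-wise adaptive training changes which parts of `model` are updated at each step.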

Papers