Efficient Adversarial Training

Efficient adversarial training aims to make machine learning models, particularly deep neural networks and large language models (LLMs), robust to adversarial attacks (maliciously perturbed inputs crafted to cause misclassification) without significantly sacrificing accuracy on clean data. Because most of the cost of adversarial training comes from generating adversarial examples, current research focuses on faster training algorithms, including continuous attacks in embedding space, subnetwork sampling, gradient approximation, and data pruning. These advances are crucial for deploying robust models in resource-constrained environments and for improving the reliability and security of AI systems across applications ranging from wireless communication to natural language processing.
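
As one concrete illustration of the "faster training algorithms" mentioned above, the sketch below shows single-step (FGSM-style) adversarial training in PyTorch, which replaces the multi-step inner attack of standard adversarial training with a single gradient step on the perturbation. It is a minimal sketch only: the model, data loader, optimizer, and epsilon budget are assumed placeholders and are not drawn from any specific paper listed here.

```python
# Minimal sketch of single-step (FGSM-style) adversarial training in PyTorch.
# Generating adversarial examples with one gradient step instead of a multi-step
# PGD attack is one common way to cut the cost of adversarial training.
import torch
import torch.nn.functional as F


def fgsm_adversarial_training_epoch(model, loader, optimizer, epsilon=8 / 255, device="cpu"):
    """Run one epoch of fast adversarial training on (input, label) batches."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)

        # Start from a random point inside the epsilon-ball (commonly used to
        # reduce the "catastrophic overfitting" seen with plain FGSM training).
        delta = torch.empty_like(x).uniform_(-epsilon, epsilon).requires_grad_(True)

        # Single gradient step on the perturbation (the "fast" part).
        attack_loss = F.cross_entropy(model(x + delta), y)
        attack_loss.backward()
        with torch.no_grad():
            delta = (delta + epsilon * delta.grad.sign()).clamp_(-epsilon, epsilon)

        # Standard training step on the adversarially perturbed batch.
        optimizer.zero_grad()
        adv_loss = F.cross_entropy(model((x + delta).detach()), y)
        adv_loss.backward()
        optimizer.step()
```

Depending on the data, the perturbed input `x + delta` may also need to be clamped to the valid input range (e.g., [0, 1] for images); other efficiency techniques named above, such as continuous embedding-space attacks for LLMs, follow the same pattern but perturb embeddings rather than raw inputs.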

Papers