Black Box Defense

Black-box defense aims to protect machine learning models, particularly deep neural networks and large language models, from adversarial attacks without requiring access to the model's internal parameters. Current research builds these defenses with techniques such as randomized smoothing, Bayesian methods, and prompt learning, often employing autoencoders or diffusion models to improve robustness. Such methods matter for securing applications in sensitive areas such as autonomous driving and healthcare, where a vulnerable model poses significant risks. They are also advancing both the theoretical understanding and the practical implementation of secure AI systems.
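To make the query-only setting concrete, here is a minimal sketch of the prediction step of randomized smoothing. It assumes the protected model is exposed only as a function from an input vector to a class label. The names `smoothed_predict`, `toy_model`, `sigma`, `n_samples`, and the optional `denoiser` hook are hypothetical illustrations, not any library's API. The denoiser stands in for an autoencoder or diffusion-model front end. The sketch deliberately omits the statistical abstention test and certified-radius computation that full randomized smoothing adds.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence

# The protected model is a black box: we may query it, nothing more.
BlackBox = Callable[[Sequence[float]], int]
Denoiser = Callable[[Sequence[float]], Sequence[float]]


def smoothed_predict(
    model: BlackBox,
    x: Sequence[float],
    sigma: float = 0.25,
    n_samples: int = 200,
    denoiser: Optional[Denoiser] = None,
    seed: Optional[int] = None,
) -> int:
    """Majority-vote prediction of a Gaussian-smoothed black-box classifier."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Perturb every coordinate with independent N(0, sigma^2) noise.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        # Optional front end (hypothetical, e.g. a denoiser) applied before querying.
        if denoiser is not None:
            noisy = list(denoiser(noisy))
        votes[model(noisy)] += 1
    # Return the label that received the most votes.
    label, _count = votes.most_common(1)[0]
    return label


def toy_model(v: Sequence[float]) -> int:
    """Hypothetical black box: class 1 if the coordinates sum above zero."""
    return int(sum(v) > 0)


print(smoothed_predict(toy_model, [1.0, 1.0], seed=0))  # prints 1
```

Because the wrapper only ever calls the model, never inspecting its parameters or gradients, it applies to any black box. The cost is `n_samples` queries per prediction, and a larger `sigma` typically trades clean accuracy for robustness to larger perturbations.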

Papers