Adversarial Removal
Adversarial removal focuses on mitigating the impact of malicious inputs crafted to deceive machine learning models, a challenge that spans text, image, and sequence-based applications. Current research emphasizes building robust defenses against such attacks, exploring techniques such as embedding-based adjustments for language models, counterfactual explanations for image forgery detection, and randomized smoothing for sequence classifiers. These efforts aim to make AI systems more reliable and trustworthy by hardening them against adversarial manipulation, with implications for AI safety, security, and the broader deployment of machine learning technologies.
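To make one of these defenses concrete, below is a minimal sketch of randomized smoothing for a sequence classifier. It is not taken from any of the cited papers: the `smoothed_classify` helper, the token-dropout perturbation, and the `toy_classifier` are illustrative assumptions. The core idea is the standard one: classify many randomly perturbed copies of the input and take a majority vote, so that an adversarial token is unlikely to survive in every sample.

```python
import random
from collections import Counter
from typing import Callable, List


def smoothed_classify(
    tokens: List[str],
    base_classifier: Callable[[List[str]], str],
    num_samples: int = 100,
    drop_prob: float = 0.1,
    seed: int = 0,
) -> str:
    """Classify by majority vote over randomly perturbed copies of the input.

    Each sample independently drops tokens with probability `drop_prob`
    (one common perturbation for text; substitution-based noise is another),
    so a single adversarial token rarely survives in every sample.
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(num_samples):
        perturbed = [t for t in tokens if rng.random() > drop_prob]
        votes[base_classifier(perturbed)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical base classifier, for illustration only: a trivial
# keyword-counting sentiment rule standing in for a real model.
def toy_classifier(tokens: List[str]) -> str:
    positive = {"good", "great", "excellent"}
    return "positive" if any(t in positive for t in tokens) else "negative"


if __name__ == "__main__":
    text = "the movie was great despite the weak ending".split()
    print(smoothed_classify(text, toy_classifier))  # -> "positive"
```

In practice the number of samples also determines the statistical confidence of the vote; published randomized-smoothing methods pair this voting step with a certification bound on how much perturbation the prediction provably tolerates.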