Model Hijacking Attack
Model hijacking attacks exploit the training process of machine learning models to force them to perform an attacker-chosen task in addition to their intended one, raising significant security and accountability concerns. Current research focuses on extending these attacks across learning paradigms (e.g., federated learning) and data modalities (e.g., from image classification to NLP), often using encoder-decoder architectures or distance measures in latent space to disguise poisoned samples and keep the manipulation stealthy. The ability to subtly redirect a model's functionality highlights critical weaknesses in existing ML pipelines and underscores the need for robust defenses to ensure the trustworthy and ethical deployment of AI.
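The latent-space camouflage idea can be illustrated with a minimal sketch: an encoder-decoder is trained so that samples from the hidden (hijacking) task are rewritten to look like the victim's original-task data in input space, while staying close to the hijacking samples in a feature space, so a model poisoned with them still learns the hidden task. The architecture, loss weighting, and variable names below are illustrative assumptions, not any specific paper's implementation.

```python
# Minimal sketch of latent-space camouflage for model hijacking.
# All module sizes, loss weights, and names are assumptions for illustration.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_ch=3, latent_ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, latent_ch=16, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z)

encoder, decoder = Encoder(), Decoder()
# Stand-in for a fixed feature network used to measure semantic distance.
feature_extractor = Encoder()
for p in feature_extractor.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

hijack_x = torch.rand(8, 3, 32, 32)   # samples from the hidden (hijacking) task
benign_x = torch.rand(8, 3, 32, 32)   # samples from the victim's original task

for step in range(100):
    camouflaged = decoder(encoder(hijack_x))
    # Visual loss: camouflaged samples should resemble original-task data.
    visual_loss = nn.functional.mse_loss(camouflaged, benign_x)
    # Semantic loss: in feature space, camouflaged samples should stay close
    # to the hijacking samples so the poisoned model still learns the hidden task.
    semantic_loss = nn.functional.mse_loss(
        feature_extractor(camouflaged), feature_extractor(hijack_x))
    loss = visual_loss + semantic_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The camouflaged outputs, paired with hijack-task labels mapped onto original-task classes, would then be injected into the victim's training set; the trade-off between the two loss terms controls how stealthy versus how effective the hijack is.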