Model Hijacking Attack

Model hijacking attacks exploit vulnerabilities in machine learning models to force them to perform tasks their owners never intended, raising significant security and accountability concerns. Current research focuses on extending these attacks to additional learning paradigms (e.g., federated learning) and across modalities (e.g., from image classification to NLP tasks), often employing encoder-decoder architectures or latent-space distance measures to disguise hijacking samples as benign training data. The ability to subtly redirect a model's functionality highlights critical weaknesses in existing ML systems and underscores the need for robust defenses to ensure the trustworthy and ethical deployment of AI.
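
To make the encoder-decoder idea concrete, the sketch below (a minimal, illustrative PyTorch example; the names `Camouflager`, `camouflage_loss`, and the hyperparameter `alpha` are assumptions, not taken from any specific paper's code) shows how an attacker might train a small autoencoder to rewrite hijacking-task samples so they sit close to benign samples in pixel space while staying close to the original hijacking samples in the latent space of a frozen feature extractor.

```python
import torch
import torch.nn as nn


class Camouflager(nn.Module):
    """Toy encoder-decoder that rewrites a hijacking sample so it visually
    resembles benign data while retaining its hidden-task semantics
    (illustrative architecture, not any particular published model)."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, hijack_x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(hijack_x))


def camouflage_loss(camo_x, benign_x, hijack_x, feature_extractor, alpha=1.0):
    """Visual term pulls the camouflaged sample toward a benign sample in
    pixel space; semantic term keeps it near the original hijacking sample
    in the latent space of a frozen feature extractor."""
    visual = nn.functional.mse_loss(camo_x, benign_x)
    semantic = nn.functional.mse_loss(
        feature_extractor(camo_x), feature_extractor(hijack_x)
    )
    return visual + alpha * semantic
```

In a typical pipeline of this kind, the camouflaged samples would be relabeled via a mapping from hidden-task labels to original-task labels and injected into the victim's training set, so that the resulting model can later be queried to perform the hidden task; the exact poisoning and label-mapping details vary by paper.
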

Papers