Hidden Backdoor

Hidden backdoors in machine learning models are a significant security vulnerability: malicious actors embed triggers into models during training so that, when the trigger appears at inference time, the model produces attacker-chosen outputs while behaving normally on clean inputs. Current research focuses on detecting and mitigating these backdoors across a range of architectures, including large language models, neural radiance fields, and federated learning systems, with particular emphasis on understanding how different training paradigms (e.g., contrastive learning, reinforcement learning) can exacerbate the vulnerability. The widespread adoption of AI in critical applications necessitates robust defenses against these attacks, driving ongoing efforts to develop more secure training methods and detection algorithms.
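
To make the attack pattern concrete, the sketch below shows a minimal BadNets-style data-poisoning backdoor in an image-classification setting. The function name, trigger patch, poison rate, and target label are all illustrative assumptions, not taken from any of the listed papers; real attacks vary widely in how the trigger is embedded.

```python
import numpy as np

def poison_dataset(images, labels, target_label=0, poison_rate=0.05,
                   trigger_value=1.0, patch_size=3, seed=0):
    """Illustrative BadNets-style data poisoning (hypothetical sketch).

    Stamps a small bright patch into the corner of a fraction of the
    training images and relabels those samples to the attacker's target
    class. A model trained on the poisoned set tends to behave normally
    on clean inputs but predicts `target_label` whenever the patch
    (the trigger) is present.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Stamp the trigger patch into the bottom-right corner.
    images[idx, -patch_size:, -patch_size:] = trigger_value
    # Relabel the poisoned samples to the attacker-chosen class.
    labels[idx] = target_label
    return images, labels

# Toy usage: 100 grayscale 28x28 "images" with 10 classes.
X = np.random.rand(100, 28, 28).astype(np.float32)
y = np.random.randint(0, 10, size=100)
X_poisoned, y_poisoned = poison_dataset(X, y)
```

Detection and mitigation methods surveyed in the papers below typically try to reverse this process, for example by reconstructing candidate triggers, pruning or fine-tuning away trigger-sensitive neurons, or filtering suspicious training samples.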

Papers