Malicious Model

Malicious model attacks target vulnerabilities of machine learning models throughout their lifecycle, from training-data contamination to deployment-time exploitation. Current research focuses on detecting and mitigating these attacks across settings such as federated learning and large language models, employing techniques like anomaly detection backed by zero-knowledge proofs and fine-grained masking of model updates. Understanding and addressing these threats is crucial for the trustworthiness and security of increasingly prevalent AI systems, affecting both the reliability of research findings and the safety of real-world applications.
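To make the anomaly-detection defense concrete, here is a minimal sketch of filtering suspicious client updates before FedAvg-style aggregation in federated learning. This is a toy norm-based detector, not any specific paper's method; the function names (`filter_updates`, `aggregate`) and the z-score threshold are illustrative assumptions, and the zero-knowledge-proof machinery mentioned above is not modeled here.

```python
import numpy as np

def filter_updates(updates, z_thresh=2.0):
    """Toy anomaly detector: flag client updates whose L2 norm
    deviates strongly (z-score > z_thresh) from the cohort.

    Returns the indices of updates treated as benign. Hypothetical
    helper for illustration only.
    """
    norms = np.array([np.linalg.norm(u) for u in updates])
    mu, sigma = norms.mean(), norms.std()
    if sigma == 0:
        # All norms identical: nothing looks anomalous.
        return list(range(len(updates)))
    return [i for i, n in enumerate(norms)
            if abs(n - mu) / sigma <= z_thresh]

def aggregate(updates, keep):
    """FedAvg-style mean over only the updates that passed the check."""
    return np.stack([updates[i] for i in keep]).mean(axis=0)

# Four benign updates plus one scaled-up (model-poisoning style) update.
updates = [np.array([1.0, 1.0])] * 4 + [np.array([100.0, 100.0])]
keep = filter_updates(updates, z_thresh=1.5)
agg = aggregate(updates, keep)  # malicious index 4 is excluded
```

Real defenses in the literature are considerably more refined, e.g. per-coordinate (fine-grained) masking of suspicious components rather than discarding whole updates, but the filter-then-aggregate structure above is the common skeleton.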

Papers