Malicious Model
Malicious model attacks target vulnerabilities in machine learning models throughout their lifecycle, from training-data contamination to deployment-time exploitation. Current research focuses on detecting and mitigating these attacks across settings such as federated learning and large language models, using techniques like anomaly detection backed by zero-knowledge proofs and fine-grained masking of model updates. Understanding and addressing these threats is crucial for the trustworthiness and security of increasingly prevalent AI systems, affecting both the reliability of research findings and the safety of real-world deployments.
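To make the federated-learning defense concrete, below is a minimal sketch of one common flavor of anomaly detection on client updates: norm-based outlier screening before aggregation. This is an illustrative baseline, not the zero-knowledge-proof or masking methods from the papers above; the function names (screen_updates, robust_aggregate) and the MAD z-score threshold are assumptions chosen for the example.

```python
import numpy as np

def screen_updates(updates, z_thresh=2.5):
    """Flag client updates whose L2 norm deviates strongly from the cohort.

    updates: list of 1-D np.ndarray, one flattened model update per client.
    Returns (kept, flagged) lists of client indices.
    """
    norms = np.array([np.linalg.norm(u) for u in updates])
    # Median/MAD is more robust than mean/std when some clients are malicious,
    # since the attackers' own updates cannot easily shift the statistics.
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) + 1e-12
    z = 0.6745 * (norms - med) / mad  # approximate z-scores via MAD
    flagged = [i for i, s in enumerate(z) if abs(s) > z_thresh]
    kept = [i for i in range(len(updates)) if i not in flagged]
    return kept, flagged

def robust_aggregate(updates, kept):
    """Average only the updates that passed screening."""
    return np.mean([updates[i] for i in kept], axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    benign = [rng.normal(0, 0.01, 1000) for _ in range(9)]
    poisoned = [rng.normal(0, 0.5, 1000)]  # scaled-up malicious update
    updates = benign + poisoned
    kept, flagged = screen_updates(updates)
    print("flagged clients:", flagged)  # expect the poisoned client, index 9
    aggregate = robust_aggregate(updates, kept)
```

Norm screening only catches magnitude-based attacks; stealthier poisoning that mimics benign update statistics motivates the finer-grained defenses (e.g., per-parameter masking) surveyed above.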
Papers
(12 papers, dated August 12, 2023 to November 1, 2024)