Model Stealing

Model stealing (also called model extraction) occurs when an adversary replicates a machine learning model without authorization, typically by repeatedly querying a deployed model and training a surrogate on its responses, or by exploiting leaked information about its weights or architecture; this undermines intellectual property and can put sensitive training data at risk. Current research focuses on developing robust defenses against these attacks, particularly for large language models and self-supervised learning models, using techniques such as hardware-based restrictions, obfuscation of model architectures, and watermarking. This active area of research is crucial for securing machine-learning-as-a-service deployments and protecting the intellectual property embedded in the models they serve.
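As an illustration of the query-access threat described above, the following is a minimal sketch of an extraction attack: the adversary treats a trained classifier as a label-only black box (the hypothetical `victim_predict` API below), harvests labels for synthetic queries, and fits a surrogate on the stolen input-output pairs. The models, query budget, and function names are illustrative assumptions, not taken from any specific paper.

```python
# Minimal sketch of a query-based model-stealing (extraction) attack.
# The victim is exposed only through `victim_predict`, mimicking an
# MLaaS prediction endpoint; all names here are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# --- Victim: a proprietary model the adversary cannot inspect ---
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def victim_predict(queries):
    """Black-box API: returns labels only, never weights or gradients."""
    return victim.predict(queries)

# --- Attacker: synthesize queries, harvest labels, train a surrogate ---
n_queries = 5000  # attack cost is measured in API calls
queries = rng.normal(size=(n_queries, X.shape[1]))  # no real data needed
stolen_labels = victim_predict(queries)
surrogate = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Fidelity: how often the surrogate agrees with the victim on fresh inputs
X_probe = rng.normal(size=(1000, X.shape[1]))
fidelity = accuracy_score(victim_predict(X_probe), surrogate.predict(X_probe))
print(f"surrogate/victim agreement: {fidelity:.2%}")
```

Practical attacks refine this loop with adaptive query synthesis and richer API outputs (confidence scores rather than hard labels), while defenses such as watermarking aim to make a stolen surrogate detectable after the fact rather than to prevent the copying outright.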

Papers