Model Protection

Model protection research focuses on safeguarding the intellectual property of trained machine learning models, particularly deep neural networks and large language models, from theft or unauthorized use. Current approaches explore techniques such as watermarking (embedding unique identifiers into model outputs or parameters), model locking (degrading model performance unless the correct key is supplied), and architectural defenses (modifying the model structure or training process to hinder extraction). These methods aim to balance strong protection against unauthorized access with minimal impact on model performance and usability, a trade-off that shapes both the security and the commercial viability of AI technologies.
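
To make the watermarking idea concrete, the sketch below embeds a secret binary message into one layer's weights via an added regularization term during training, in the spirit of white-box parameter watermarking (e.g., Uchida et al., 2017). The toy model, layer choice, bit count, loss weight, and random projection matrix are all illustrative assumptions, not a fixed standard.

```python
# Minimal sketch of white-box parameter watermarking (assumed setup, not a
# definitive implementation): a regularizer nudges one layer's weights so
# that a secret projection of them decodes to the owner's watermark bits.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy model; the watermark is embedded into the first layer's weights.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
target_layer = model[0]

# Owner's secret key: a random projection matrix X and the message bits b.
num_bits = 32
num_params = target_layer.weight.numel()
X = torch.randn(num_bits, num_params)          # kept secret by the owner
b = torch.randint(0, 2, (num_bits,)).float()   # watermark message

def watermark_loss(layer):
    """BCE between the projected, squashed weights and the watermark bits."""
    w = layer.weight.flatten()
    return F.binary_cross_entropy(torch.sigmoid(X @ w), b)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(300):
    # Synthetic task data stands in for the real training set.
    inputs = torch.randn(128, 20)
    labels = torch.randint(0, 2, (128,))
    task_loss = F.cross_entropy(model(inputs), labels)
    # The 0.1 weight is an illustrative choice balancing task vs. watermark.
    loss = task_loss + 0.1 * watermark_loss(target_layer)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Verification: anyone holding the secret X can extract the bits and
# compare them to b to claim ownership of the model.
with torch.no_grad():
    w = target_layer.weight.flatten()
    recovered = (torch.sigmoid(X @ w) > 0.5).float()
print(f"watermark bit accuracy: {(recovered == b).float().mean():.2%}")
```

Because the key matrix X stays with the owner, a suspect model can be checked for the watermark without its cooperation, while the regularized training keeps the perturbation to task performance small.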

Papers