Watermark Removal Attack

Watermark removal attacks target the watermarks embedded in machine learning models, particularly deep neural networks (DNNs) and large language models (LLMs), to protect their owners' intellectual property; the attacker's goal is to erase or disable the watermark while preserving the model's utility on its original task. Current research focuses on developing more robust watermarking techniques, exploring embedding strategies such as backdooring, modifying model parameters, and altering label distributions, and analyzing their vulnerability to attacks like fine-tuning, pruning, and model extraction. Reliable embedding and verification of watermarks is crucial for protecting the substantial investment in developing these models and for ensuring fair use in commercial applications.
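To make the attack surface concrete, below is a minimal sketch of a pruning-plus-fine-tuning removal attack against a hypothetical backdoor-style watermark. Everything here is illustrative: the tiny model, the synthetic data, and the trigger set are stand-ins, and a real attacker would not know the trigger set and would instead only measure clean-task accuracy. The sketch shows the attack pipeline, not a guaranteed removal result.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Stand-in for a backdoor-watermarked classifier. In a backdoor scheme,
# the owner trains the model to give chosen labels on a secret "trigger
# set"; high trigger-set accuracy serves as the ownership proof.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

# Synthetic stand-ins for the data involved (assumed, not from any paper):
x_clean = torch.randn(512, 32)           # clean task data the attacker holds
y_clean = torch.randint(0, 10, (512,))   # (pseudo-)labels for fine-tuning
x_trig = torch.randn(64, 32)             # owner's trigger set (normally
y_trig = torch.randint(0, 10, (64,))     # secret; used here only to measure)

def trigger_accuracy(m):
    """Fraction of trigger inputs still classified with the owner's labels."""
    with torch.no_grad():
        return (m(x_trig).argmax(dim=1) == y_trig).float().mean().item()

# Step 1: magnitude pruning. Small weights often carry a disproportionate
# share of the watermark signal while contributing little to task accuracy.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeroed weights in

# Step 2: fine-tune on clean data so the network re-converges to the task
# without re-learning the trigger behaviour.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    opt.zero_grad()
    loss = loss_fn(model(x_clean), y_clean)
    loss.backward()
    opt.step()

print(f"trigger-set accuracy after attack: {trigger_accuracy(model):.2%}")
```

The same two-step recipe generalizes: pruning (or any parameter perturbation) degrades the watermark, and fine-tuning recovers task accuracy, which is why robustness to exactly this combination is a standard evaluation axis in the watermarking literature.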

Papers