Paper ID: 2405.07145

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Watermarking has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called \emph{Stable Signature} (proposed by Meta) roots the watermark into the parameters of a diffusion model's decoder, so that the images it generates are inherently watermarked. Stable Signature makes it possible to watermark images generated by \emph{open-source} diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack that removes the watermark from a diffusion model by fine-tuning it. Our results show that the attack effectively removes the watermark, so that the model's generated images are non-watermarked, while maintaining their visual quality. Our results highlight that Stable Signature is not as stable as previously thought.

Submitted: May 12, 2024