Dataset Watermarking
Dataset watermarking protects the intellectual property of datasets used to train machine learning models by embedding imperceptible watermarks that make unauthorized use detectable. Current research focuses on robust watermarking techniques for various data types (images, tabular data, point clouds, text), using methods such as clean-label backdoor watermarks, statistical hypothesis testing, and data perturbation, often in a black-box setting where only model outputs are accessible. The field is crucial for safeguarding valuable datasets, particularly in commercially sensitive areas such as healthcare and generative AI, and for ensuring fair attribution and preventing model theft.
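As a rough illustration of how hypothesis testing can work in the black-box setting, the sketch below queries a suspect model on trigger-stamped inputs and applies a one-sided binomial test to the number of target-label predictions. It is a generic test, not the method of any particular paper listed here. The names `verify_watermark`, `binomial_tail`, `predict`, and `target_label`, the assumed null rate `p0`, and the toy models are all hypothetical and introduced only for this example.

```python
from math import comb
from typing import Callable, Sequence


def binomial_tail(k: int, n: int, p0: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def verify_watermark(
    predict: Callable[[int], int],   # black-box access: input -> predicted label
    triggered_inputs: Sequence[int],  # inputs carrying the (hypothetical) trigger
    target_label: int,                # label the watermark is meant to induce
    p0: float,                        # ASSUMED target-label rate for a model never trained on the data
    alpha: float = 0.01,              # significance level
):
    """One-sided binomial test.

    H0: the model predicts `target_label` on triggered inputs with probability <= p0.
    Rejecting H0 is evidence that the model was trained on the watermarked dataset.
    """
    hits = sum(1 for x in triggered_inputs if predict(x) == target_label)
    p_value = binomial_tail(hits, len(triggered_inputs), p0)
    return hits, p_value, p_value < alpha


# Toy stand-ins for real models (assumptions for illustration only).
def clean_model(x: int) -> int:
    return x % 10  # ignores the trigger, labels spread over 10 classes


def watermarked_model(x: int) -> int:
    return 7  # always responds to the trigger with the target label


queries = range(100)
print(verify_watermark(clean_model, queries, target_label=7, p0=0.1))
print(verify_watermark(watermarked_model, queries, target_label=7, p0=0.1))
```

In practice the null rate `p0` would have to be estimated or bounded, for example from models known not to have seen the dataset, and the test only uses predicted labels, which is why it fits the black-box setting.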
13 papers
Papers
February 15, 2025
November 19, 2024
September 27, 2024
May 22, 2024
February 16, 2024
September 27, 2022
February 25, 2022