Dataset Watermarking
Dataset watermarking aims to protect the intellectual property of datasets used to train machine learning models. It does this by embedding imperceptible watermarks that make unauthorized use detectable. Current research focuses on robust watermarking techniques for various data types (images, tabular data, point clouds, text), using methods such as clean-label backdoor watermarks, statistical hypothesis testing, and data perturbation. This work often takes place in a black-box setting, where only a suspect model's outputs are accessible. The field matters for safeguarding valuable datasets, particularly in commercially sensitive areas such as healthcare and generative AI, for ensuring fair attribution, and for deterring model theft.
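The embed-then-verify workflow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular paper. Grayscale images are plain lists of rows, and the trigger is a hypothetical bright corner patch. The suspect model is reachable only as a black box that returns labels. Embedding is clean-label: the trigger is stamped only on samples that already carry the target label, so no label is altered. Verification is a simplified one-sided binomial test. Under the null hypothesis that the model never trained on the watermarked data, each trigger-stamped query lands on the target label with probability at most `p0`, an assumed baseline such as `1/num_classes`. All function names (`stamp_trigger`, `watermark_dataset`, `verify_watermark`) are hypothetical.

```python
import random
from math import comb


def stamp_trigger(image, size=3, value=255):
    """Return a copy of a 2-D grayscale image (list of rows) with a
    size x size patch set to `value` in the bottom-right corner.
    The patch is a hypothetical trigger chosen for illustration."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def watermark_dataset(dataset, target_label, rate=0.1, seed=0):
    """Clean-label embedding: stamp the trigger on a random fraction `rate`
    of the samples that already belong to `target_label`. Labels are never
    changed, so the marked samples still look correctly labeled."""
    rng = random.Random(seed)
    marked = []
    for image, label in dataset:
        if label == target_label and rng.random() < rate:
            image = stamp_trigger(image)
        marked.append((image, label))
    return marked


def binomial_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def verify_watermark(predictions, target_label, p0, alpha=0.01):
    """Black-box check. `predictions` are the suspect model's labels for
    trigger-stamped inputs drawn from NON-target classes. Returns
    (p_value, flagged), where flagged means the null hypothesis
    'target-label rate <= p0' is rejected at level alpha."""
    n = len(predictions)
    k = sum(1 for y in predictions if y == target_label)
    p_value = binomial_tail(k, n, p0)
    return p_value, p_value < alpha
```

For example, with 100 trigger-stamped queries against a 10-class model and `p0 = 0.1`, a model that predicts the target label 60 times yields a p-value far below 0.01 and is flagged. A model that predicts it 9 times (close to the chance rate of 10) is not flagged. Published methods differ in how they choose triggers and which statistical test they apply. For instance, some compare paired prediction probabilities on benign and stamped inputs rather than counting label hits.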
Papers
September 27, 2024
August 10, 2024
June 21, 2024
May 22, 2024
March 26, 2024
February 16, 2024
October 9, 2023
June 22, 2023
March 20, 2023
September 27, 2022
August 4, 2022
February 25, 2022