Paper ID: 2403.19050

Detecting Generative Parroting through Overfitting Masked Autoencoders

Saeid Asgari Taghanaki, Joseph Lambourne

The advent of generative AI models has revolutionized digital content creation, yet it introduces challenges in maintaining copyright integrity due to generative parroting, where models mimic their training data too closely. Our research presents a novel approach to tackle this issue by employing an overfitted Masked Autoencoder (MAE) to detect such parroted samples effectively. We establish a detection threshold based on the mean loss across the training dataset, allowing for the precise identification of parroted content in modified datasets. Preliminary evaluations demonstrate promising results, suggesting our method's potential to ensure ethical use and enhance the legal compliance of generative models.
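The thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes per-sample reconstruction losses from the overfitted MAE have already been computed elsewhere. The function names `detection_threshold` and `flag_parroted`, the loss values, and the decision direction (a loss at or below the training mean counts as parroted) are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean
from typing import List, Sequence


def detection_threshold(train_losses: Sequence[float]) -> float:
    """Return the mean per-sample reconstruction loss over the training set."""
    if not train_losses:
        raise ValueError("train_losses must not be empty")
    return mean(train_losses)


def flag_parroted(candidate_losses: Sequence[float], threshold: float) -> List[bool]:
    """Flag each candidate whose loss is at or below the threshold.

    Assumption: an MAE overfitted to the training data reconstructs
    near-copies of that data with low loss, so a low loss suggests parroting.
    """
    return [loss <= threshold for loss in candidate_losses]


# Hypothetical loss values, for illustration only.
train_losses = [0.010, 0.020, 0.030]
candidate_losses = [0.015, 0.200]

threshold = detection_threshold(train_losses)
flags = flag_parroted(candidate_losses, threshold)
```

In this sketch the first candidate falls under the training mean and is flagged, while the second does not. A real pipeline would likely tune the threshold (for example, by scaling the mean) rather than use it unchanged.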

Submitted: Mar 27, 2024

Topics

Generative Model
Training Data
Masked Autoencoders
Generative AI Model
Masked AutoEncoder