Data Copying
Data copying covers both intentional replication (e.g., for efficient data streaming or model deployment) and unintentional memorization (e.g., in generative models), and it is studied across several fields. Current work focuses on detecting and mitigating unintended data copying in machine learning models, particularly large language models and other generative models, often using techniques such as attention matrix sharing or novel loss functions. Understanding and controlling data copying matters for data privacy, model security, and the ethical development and deployment of AI systems, with implications ranging from copyright protection to the responsible use of synthetic data.
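As a rough illustration of what detecting unintended copying can look like in its simplest form, the sketch below measures how many word-level n-grams of a generated text appear verbatim in a training corpus. It is not a method taken from any of the papers listed here. The function names `ngrams` and `copied_fraction`, the whitespace tokenization, and the default window size `n=5` are all assumptions made for the example.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_fraction(generated, training_corpus, n=5):
    """Fraction of n-grams in `generated` found verbatim in any training text.

    Hypothetical helper: whitespace tokenization and exact matching only,
    so paraphrased or lightly edited copies are not detected.
    """
    gen = ngrams(generated.split(), n)
    if not gen:
        # Text shorter than n tokens has no n-grams to compare.
        return 0.0
    train = set()
    for doc in training_corpus:
        train |= ngrams(doc.split(), n)
    return len(gen & train) / len(gen)


corpus = ["the quick brown fox jumps over the lazy dog"]
print(copied_fraction("the quick brown fox jumps high", corpus))
```

A score near 1.0 suggests the output is largely a verbatim reproduction of training text, while a score near 0.0 means no exact n-gram overlap was found. Exact matching is a deliberately crude choice here, and practical detectors typically need to handle near-duplicates as well.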
Papers
October 22, 2024
October 9, 2024
September 26, 2024
September 22, 2024
July 9, 2024
June 18, 2024
November 14, 2023
October 21, 2023
September 13, 2023
July 13, 2023
July 1, 2023
May 26, 2023
May 22, 2023
February 25, 2023
February 6, 2023