Data Leakage

Data leakage in machine learning refers to the unintended exposure of sensitive information from training data through model outputs or intermediate computations. Current research focuses on detecting and mitigating leakage in settings such as federated learning, large language models, and recommendation systems, using techniques like differential privacy and adversarial training to strengthen privacy while preserving model accuracy. Addressing data leakage is crucial for the responsible development and deployment of machine learning systems, particularly in sensitive domains like healthcare and finance. It also matters for establishing reliable benchmarks, since evaluation data that has leaked into training inflates measured performance.
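As a rough illustration of one technique named above, the sketch below applies the Laplace mechanism, a basic building block of differential privacy, to a counting query. It is a minimal example under stated assumptions, not a method taken from any particular paper: the function names `laplace_noise` and `private_count`, the record layout, and the choice of `epsilon` are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate`.

    Hypothetical helper: adds Laplace noise calibrated to the query's
    sensitivity so that the released count is epsilon-differentially private.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: adding or removing one record
    # changes the result by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Illustrative usage with made-up records.
records = [{"age": a} for a in (25, 40, 61, 70, 33)]
noisy = private_count(records, lambda r: r["age"] > 35, epsilon=1.0)
```

Smaller values of `epsilon` add more noise and give stronger privacy at the cost of accuracy, which is the privacy/accuracy trade-off mentioned above.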

Papers