Model Extraction Attack

Model extraction attacks aim to steal the functionality of a machine learning model by querying it for predictions, effectively replicating the model without access to its training data or internal parameters. Current research focuses on developing more query-efficient attack methods, particularly against large language models and object detectors, often employing techniques such as knowledge distillation, active learning, and the exploitation of counterfactual explanations. This area is crucial for securing machine-learning-as-a-service (MLaaS) platforms and protecting intellectual property, driving ongoing efforts to develop robust defenses such as watermarking and query unlearning.
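
As a rough illustration of the core idea, the sketch below simulates a black-box extraction in the style of knowledge distillation: an attacker spends a fixed query budget labeling synthetic inputs with a victim model's predictions, then trains a surrogate on those answers. The victim, surrogate, query budget, and probe distribution are all illustrative assumptions, not a method from any particular paper.

```python
# Minimal sketch of a query-based model extraction attack. Assumes only
# black-box label access to a victim's prediction API (simulated locally).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# --- Victim setup (in a real attack this is a remote MLaaS endpoint) ---
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
victim = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                       random_state=0).fit(X, y)

def query_victim(inputs):
    """Black-box oracle: the attacker sees only predicted labels."""
    return victim.predict(inputs)

# --- Extraction: label random probes with the victim, then distil ---
query_budget = 1000                                     # illustrative API-call budget
X_query = rng.normal(size=(query_budget, X.shape[1]))   # synthetic probe inputs
y_query = query_victim(X_query)                         # victim answers become labels

surrogate = DecisionTreeClassifier(random_state=0).fit(X_query, y_query)

# Fidelity: agreement between surrogate and victim on fresh inputs.
X_test = rng.normal(size=(500, X.shape[1]))
fidelity = accuracy_score(query_victim(X_test), surrogate.predict(X_test))
print(f"surrogate-victim agreement: {fidelity:.2%}")
```

Surrogate-victim agreement (fidelity) on held-out inputs is the usual success measure for such attacks; more advanced variants replace the random probes with actively selected queries to reduce the budget needed for a given fidelity.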

Papers