Paper ID: 2409.19417

Subject Data Auditing via Source Inference Attack in Cross-Silo Federated Learning

Jiaxin Li, Marco Arazzi, Antonino Nocera, Mauro Conti

Source Inference Attack (SIA) in Federated Learning (FL) aims to identify which client used a target data point for local model training. It allows the central server to audit clients' data usage. In cross-silo FL, a client (silo) collects data from multiple subjects (e.g., individuals, writers, or devices), posing a risk of subject information leakage. Subject Membership Inference Attack (SMIA) targets this scenario and attempts to infer whether any client utilizes data points from a target subject in cross-silo FL. However, existing results on SMIA are limited and based on strong assumptions on the attack scenario. Therefore, we propose a Subject-Level Source Inference Attack (SLSIA) by removing critical constraints that only one client can use a target data point in SIA and imprecise detection of clients utilizing target subject data in SMIA. The attacker, positioned on the server side, controls a target data source and aims to detect all clients using data points from the target subject. Our strategy leverages a binary attack classifier to predict whether the embeddings returned by a local model on test data from the target subject include unique patterns that indicate a client trains the model with data from that subject. To achieve this, the attacker locally pre-trains models using data derived from the target subject and then leverages them to build a training set for the binary attack classifier. Our SLSIA significantly outperforms previous methods on three datasets. Specifically, SLSIA achieves a maximum average accuracy of 0.88 over 50 target subjects. Analyzing embedding distribution and input feature distance shows that datasets with sparse subjects are more susceptible to our attack. Finally, we propose to defend our SLSIA using item-level and subject-level differential privacy mechanisms.

Submitted: Sep 28, 2024