Text Privatization
Text privatization aims to protect sensitive information in text data while preserving its utility for analysis and downstream tasks. Current research centers on methods with differential privacy guarantees, typically implemented either by rewriting text with pretrained language models such as BERT and BART, or by perturbing word embeddings with calibrated noise. While promising, these techniques struggle to balance formal privacy guarantees against the preservation of semantic meaning and linguistic coherence, motivating ongoing work on improved algorithms and model architectures. The field's impact lies in enabling responsible data sharing and analysis in applications where privacy is paramount, from social science research to personalized healthcare.
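The embedding-perturbation approach mentioned above can be illustrated with a minimal sketch. The snippet below is an illustrative toy, not any paper's actual method: it uses a made-up five-word vocabulary with random 2-D vectors in place of real embeddings, adds multivariate Laplace noise (uniform direction, Gamma-distributed magnitude) scaled by 1/epsilon in the style of word-level metric differential privacy, and post-processes by snapping to the nearest vocabulary word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with random 2-D embeddings (stand-ins for real word vectors).
vocab = ["cat", "dog", "car", "bank", "tree"]
emb = rng.normal(size=(len(vocab), 2))

def privatize_word(word: str, epsilon: float) -> str:
    """Perturb a word's embedding with d-dimensional Laplace noise,
    then return the nearest vocabulary word (metric-DP-style sketch)."""
    v = emb[vocab.index(word)]
    d = v.shape[0]
    # Uniform random direction on the unit sphere.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Gamma-distributed magnitude yields a d-dimensional Laplace
    # distribution; smaller epsilon means larger noise.
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = v + magnitude * direction
    # Post-process: snap the noisy vector back to the closest real word.
    dists = np.linalg.norm(emb - noisy, axis=1)
    return vocab[int(np.argmin(dists))]

# Low epsilon tends to replace the word; high epsilon tends to keep it.
print(privatize_word("cat", epsilon=0.5))
print(privatize_word("cat", epsilon=100.0))
```

In practice the output word is random: at small epsilon the noise magnitude dominates the distances between embeddings and the word is frequently swapped for a neighbor, which is exactly the trade-off between privacy and semantic fidelity that the paragraph above describes.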
Papers
Characterizing Stereotypical Bias from Privacy-preserving Pre-Training
Stefan Arnold, Rene Gröbner, Annika Schreiner
A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy
Stephen Meisenbacher, Maulik Chevli, Florian Matthes
DP-MLM: Differentially Private Text Rewriting Using Masked Language Models
Stephen Meisenbacher, Maulik Chevli, Juraj Vladika, Florian Matthes