White Box

"White-box" research in machine learning focuses on analyzing and manipulating models with complete access to their internal parameters and workings, primarily to assess vulnerabilities and improve security. Current research emphasizes adversarial attacks (e.g., poisoning training data, crafting adversarial examples) and defenses against these attacks, often targeting specific model architectures like transformers and graph neural networks, as well as exploring techniques like watermarking for intellectual property protection. This research is crucial for building more robust and trustworthy AI systems, impacting the security of various applications from autonomous vehicles to large language models and mitigating risks associated with data privacy and model integrity.

Papers

February 8, 2024

Investigating White-Box Attacks for On-Device Models
Mingyi Zhou, Xiang Gao, Jing Wu, Kui Liu, Hailong Sun, Li Li
White Box Device Model

December 21, 2023

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu
Black Box Benchmark Platform White Box Prompt Injection Attack Black Box Defense

December 19, 2023

Find the Lady: Permutation and Re-Synchronization of Deep Neural Networks
Carl De Sousa Trias, Mihai Petru Mitrea, Attilio Fiandrotti, Marco Cagnazzo, Sumanta Chaudhuri, Enzo Tartaglione
Deep Neural Network White Box Machine Permutation Female Speaker

December 17, 2023

SAME: Sample Reconstruction against Model Extraction Attacks
Yi Xie, Jie Zhang, Shiqian Zhao, Tianwei Zhang, Xiaofeng Chen
Machine Learning Deep Learning Model White Box Model Extraction Attack Active Defense Sample Reconstruction

November 30, 2023

Improving the Robustness of Quantized Deep Neural Networks to White-Box Attacks using Stochastic Quantization and Information-Theoretic Ensemble Training
Saurabh Farkya, Aswin Raghavan, Avi Ziskind
Adversarial Attack Native Robustness Adversarial Training Ensemble Learning White Box Quantized Neural Network Different Quantization Stochastic Quantization

November 29, 2023

Quantum Neural Networks under Depolarization Noise: Exploring White-Box Attacks and Defenses
David Winderl, Nicola Franco, Jeanette Miriam Lorenz
Adversarial Attack Quantum Machine Learning Quantum Physic Quantum Neural Network White Box

October 30, 2023

Label-Only Model Inversion Attacks via Knowledge Transfer
Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, Ngai-Man Cheung
Knowledge Transfer White Box Model Inversion Opaque Machine Learning Label Only Model Inversion Attack

October 26, 2023

PubDef: Defending Against Transfer Attacks From Public Models
Chawin Sitawarin, Jaewon Chang, David Huang, Wesson Altoyan, David Wagner
Adversarial Attack White Box Transfer Attack White Box Adversarial Attack Public Model

October 25, 2023

September 29, 2023

Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Vaidehi Patil, Peter Hase, Mohit Bansal
Language Model Medical LLM Sensitive Data White Box Prompt Attack Defense Method Extraction Attack

September 11, 2023

Optimization of Raman amplifiers: a comparison between black-, grey- and white-box modeling
Metodi P. Yankov, Mehran Soltani, Andrea Carena, Darko Zibar, Francesco Da Ros
Optimization Purpose Consistent Comparison White Box Grayscale Image Optical Network Flatness Aware Raman Amplifier

August 22, 2023

Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks
Zhenzhe Gao, Zhaoxia Yin, Hongjian Zhan, Heng Yin, Yue Lu
Deep Neural Network Artificial Intelligence Model White Box Tampered Image Bit Allocation Self Checking Fragile Watermarking

August 18, 2023

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box
Yi Cai, Gerhard Wunder
Black Box Gradient Based Feature Attribution Attribution Method White Box Gradient Based Explanation

August 17, 2023

A White-Box False Positive Adversarial Attack Method on Contrastive Loss Based Offline Handwritten Signature Verification Models
Zhongliang Guo, Weiye Li, Yifei Qian, Ognjen Arandjelović, Lei Fang
White Box White Box Adversarial Attack Offline Handwritten Signature Verification

August 15, 2023

A Review of Adversarial Attacks in Computer Vision
Yutong Zhang, Yao Li, Yin Li, Zhichang Guo
Adversarial Attack Computer Vision Adversarial Sample White Box Deep Neural Network Classifier

July 18, 2023

Saliency strikes back: How filtering out high frequencies improves white-box explanations
Sabine Muzellec, Thomas Fel, Victor Boutin, Léo Andéol, Rufin VanRullen, Thomas Serre
High Explainability Explainability Method Attribution Method Human Saliency White Box Dominant Low Frequency

June 19, 2023

Eigenpatches -- Adversarial Patches from Principal Components
Jens Bayer, Stefan Becker, David Münch, Michael Arens
Adversarial Patch White Box Evasion Attack Principal Component Eigen Portfolio

June 8, 2023

Investigating the Effect of Misalignment on Membership Privacy in the White-box Setting
Ana-Maria Cretu, Daniel Jones, Yves-Alexandre de Montjoye, Shruti Tople
Mixed Effect White Box Membership Privacy Segment Misalignment Shadow Model Box Membership Inference Attack Local Ultimate Gradient Inspection Model Misalignment

May 20, 2023

Multi-Task Models Adversarial Attacks
Lijun Zhang, Xiao Liu, Kaleel Mahmood, Caiwen Ding, Hui Guan
Adversarial Attack Multi Task Learning White Box Multi Task Model

Defense Against Model Extraction Attacks on Recommender Systems

CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
