White Box

"White-box" research in machine learning analyzes and manipulates models with complete access to their internal parameters and workings, primarily to assess vulnerabilities and improve security. Current work emphasizes adversarial attacks, such as poisoning training data and crafting adversarial examples, together with defenses against them, often targeting specific architectures such as transformers and graph neural networks. It also explores techniques such as watermarking for intellectual property protection. This research supports more robust and trustworthy AI systems: it bears on the security of applications ranging from autonomous vehicles to large language models, and on mitigating risks to data privacy and model integrity.
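To make "complete access" concrete, the following is a minimal sketch of crafting an adversarial example when the attacker can read the model's weights. It uses the fast gradient sign method (FGSM) on a tiny logistic-regression classifier. FGSM is chosen here only as a well-known illustration and is not named in the summary above. The weights, input, and step size `eps` are hypothetical values picked for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift each feature by eps in the direction
    that increases the cross-entropy loss for the true label y."""
    p = predict(w, b, x)
    # White-box access: the gradient of the loss w.r.t. the input
    # is available in closed form as (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model parameters and input (assumptions for illustration).
w = [2.0, -1.0, 0.5]
b = 0.1
x = [0.5, -0.2, 0.3]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.6)

print("clean prediction:      ", round(predict(w, b, x), 3))
print("adversarial prediction:", round(predict(w, b, x_adv), 3))
```

With these values the clean input is classified as class 1 and the perturbed input falls below the 0.5 decision threshold, even though no feature moved by more than `eps`. A black-box attacker would have to estimate the same gradient from queries alone.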

Papers