Benign Input
Research on "benign input" examines how seemingly harmless inputs interact with machine learning models and what they reveal about those models' vulnerabilities and capabilities. Current work studies model behavior under benign inputs: analyzing overfitting in architectures such as transformers and CNNs, evaluating circuit accuracy, and developing robust fingerprinting techniques. The goal is to improve model safety and reliability in applications ranging from medical image analysis to content moderation, and so to make AI systems more trustworthy and secure.
Papers
September 26, 2024
July 27, 2024
July 21, 2024
June 28, 2024
May 24, 2024
April 22, 2024
October 26, 2023
May 24, 2023
March 25, 2023
October 21, 2022
October 18, 2022
August 5, 2022
July 14, 2022