Scalable Interpretability
Scalable interpretability aims to make the decision-making processes of complex machine learning models, such as deep neural networks, understandable and transparent even as model size and data volume grow. Current research focuses on novel architectures and algorithms that balance high predictive performance with readily accessible explanations, including sparse feature circuits, in-database interpretability frameworks, and scalable polynomial additive models. These advances are crucial for building trust in AI systems across diverse applications, from medical image analysis to database querying, and for enabling responsible AI development.
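To make the additive-model idea concrete, the sketch below fits a prediction of the form bias + sum over features of a low-degree polynomial in that feature alone; because each feature's polynomial can be evaluated and plotted in isolation, the fitted model explains itself feature by feature. This is a minimal NumPy sketch of the general concept only: the class name `PolynomialAdditiveModel`, its parameters, and the fitting procedure are illustrative assumptions, not the scalable architecture proposed in any of the papers referenced here.

```python
import numpy as np

def polynomial_basis(X, degree):
    """Expand each column of X into the powers 1..degree (no cross terms)."""
    return np.concatenate([X ** d for d in range(1, degree + 1)], axis=1)

class PolynomialAdditiveModel:
    """Illustrative sketch: prediction = bias + sum_j p_j(x_j),
    where each p_j is a low-degree polynomial in feature j alone."""

    def __init__(self, degree=3, ridge=1e-3):
        self.degree = degree
        self.ridge = ridge  # small L2 penalty keeps the least-squares fit stable

    def _design(self, X):
        # Design matrix: a bias column followed by per-feature power expansions.
        Phi = polynomial_basis(X, self.degree)
        return np.concatenate([np.ones((len(X), 1)), Phi], axis=1)

    def fit(self, X, y):
        Phi = self._design(X)
        A = Phi.T @ Phi + self.ridge * np.eye(Phi.shape[1])
        self.coef_ = np.linalg.solve(A, Phi.T @ y)  # ridge-regularized least squares
        self.n_features_ = X.shape[1]
        return self

    def predict(self, X):
        return self._design(X) @ self.coef_

    def feature_contribution(self, j, x):
        """Evaluate feature j's polynomial alone: the per-feature explanation."""
        # Coefficients for feature j sit at indices 1+j, 1+j+n, ..., one per degree.
        w = self.coef_[1 + j :: self.n_features_]
        powers = np.array([x ** d for d in range(1, self.degree + 1)])
        return w @ powers

if __name__ == "__main__":
    # Tiny usage example on synthetic data: y = 2*x0 - 0.5*x1^2 + noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = 2.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
    model = PolynomialAdditiveModel(degree=3).fit(X, y)
    # Feature 1's learned effect should track -0.5 * x^2 at a few test points.
    print([round(model.feature_contribution(1, x), 2) for x in (-2.0, 0.0, 2.0)])
```

Because the coefficient vector decomposes cleanly by feature, explanation cost does not grow with model depth; this per-feature separability is the property that additive-model approaches trade some flexibility for.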