Interpretability Tool
Interpretability tools aim to make the inner workings of complex machine learning models, particularly deep neural networks and large language models, more transparent and understandable. Current research focuses on developing methods to explain model decisions for various architectures, including convolutional neural networks (CNNs) and transformers, often employing techniques like feature attribution and dialogue-based explanations. This work is crucial for building trust in AI systems, improving model debugging and design, and facilitating responsible deployment across diverse applications, from healthcare to finance. The ultimate goal is to move beyond simply identifying model outputs to understanding the reasoning behind them.
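The feature-attribution techniques mentioned above typically work by measuring how sensitive a model's output is to each input feature. As a minimal sketch of the idea (assuming PyTorch and torchvision are available; the resnet18 model and the random dummy input are illustrative placeholders, not taken from any of the papers below), the example computes a simple gradient saliency map for a CNN image classifier:

```python
# Minimal gradient-based feature attribution (saliency map) for an image classifier.
# Assumes PyTorch and torchvision; the model and input below are illustrative only.
import torch
import torchvision.models as models

def saliency_map(model, image, target_class=None):
    """Return |d(logit)/d(pixel)| attributions for a single image tensor (C, H, W)."""
    model.eval()
    image = image.clone().unsqueeze(0).requires_grad_(True)  # add batch dim, track gradients
    logits = model(image)
    if target_class is None:
        target_class = logits.argmax(dim=1).item()           # explain the predicted class
    logits[0, target_class].backward()                        # gradient of that logit w.r.t. the input
    # Aggregate gradient magnitudes over color channels into a per-pixel saliency map.
    return image.grad.detach().abs().squeeze(0).max(dim=0).values

if __name__ == "__main__":
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    dummy_input = torch.rand(3, 224, 224)                     # stand-in for a preprocessed image
    attribution = saliency_map(model, dummy_input)
    print(attribution.shape)                                  # torch.Size([224, 224])
```

High values in the resulting map indicate pixels whose perturbation most changes the chosen class score; more refined attribution methods (e.g., Integrated Gradients or SHAP) build on this same gradient-sensitivity idea.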
Papers
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models
Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability
Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell