Circuit Discovery
Circuit discovery aims to identify the minimal subnetworks ("circuits") of a neural network that are responsible for a specific model behavior, with the goal of making complex models such as transformers and recurrent architectures easier to understand. Current research focuses on making discovery algorithms more robust and efficient, using methods such as edge pruning, differentiable graph pruning, and sparse dictionary learning, and often evaluates them on GPT-2 and its variants. Better circuit discovery techniques advance mechanistic interpretability, which in turn supports more trustworthy and explainable AI systems and may inform the design of more efficient, interpretable models.
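To make the edge-pruning idea concrete, the following is a minimal sketch on a hand-built toy network, not an implementation of any specific published method. The network, its weights, the dataset, the zero-ablation of removed edges, and the mean-absolute-gap criterion are all assumptions chosen for illustration; published methods typically use other choices (for example, patching in activations from corrupted inputs and comparing output distributions).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy network: three inputs, two ReLU hidden units, one output.
# Each entry is a directed edge (source node, destination node) with a weight.
WEIGHTS: Dict[Edge, float] = {
    ("x0", "h0"): 1.0,
    ("x1", "h0"): 1.0,
    ("x2", "h0"): 0.01,
    ("x0", "h1"): 0.02,
    ("x2", "h1"): -0.01,
    ("h0", "y"): 2.0,
    ("h1", "y"): 0.05,
}
HIDDEN = ["h0", "h1"]

# Hypothetical dataset standing in for the prompts that elicit the behavior.
DATASET: List[Dict[str, float]] = [
    {"x0": 1.0, "x1": 2.0, "x2": 3.0},
    {"x0": 0.5, "x1": -1.0, "x2": 2.0},
    {"x0": 2.0, "x1": 1.0, "x2": -1.0},
]


def forward(inputs: Dict[str, float], active: Set[Edge]) -> float:
    """Run the toy network using only the edges in `active` (others are zero-ablated)."""
    values = dict(inputs)
    for h in HIDDEN:
        total = sum(w * values[src] for (src, dst), w in WEIGHTS.items()
                    if dst == h and (src, dst) in active)
        values[h] = max(0.0, total)  # ReLU
    return sum(w * values[src] for (src, dst), w in WEIGHTS.items()
               if dst == "y" and (src, dst) in active)


def mean_abs_gap(active: Set[Edge], reference: List[float]) -> float:
    """Average absolute difference between the pruned and the full model's outputs."""
    outputs = [forward(x, active) for x in DATASET]
    return sum(abs(o - r) for o, r in zip(outputs, reference)) / len(DATASET)


def greedy_edge_pruning(threshold: float) -> Set[Edge]:
    """Drop each edge whose removal keeps the output gap to the full model below `threshold`."""
    full = set(WEIGHTS)
    reference = [forward(x, full) for x in DATASET]
    active = set(full)
    # Visit edges starting from those closest to the output.
    for edge in reversed(list(WEIGHTS)):
        candidate = active - {edge}
        if mean_abs_gap(candidate, reference) < threshold:
            active = candidate  # the edge is not needed for this behavior
    return active


circuit = greedy_edge_pruning(threshold=0.1)
print(sorted(circuit))
```

In this sketch the threshold trades circuit size against faithfulness: a larger value removes more edges but lets the pruned subnetwork drift further from the full model's outputs.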
Papers
October 16, 2024
October 10, 2024
July 19, 2024
July 11, 2024
July 4, 2024
June 25, 2024
June 24, 2024
May 22, 2024
February 19, 2024
January 8, 2024
October 16, 2023
April 28, 2023