Mechanistic Understanding
Mechanistic understanding in machine learning aims to explain how models arrive at their predictions by analyzing their internal workings, moving beyond simply observing input-output relationships. Current research focuses on identifying and characterizing causal mechanisms within various architectures, including transformers and physics-informed neural networks, often employing causal mediation analysis and probes of internal representations to understand phenomena such as hallucinations and chain-of-thought reasoning. This pursuit is crucial for improving model reliability, mitigating biases, and developing more robust and trustworthy AI systems across diverse applications, from healthcare to scientific discovery.
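To make the idea of probing internal representations concrete, the sketch below trains a simple linear probe on hidden activations to test whether a property of interest is linearly decodable from them. It is a minimal illustration, not a method from any of the papers listed here: the activations and labels are synthetic placeholders standing in for hidden states extracted from an intermediate layer of a real model.

```python
# Minimal probing sketch: fit a linear classifier on (placeholder) hidden
# activations to check whether a target property is linearly decodable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder activations standing in for real intermediate-layer outputs
# of shape (num_examples, hidden_dim); labels encode a toy property.
hidden_states = rng.normal(size=(1000, 64))
labels = (hidden_states[:, :8].sum(axis=1) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy: {probe.score(X_test, y_test):.2f}")
# High probe accuracy suggests the property is encoded at this layer, but it
# does not by itself show the model causally relies on it; causal mediation
# analysis (e.g., intervening on activations) addresses that further question.
```

In practice, the placeholder arrays would be replaced with activations recorded from a specific layer of the model under study, and probe accuracy would be compared across layers or against controls.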
Papers
Physics-Informed Neural Networks for Dynamic Process Operations with Limited Physical Knowledge and Data
Mehmet Velioglu, Song Zhai, Sophia Rupprecht, Alexander Mitsos, Andreas Jupke, Manuel Dahmen
Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience
Martina G. Vilas, Federico Adolfi, David Poeppel, Gemma Roig