Explainable AI Planning

Explainable AI planning (XAIP) focuses on making the decision-making processes of artificial intelligence agents transparent and understandable to humans, fostering trust and collaboration. Current research emphasizes improving the interpretability of planning mechanisms in large language models and other AI systems, often employing techniques like contrastive sparse autoencoders for analyzing internal representations or incorporating probabilistic human models into agent reasoning. This work is crucial for building reliable and trustworthy AI systems, particularly in high-stakes domains where understanding the rationale behind AI decisions is paramount for safety and accountability.
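
As a rough illustration of the sparse-autoencoder idea mentioned above, the sketch below trains a small autoencoder on stand-in activations so that each input is explained by a few sparsely-firing features; contrastive variants in the literature additionally compare feature activations across paired inputs (e.g., planning vs. non-planning prompts). All names, dimensions, and data here are hypothetical placeholders, not any specific paper's setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder: maps model activations into a wider,
    sparsely-activated feature space and reconstructs them."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # ReLU keeps feature activations non-negative, which combined with
        # an L1 penalty encourages most features to stay at zero.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction error plus a sparsity penalty on the feature code.
    mse = ((reconstruction - x) ** 2).mean()
    sparsity = features.abs().mean()
    return mse + l1_coeff * sparsity

if __name__ == "__main__":
    # Hypothetical data: random vectors standing in for hidden states
    # captured from a planning model's internal layers.
    d_model, d_hidden = 512, 2048
    activations = torch.randn(64, d_model)

    sae = SparseAutoencoder(d_model, d_hidden)
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)

    for step in range(100):
        recon, feats = sae(activations)
        loss = sae_loss(activations, recon, feats)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

After training on real activations, the learned features (rather than raw neurons) become the units of analysis: one can inspect which features fire on which inputs, which is the sense in which such autoencoders support interpretability of internal representations.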

Papers