Disagreement Problem

The "disagreement problem" in AI refers to the inconsistencies observed across different methods used to explain or interpret machine learning model predictions, particularly in complex models like neural networks. Current research focuses on understanding the sources of this disagreement, including issues with data quality (e.g., subjective annotations), model architecture limitations, and the inherent limitations of explanation methods themselves. Addressing this problem is crucial for building trust in AI systems, particularly in high-stakes applications, and for developing more reliable and robust explanation techniques. This requires developing better evaluation metrics and potentially modifying model training to encourage more consistent explanations.

Papers