Rationale Extraction

Rationale extraction identifies the parts of an input text that most influence a model's prediction, with the goal of making the model more interpretable and trustworthy. Current research focuses on making extracted rationales more accurate and more faithful to the model's actual decision process, often using attention mechanisms, multi-task learning, and adversarial training within architectures such as transformers and graph neural networks. This work matters because it addresses the "black box" nature of many machine learning models, improving both our understanding of model behavior and the reliability of predictions in applications such as legal judgment prediction and abusive language detection.
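
To make the idea concrete, here is a minimal sketch of one simple way to score token influence: leave-one-out occlusion, which removes each token in turn and measures how much the model's score drops. This is an illustrative baseline, not the method of any particular paper on this topic, and it is deliberately simpler than the attention-based or adversarial approaches mentioned above. The names `TOY_WEIGHTS`, `toy_score`, and `extract_rationale` are hypothetical, and the lexicon-based scorer only stands in for a trained classifier.

```python
from typing import Callable, List, Tuple

# Hypothetical toy lexicon standing in for a trained abusive-language classifier.
TOY_WEIGHTS = {"idiot": 2.0, "stupid": 1.5, "hate": 1.0, "thanks": -1.0}


def toy_score(tokens: List[str]) -> float:
    """Toy 'abusiveness' score: the sum of per-token lexicon weights."""
    return sum(TOY_WEIGHTS.get(tok.lower(), 0.0) for tok in tokens)


def extract_rationale(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k tokens whose removal lowers the score the most.

    Importance of token i = score(full input) - score(input without token i).
    The selected tokens are returned in their original order.
    """
    base = score_fn(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # occlude a single token
        importances.append((i, tok, base - score_fn(reduced)))

    # Keep the k most influential tokens, then restore reading order.
    top = sorted(importances, key=lambda item: item[2], reverse=True)[:k]
    top.sort(key=lambda item: item[0])
    return [(tok, importance) for _, tok, importance in top]


tokens = "you are a stupid idiot".split()
rationale = extract_rationale(tokens, toy_score, k=2)
print(rationale)  # [('stupid', 1.5), ('idiot', 2.0)]
```

With a real model, `score_fn` would wrap the classifier's predicted probability for the class of interest. Occlusion treats the model as a black box, so it needs one forward pass per token and ignores interactions between tokens.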

Papers