Human-Annotated Rationales

Human-annotated rationales are text snippets that human annotators mark as the input evidence supporting a label, and they are used to improve model explainability and trustworthiness. Current research focuses on methods for automatically extracting such rationales, often via attention mechanisms, prompting techniques, or post-hoc explanation methods in architectures such as transformers, and on evaluating how well the extracted rationales align with the human-provided ones. This work is important for building more reliable and interpretable AI systems, addressing concerns about bias, and improving model performance, particularly in out-of-domain settings and on complex reasoning tasks. The ultimate goal is models that are not only accurate but also explain their decisions in a way humans find understandable and trustworthy.
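To make the two recurring steps concrete, the sketch below (not taken from any specific paper; all function names and the toy data are illustrative) shows how a rationale might be extracted from per-token importance scores, such as averaged attention weights, and then scored for alignment with a human-annotated rationale using token-level F1, a common alignment metric in this literature.

```python
# Minimal sketch, assuming per-token importance scores are already available
# (e.g., attention weights averaged over heads). Names and data are illustrative.

from typing import List, Set


def extract_rationale(scores: List[float], top_k: int) -> Set[int]:
    """Return the indices of the top_k highest-scoring tokens as the rationale."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:top_k])


def token_f1(predicted: Set[int], human: Set[int]) -> float:
    """Token-level F1 between a predicted rationale and a human rationale."""
    if not predicted or not human:
        return 0.0
    overlap = len(predicted & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    tokens = ["the", "movie", "was", "utterly", "boring", "and", "slow"]
    # Hypothetical importance scores for each token.
    scores = [0.02, 0.10, 0.03, 0.30, 0.35, 0.05, 0.15]
    human_rationale = {3, 4, 6}  # "utterly", "boring", "slow"

    predicted = extract_rationale(scores, top_k=3)
    print("predicted tokens:", [tokens[i] for i in sorted(predicted)])
    print("alignment F1:", round(token_f1(predicted, human_rationale), 3))
```

In practice the importance scores would come from a model (attention, gradients, or a prompted explanation mapped back to tokens), and alignment is often reported alongside plausibility and faithfulness metrics rather than F1 alone.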

Papers