Token Attribution
Token attribution methods aim to identify which parts of an input (e.g., words in a sentence) most influence a model's prediction, improving model interpretability and reliability. Current research focuses on increasing the accuracy and faithfulness of these attributions, exploring techniques that incorporate multiple model components (such as entire transformer encoder layers) and comparing attributions across languages and models. These advances are crucial for building trust in AI systems, particularly in high-stakes applications where understanding a model's decision-making is paramount; studies have shown that faithful attributions can improve human performance in model-assisted decision-making tasks.
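One of the simplest token attribution techniques is occlusion (leave-one-out): mask each token in turn and measure how much the model's output changes. The sketch below is illustrative only, not drawn from any of the papers listed here; the `toy_predict` word-counting "model" and the `[MASK]` placeholder are assumptions standing in for a real classifier and its mask token.

```python
def occlusion_attribution(predict, tokens, mask_token="[MASK]"):
    """Score each token by how much masking it changes the prediction.

    Leave-one-out (occlusion) attribution: a larger drop in the model's
    score when a token is masked means that token mattered more.
    """
    baseline = predict(tokens)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        scores.append(baseline - predict(masked))
    return scores

# Toy "sentiment model": fraction of tokens that are positive words.
# (A placeholder for a real neural classifier.)
POSITIVE = {"great", "good", "excellent"}

def toy_predict(tokens):
    return sum(t in POSITIVE for t in tokens) / len(tokens)

attributions = occlusion_attribution(toy_predict, ["the", "movie", "was", "great"])
# "great" receives the largest attribution; the other tokens score 0.
```

Gradient-based methods (e.g., integrated gradients) and attention-based methods follow the same contract — a per-token relevance score — but differ in how faithfully those scores reflect the model's actual computation, which is the axis the research above targets.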
Papers
February 6, 2023
May 6, 2022
May 3, 2022
December 23, 2021