Rater Disagreement

Rater disagreement, the inconsistency in judgments made by multiple human annotators, is a significant challenge across diverse fields, from AI safety and medical image analysis to scientific peer review. Current research focuses on understanding the root causes of this disagreement, developing taxonomies to categorize different types of discrepancies, and exploring methods to effectively incorporate this uncertainty into machine learning models. This includes investigating techniques like vicarious annotation and novel label fusion methods, such as those based on SoftSeg, to improve model calibration and accurately reflect inter-rater variability. Addressing rater disagreement is crucial for building more robust and reliable AI systems and enhancing the objectivity and fairness of human-driven processes.
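The soft label fusion idea mentioned above can be illustrated with a minimal sketch: instead of forcing a single consensus mask, the binary masks from several raters are averaged into per-pixel soft labels, and the resulting Bernoulli variance serves as a simple disagreement map. The function names and the toy annotations below are illustrative assumptions, not part of any specific published implementation.

```python
import numpy as np

def fuse_soft_labels(annotations):
    """Average binary masks from multiple raters into a soft label map.

    annotations: array of shape (n_raters, H, W) with values in {0, 1}.
    Returns per-pixel values in [0, 1]; 1.0 means every rater marked foreground.
    """
    annotations = np.asarray(annotations, dtype=np.float32)
    return annotations.mean(axis=0)

def disagreement_map(soft_labels):
    """Per-pixel disagreement as the variance of a Bernoulli with p = soft label.

    Zero where raters fully agree; maximal (0.25) when raters split 50/50.
    """
    return soft_labels * (1.0 - soft_labels)

# Three hypothetical raters segmenting the same 2x2 image
raters = np.array([
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
])
soft = fuse_soft_labels(raters)   # soft[0, 0] == 1.0: full agreement
dis = disagreement_map(soft)      # dis[1, 1] == 0.0: no disagreement there
```

Training a segmentation model directly on these soft labels with a regression-style loss (rather than thresholding them to hard labels) is one way such fusion can improve calibration, since the target itself encodes inter-rater uncertainty.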

Papers