Inter-Rater Reliability
Inter-rater reliability (IRR) quantifies the agreement between multiple observers or raters assessing the same data, a crucial concern in any field where subjective judgment is involved. Current research emphasizes improving IRR estimates in settings with limited data, using techniques such as Bayesian neural networks and comparative judgment, and explores large language models (LLMs) as cost-effective alternatives to human raters. Sound treatment of IRR is vital for the validity and reproducibility of findings across diverse domains, from medical image analysis and educational assessment to software engineering and mental health research.
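As a concrete illustration of the agreement statistics this topic revolves around, below is a minimal Python sketch of Cohen's kappa, the classic chance-corrected agreement measure for two raters. The sleep-stage labels are invented for the example and are not taken from any of the papers listed here.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters scoring the same 10 sleep epochs (toy data).
a = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W", "N1", "N2"]
b = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W", "N1", "N2"]
print(f"kappa = {cohen_kappa(a, b):.3f}")  # 0.8 raw agreement -> kappa ~ 0.730
```

Kappa is lower than raw agreement (0.730 vs. 0.8 here) because it discounts the matches two raters would produce by chance given their label frequencies; multi-rater extensions such as Fleiss' kappa follow the same observed-minus-expected logic.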
Papers
Multi-Scored Sleep Databases: How to Exploit the Multiple-Labels in Automated Sleep Scoring
Luigi Fiorillo, Davide Pedroncelli, Valentina Agostini, Paolo Favaro, Francesca Dalia Faraci
Bayesian approaches for Quantifying Clinicians' Variability in Medical Image Quantification
Jaeik Jeon, Yeonggul Jang, Youngtaek Hong, Hackjoon Shim, Sekeun Kim