Paper ID: 2409.15087

Towards Accountable AI-Assisted Eye Disease Diagnosis: Workflow Design, External Validation, and Continual Learning

Qingyu Chen, Tiarnan D L Keenan, Elvira Agron, Alexis Allot, Emily Guan, Bryant Duong, Amr Elsawy, Benjamin Hou, Cancan Xue, Sanjeeb Bhandari, Geoffrey Broadhead, Chantal Cousineau-Krieger, Ellen Davis, William G Gensheimer, David Grasic, Seema Gupta, Luis Haddock, Eleni Konstantinou, Tania Lamba, Michele Maiberger, Dimosthenis Mantopoulos, Mitul C Mehta, Ayman G Nahri, Mutaz AL-Nawaflh, Arnold Oshinsky, Brittany E Powell, Boonkit Purt, Soo Shin, Hillary Stiefel, Alisa T Thavikulwat, Keith James Wroblewski, Tham Yih Chung, Chui Ming Gemmy Cheung, Ching-Yu Cheng, Emily Y Chew, Michelle R. Hribar, Michael F. Chiang, Zhiyong Lu

Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (named AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy, and elevating the F1-score from 42 to 54 in the Singapore population.

Submitted: Sep 23, 2024