Paper ID: 2312.05900

Deep-Learning-Assisted Analysis of Cataract Surgery Videos

Negin Ghamsarian

Following technological advancements in medicine, operating rooms are evolving into intelligent environments. Context-aware systems (CAS) can comprehensively interpret the surgical state, enable real-time warnings, and support decision-making, especially for novice surgeons. These systems can automatically analyze surgical videos and perform indexing, documentation, and post-operative report generation. The ever-increasing demand for such automatic systems has spurred machine-learning-based approaches to surgical video analysis. This thesis addresses the significant challenges in cataract surgery video analysis to pave the way for building efficient context-aware systems. The main contributions of this thesis are fivefold: (1) This thesis demonstrates that spatio-temporal localization of the relevant content can considerably improve phase recognition accuracy. (2) This thesis proposes a novel deep-learning-based framework for relevance-based compression to enable real-time streaming and adaptive storage of cataract surgery videos. (3) Several convolutional modules are proposed to boost the networks' semantic interpretation performance in challenging conditions, including blur and reflection distortion, transparency, deformability, color and texture variation, blunt edges, and scale variation. (4) This thesis proposes and evaluates the first framework for automatic irregularity detection in cataract surgery videos. (5) To alleviate the requirement for manual pixel-based annotations, this thesis proposes novel strategies for self-supervised representation learning adapted to semantic segmentation.

Submitted: Dec 10, 2023