Surgical Video
Surgical video analysis applies computer vision and machine learning to automate tasks such as instrument tracking, scene segmentation, and surgical phase recognition. Current research relies heavily on deep learning models, including transformers, diffusion models, and various attention mechanisms, and often incorporates self-supervised learning and transfer learning to mitigate data scarcity and improve generalization across different procedures and surgical centers. These advances aim to improve surgical training, support intraoperative decision-making through real-time feedback and guidance (e.g., augmented reality overlays), and ultimately make surgery safer and more efficient. Large, multi-center datasets are crucial for advancing the field and ensuring robust model performance in real-world clinical settings.
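To make the phase-recognition task concrete, the sketch below shows one common pattern: a (here simulated) per-frame classifier produces phase probabilities, and a temporal smoothing step enforces the coherence that surgical phases exhibit over time. All names, phase labels, and the synthetic data are illustrative assumptions, not taken from any of the papers listed below.

```python
import numpy as np

# Illustrative phase set; real taxonomies (e.g., for cholecystectomy)
# have more phases and are procedure-specific.
PHASES = ["preparation", "dissection", "closure"]

def smooth_phase_predictions(frame_probs: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing over time, then per-frame argmax.

    frame_probs: (T, num_phases) array of per-frame phase probabilities,
    e.g. the softmax output of an image backbone applied frame by frame.
    """
    kernel = np.ones(window) / window
    smoothed = np.stack(
        [np.convolve(frame_probs[:, c], kernel, mode="same")
         for c in range(frame_probs.shape[1])],
        axis=1,
    )
    return smoothed.argmax(axis=1)

# Synthetic stand-in for backbone outputs: 20 frames per phase,
# one-hot ground truth plus uniform noise.
rng = np.random.default_rng(0)
true_phases = np.repeat([0, 1, 2], 20)
probs = np.eye(len(PHASES))[true_phases] + 0.3 * rng.random((60, len(PHASES)))

pred = smooth_phase_predictions(probs)
```

In practice the per-frame probabilities would come from a pretrained image or video backbone (often adapted via parameter-efficient transfer learning, as in the SurgPETL paper below), and the smoothing step is typically replaced by a learned temporal model such as a temporal convolutional network or transformer; the moving average here just illustrates why temporal context matters.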
Papers
Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation
Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy
SurgPETL: Parameter-Efficient Image-to-Surgical-Video Transfer Learning for Surgical Phase Recognition
Shu Yang, Zhiyuan Cai, Luyang Luo, Ning Ma, Shuchang Xu, Hao Chen
LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning
Jiajie Li, Garrett Skinner, Gene Yang, Brian R Quaranto, Steven D Schwaitzberg, Peter C W Kim, Jinjun Xiong
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Haofeng Liu, Erli Zhang, Junde Wu, Mingxuan Hong, Yueming Jin