Referring Image Segmentation
Referring image segmentation (RIS) aims to segment the object or region in an image that a natural language expression refers to, bridging computer vision and natural language processing. Current research focuses largely on improving cross-modal alignment between visual and textual features, using transformer-based architectures and techniques such as early fusion, multi-modal attention, and iterative refinement to improve accuracy and efficiency. Progress in RIS is relevant to applications such as robotics, medical image analysis, and autonomous driving, where a system must locate the objects that human instructions describe.
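To make "multi-modal attention" concrete, here is a toy sketch in plain Python. It is not the method of either paper listed below. It assumes hypothetical, already-extracted features: a flattened list of per-pixel visual vectors and a list of per-word text vectors of the same dimension. Each pixel acts as a query that attends over the words, producing a language context vector, and a pixel is kept in the mask when its feature agrees with that context. Real models use learned query/key/value projections and a decoder; here keys and values are simply the word vectors, and `threshold` is an illustrative parameter.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_modal_attention(pixel_feats, word_feats):
    """Pixel-to-word attention: each pixel (query) attends over word vectors.

    Keys and values are the raw word vectors (no learned projections),
    a simplification for illustration only.
    """
    d = len(word_feats[0])
    contexts = []
    for p in pixel_feats:
        # Scaled dot-product scores between this pixel and every word.
        weights = softmax([dot(p, w) / math.sqrt(d) for w in word_feats])
        # Weighted sum of word vectors = language context for this pixel.
        context = [sum(a * w[k] for a, w in zip(weights, word_feats)) for k in range(d)]
        contexts.append(context)
    return contexts


def predict_mask(pixel_feats, word_feats, threshold=0.5):
    """Binary mask over flattened pixels: 1 if pixel agrees with its language context."""
    contexts = cross_modal_attention(pixel_feats, word_feats)
    return [1 if dot(p, c) > threshold else 0 for p, c in zip(pixel_feats, contexts)]


if __name__ == "__main__":
    # Hypothetical 3-pixel image and a 2-word expression, both in 2-D feature space.
    pixels = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    words = [[1.0, 0.0], [0.9, 0.1]]
    print(predict_mask(pixels, words))
```

With these toy features only the first pixel, whose vector points in the same direction as the words, is selected. Published RIS models replace the fixed threshold and raw dot products with learned layers trained on annotated masks.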
Papers
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
Hai Nguyen-Truong, E-Ro Nguyen, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung
Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation
Yichen Yan, Xingjian He, Sihan Chen, Jing Liu