Referring Image Segmentation

Referring image segmentation (RIS) aims to segment the objects in an image that a natural language expression describes, producing a pixel-level mask rather than a bounding box, and so sits at the intersection of computer vision and natural language processing. Current research focuses heavily on improving cross-modal alignment between visual and textual features, typically with transformer-based architectures and techniques such as early fusion, multi-modal attention, and iterative refinement, to improve accuracy and efficiency. Advances in RIS matter for applications such as robotics, medical image analysis, and autonomous driving, where machines must act on human instructions that refer to specific objects in a scene.
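To make the idea of multi-modal attention concrete, here is a minimal toy sketch in plain Python. It is not taken from any particular RIS model: the function names (`cross_attend`, `predict_mask`), the two-dimensional features, the residual fusion, and the mean-pooled sentence vector with a zero threshold are all illustrative assumptions. Real systems use learned projections and high-dimensional features from image and text encoders. Each pixel feature acts as a query over the text tokens, the attended text context is added back to the pixel feature, and the fused feature is scored against the sentence to produce a binary mask.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(pixel_feats, token_feats):
    """Scaled dot-product attention: pixels are queries, tokens are keys and values.

    Hypothetical helper for illustration; there are no learned projections here.
    """
    d = len(token_feats[0])
    scale = math.sqrt(d)
    fused, weights = [], []
    for q in pixel_feats:
        w = softmax([dot(q, k) / scale for k in token_feats])
        # Text context for this pixel: attention-weighted sum of token features.
        context = [sum(w_t * tok[i] for w_t, tok in zip(w, token_feats))
                   for i in range(d)]
        # Assumed fusion rule: residual addition of visual feature and text context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return fused, weights

def predict_mask(fused, token_feats, threshold=0.0):
    """Label a pixel 1 if its fused feature aligns with the mean token feature."""
    d = len(token_feats[0])
    sentence = [sum(tok[i] for tok in token_feats) / len(token_feats)
                for i in range(d)]
    return [1 if dot(f, sentence) > threshold else 0 for f in fused]

# Toy input: a flattened 2x2 "image" with 2-D features and a two-token expression.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-1.2, -0.3]]
token_feats = [[1.0, 0.0], [0.5, 0.5]]

fused, weights = cross_attend(pixel_feats, token_feats)
mask = predict_mask(fused, token_feats)
print(mask)  # [1, 1, 0, 0]
```

In this toy input the first two pixels point in roughly the same direction as the text tokens and end up inside the mask, while the last two point away and are excluded. Iterative refinement, as mentioned above, would repeat the attend-and-fuse step several times before the final mask is predicted.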

Papers