Expression Segmentation

Referring expression segmentation (RES) aims to segment the object in an image that a natural language description refers to, combining computer vision with natural language processing. Recent research focuses on improving accuracy and generalization, particularly by integrating large language models (LLMs) and architectures such as the Segment Anything Model (SAM), and often uses semi-supervised or weakly supervised learning to reduce reliance on large labeled datasets. These advances matter for applications that require precise object localization from natural language instructions, such as robotics, human-computer interaction, and image retrieval.

Papers