Expression Segmentation
Referring expression segmentation (RES) aims to segment the object in an image that a natural language description refers to, linking computer vision and natural language processing. Recent research focuses on improving accuracy and generalization, particularly by integrating large language models (LLMs) and architectures such as the Segment Anything Model (SAM), often using semi-supervised or weakly supervised learning to reduce reliance on large labeled datasets. These advances matter for applications that require precise object localization from natural language instructions, such as robotics, human-computer interaction, and image retrieval.
Papers
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng, Yong Liu, Yicheng Xiao, Hongfa Wang, Xiao-Ping Zhang, Yujiu Yang
Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation
Weize Li, Zhicheng Zhao, Haochen Bai, Fei Su