Reasoning Segmentation
Reasoning segmentation is the task of segmenting objects in images or videos from complex, often implicit textual instructions. Resolving such instructions requires world knowledge and reasoning, which goes beyond matching an explicit category name or keyword. Current research typically couples large language models (LLMs) with segmentation models such as the Segment Anything Model (SAM), and often uses techniques such as chain-of-thought prompting and specialized tokens to bridge language understanding and visual segmentation. The field matters for multimodal AI because it enables more robust and flexible interaction with visual data in applications such as robotics, autonomous driving, and assistive technologies.
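The coupling described above can be illustrated with a toy sketch. It is not any published model: the label grid standing in for an image, the KNOWLEDGE table standing in for an LLM's world knowledge, the "<SEG>" token string, and the functions reason and decode_mask are all hypothetical names chosen for illustration. The sketch only shows the flow: an implicit instruction is resolved to a target through a short reasoning trace, the answer carries a special token, and a separate decoder turns that token's payload into a mask.

```python
from dataclasses import dataclass
from typing import List, Optional

# Toy "image": each cell holds an object label instead of pixel values.
IMAGE = [
    ["sky",   "sky",    "sky"],
    ["table", "orange", "knife"],
    ["table", "table",  "table"],
]

# Hypothetical stand-in for the world knowledge an LLM would supply.
KNOWLEDGE = {
    "vitamin c": "orange",
    "cut": "knife",
}

SEG_TOKEN = "<SEG>"  # hypothetical special token meaning "a mask is requested here"


@dataclass
class ReasoningOutput:
    steps: List[str]        # chain-of-thought style trace
    answer: str             # text answer, containing SEG_TOKEN on success
    target: Optional[str]   # stands in for the hidden embedding of SEG_TOKEN


def reason(instruction: str) -> ReasoningOutput:
    """Resolve an implicit instruction to a target label (stub for the LLM)."""
    text = instruction.lower()
    for cue, label in KNOWLEDGE.items():
        if cue in text:
            steps = [
                f"The instruction mentions '{cue}'.",
                f"World knowledge links '{cue}' to '{label}'.",
            ]
            return ReasoningOutput(steps, f"It is the {label} {SEG_TOKEN}.", label)
    return ReasoningOutput(["No known cue found."], "No target found.", None)


def decode_mask(image: List[List[str]], output: ReasoningOutput) -> List[List[int]]:
    """Turn the token payload into a binary mask (stub for the mask decoder)."""
    if SEG_TOKEN not in output.answer or output.target is None:
        return [[0] * len(row) for row in image]
    return [[1 if cell == output.target else 0 for cell in row] for row in image]


result = reason("Segment the food that is rich in vitamin C.")
mask = decode_mask(IMAGE, result)
for row in mask:
    print(row)
```

In a real system the payload passed to the decoder would be a learned embedding rather than a label string, and the decoder would be a promptable segmentation model; the string lookup here only keeps the example self-contained.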