Visual Structure

Visual structure research focuses on understanding and representing the spatial relationships and organization within visual data, aiming to improve machine perception and understanding of images and videos. Current efforts concentrate on developing models that effectively capture these structures, leveraging techniques like contrastive learning with cluster masking, scene graph encoding, and hierarchical segmentation within vision-language models. These advancements are crucial for improving applications such as radiology report generation, image retargeting, and enhancing the performance of vision-language models in tasks requiring structural knowledge extraction, ultimately leading to more robust and insightful AI systems.

Papers