Visual Structure
Visual structure research studies how to represent the spatial relationships and organization within visual data, with the goal of improving machine perception of images and videos. Current work focuses on models that capture this structure, using techniques such as contrastive learning with cluster masking, scene graph encoding, and hierarchical segmentation within vision-language models. These advances matter for applications such as radiology report generation and image retargeting, and for vision-language tasks that require extracting structural knowledge.
Papers
May 14, 2024
March 8, 2024
November 22, 2023
November 4, 2023
May 22, 2023
April 5, 2023
October 1, 2022
September 11, 2022
August 3, 2022