Visual Semantic Alignment
Visual semantic alignment focuses on bridging the gap between visual and textual information, aiming to improve the understanding and processing of multimodal data. Current research emphasizes developing models that effectively align visual features (e.g., from images or videos) with semantic representations (e.g., from text descriptions or labels), often employing transformer-based architectures and techniques like cross-modal attention and prototype learning. This research is significant for advancing zero-shot learning, improving the efficiency of various computer vision tasks (like image segmentation and object recognition), and enabling more robust and accurate multimodal applications.
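The core idea described above — projecting visual and textual features into a joint embedding space and pulling matched pairs together — can be sketched with a minimal contrastive-alignment example in the style of CLIP. Everything here (the random stand-in features, the projection matrices, the temperature of 0.07) is an illustrative assumption, not any specific paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 4 image feature vectors and 4 matching text feature vectors.
# In a real system these would come from a vision encoder and a text
# encoder; random stand-ins are used here for illustration.
image_feats = rng.normal(size=(4, 64))
text_feats = rng.normal(size=(4, 64))

# Learned projections into a shared embedding space (randomly
# initialized here, trained by gradient descent in practice).
W_img = rng.normal(size=(64, 32))
W_txt = rng.normal(size=(64, 32))

def l2_normalize(x):
    """Normalize rows to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

img_emb = l2_normalize(image_feats @ W_img)
txt_emb = l2_normalize(text_feats @ W_txt)

# Cosine-similarity matrix: entry (i, j) scores image i against text j.
# 0.07 is a commonly used temperature (an assumed value).
logits = img_emb @ txt_emb.T / 0.07

def cross_entropy(logits, targets):
    """Mean cross-entropy over rows, with integer class targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Symmetric contrastive loss: matched image-text pairs sit on the
# diagonal, so the target class for row i is simply i.
targets = np.arange(4)
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
print(f"alignment loss: {loss:.4f}")
```

Training drives this loss down, which increases the diagonal similarities relative to the off-diagonal ones — exactly the image-text alignment that enables zero-shot classification by comparing an image embedding against embeddings of candidate label texts.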