Visual Word
Visual words represent image features as discrete units, analogous to words in natural language. This discretization makes image analysis more efficient and helps bridge the semantic gap between low-level features and high-level understanding. Current research applies visual words in several architectures, including bag-of-visual-words (BoVW) models and autoregressive approaches integrated with large language models (LLMs), to improve tasks such as image annotation, retrieval, and weakly supervised semantic segmentation. The aim is more robust and interpretable image understanding, with applications ranging from medical image analysis to multimedia search and retrieval.
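To make the bag-of-visual-words idea concrete, here is a minimal sketch in plain Python. It assumes feature descriptors have already been extracted (the 2-D toy vectors below are invented for illustration; real descriptors such as SIFT or ORB have many more dimensions). The function names `build_vocabulary` and `bovw_histogram` are hypothetical, and the simple k-means loop stands in for whatever clustering method a given paper actually uses.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Cluster descriptors with a basic k-means; the centroids are the visual words."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(image_descriptors, vocabulary):
    """Normalized count of how often each visual word occurs in one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in image_descriptors)
    total = len(image_descriptors) or 1  # avoid division by zero for empty input
    return [counts[i] / total for i in range(len(vocabulary))]


# Toy training descriptors forming two well-separated groups (assumed data).
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (10.0, 10.1), (9.8, 10.0), (10.1, 9.9)]
vocabulary = build_vocabulary(training, k=2)

# Descriptors from one hypothetical image: three near the first group, one near the second.
image = [(0.1, 0.1), (0.0, 0.0), (10.0, 10.0), (0.2, 0.1)]
hist = bovw_histogram(image, vocabulary)
print(hist)
```

The resulting histogram is a fixed-length vector regardless of how many descriptors the image contains, which is what lets BoVW representations be compared directly for annotation or retrieval.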