Visual Embeddings
Visual embeddings represent images and videos as numerical vectors that capture their semantic content for downstream tasks such as image classification, video retrieval, and visual question answering. Current research focuses on improving the quality and robustness of these embeddings, often leveraging large language models (LLMs) and techniques such as prompt learning, contrastive learning, and multi-modal fusion to better align visual and textual information. Effective visual embeddings are a prerequisite for AI applications that must understand and reason about visual data, with impact spanning computer vision and natural language processing.
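To make the alignment idea concrete, the sketch below shows a minimal CLIP-style contrastive objective that pulls paired image and text embeddings together and pushes mismatched pairs apart. It is an illustrative example under assumed inputs, not a specific paper's method; the stand-in encoder outputs and the `temperature` value are hypothetical.

```python
# A minimal sketch of contrastive image-text alignment (CLIP-style).
# The random tensors below stand in for hypothetical image/text encoder outputs.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_features: torch.Tensor,
                               text_features: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j) / T.
    logits = image_features @ text_features.t() / temperature

    # Matching image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)

    # Average of image-to-text and text-to-image cross-entropy.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    batch, dim = 8, 512
    img = torch.randn(batch, dim)   # stand-in for image_encoder(images)
    txt = torch.randn(batch, dim)   # stand-in for text_encoder(captions)
    print(contrastive_alignment_loss(img, txt))
```

After training with such an objective, either encoder's output can serve as a visual (or textual) embedding for retrieval or classification by nearest-neighbor search in the shared space.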