Long Caption
Research on long captions in language-image pre-training aims to improve models' ability to understand and generate detailed descriptions of images, overcoming a limitation of existing datasets, which consist mostly of short captions. Current efforts focus on new model architectures and training strategies, such as contrastive learning over longer texts and adaptive token-length assignment for vision transformers, that make effective use of longer, more descriptive captions. This work matters because richer image-text representations improve performance on downstream tasks such as image retrieval and semantic segmentation, and may benefit any application that requires detailed visual understanding.
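To make the contrastive-learning ingredient concrete, below is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss of the kind these methods build on. It assumes image and caption embeddings have already been produced by the respective encoders (in long-caption work, the text encoder would be configured with a longer maximum token length than CLIP's usual 77 tokens); all function names here are illustrative, not from any specific paper.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of (image, caption) pairs.

    image_emb, text_emb: (batch, dim) arrays; row i of each is a matched pair.
    In long-caption training, text_emb would come from encoding the full
    detailed caption rather than a truncated short one.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits; matched pairs lie on the diagonal.
    logits = image_emb @ text_emb.T / temperature

    idx = np.arange(logits.shape[0])
    # Image-to-text direction: each image must pick out its own caption.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-image direction: each caption must pick out its own image.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

The loss pulls each image toward its own (long) caption and pushes it away from the other captions in the batch, so richer captions directly sharpen the learned image-text alignment.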
Papers
The papers collected under this topic were published between November 27, 2022 and October 13, 2024.