Pseudo Caption

Pseudo-captioning uses automatically generated image descriptions, or "pseudo-captions," as a training signal for vision-language models, particularly when labeled data is scarce. Current research applies pseudo-captions to self-supervised learning, to semi-supervised learning with techniques such as optimal transport, and to open-vocabulary object detection, often building on large pre-trained models that are adapted to specialized domains. The approach improves how well models understand and classify images, with reported gains in few-shot learning and open-vocabulary object detection.
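To make the optimal-transport idea concrete, here is a minimal sketch of one common way to use it: assigning unlabeled images to a set of candidate pseudo-captions with the Sinkhorn-Knopp algorithm, so that the assignment stays balanced instead of collapsing onto a single caption. This is an illustrative assumption, not the method of any particular paper. The function names (`sinkhorn_assign`, `pseudo_labels`), the uniform marginals, the entropic regularization strength `epsilon`, and the argmax decoding are all hypothetical choices. The similarity scores would normally come from a pre-trained vision-language model (e.g. CLIP-style image and text embeddings); here they are a hand-written toy matrix.

```python
import math
from typing import List, Sequence


def sinkhorn_assign(
    scores: Sequence[Sequence[float]],
    epsilon: float = 0.1,
    n_iters: int = 100,
) -> List[List[float]]:
    """Entropic optimal transport plan between n images (rows) and k
    candidate captions (columns), with uniform marginals on both sides.

    `scores` is an n-by-k similarity matrix (assumed precomputed, e.g.
    cosine similarities between image and caption embeddings).
    """
    n, k = len(scores), len(scores[0])
    row_mass, col_mass = 1.0 / n, 1.0 / k
    # Gibbs kernel exp(s / eps). Subtracting the global max rescales the
    # kernel by a constant, which the scaling vectors u and v absorb.
    s_max = max(max(row) for row in scores)
    kernel = [[math.exp((s - s_max) / epsilon) for s in row] for row in scores]

    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows so each image sends exactly 1/n of the mass.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        # Rescale columns so each caption receives exactly 1/k of the mass.
        v = [col_mass / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda j: xs[j])


def pseudo_labels(scores: Sequence[Sequence[float]], **kwargs) -> List[int]:
    """Give each unlabeled image the caption index that receives most of its mass."""
    plan = sinkhorn_assign(scores, **kwargs)
    return [argmax(row) for row in plan]


if __name__ == "__main__":
    # Toy similarities: every image slightly prefers caption 0.
    scores = [
        [0.90, 0.10],
        [0.80, 0.20],
        [0.60, 0.50],
        [0.55, 0.50],
    ]
    print("naive argmax:", [argmax(row) for row in scores])  # [0, 0, 0, 0]
    print("OT pseudo-labels:", pseudo_labels(scores))        # [0, 0, 1, 1]
```

The balance constraint is what changes the outcome: a per-image argmax puts every image on caption 0, while the transport plan, which must give each caption half of the mass, moves the two most ambiguous images to caption 1. With much smaller `epsilon` values, the plain exponential kernel can underflow, and a log-domain Sinkhorn implementation would be the safer choice.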

Papers