Frozen Convolutional CLIP
Frozen Convolutional CLIP leverages pre-trained vision-language models, specifically CLIP's convolutional image backbone, to perform computer vision tasks without fine-tuning the visual encoder. Current research focuses on improving cross-modal feature interaction around these frozen features, often adding techniques such as prompt learning and knowledge distillation to boost performance on tasks like video segmentation, anomaly detection, and open-vocabulary segmentation. Because the backbone is never retrained, this approach substantially reduces training cost and memory compared to training from scratch, and it has driven advances in fields including medical image analysis and autonomous driving.
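As a concrete illustration, below is a minimal sketch of the frozen-CLIP recipe using OpenAI's CLIP package with its convolutional RN50 backbone. The choice of the "RN50" checkpoint and the placeholder image path are assumptions for illustration, not details from the source: the encoder is frozen, and open-vocabulary recognition reduces to cosine similarity between image and text embeddings.

```python
# Minimal sketch of the frozen-CLIP idea: the pre-trained convolutional
# image encoder (here OpenAI CLIP's RN50) is used as-is, never updated,
# and recognition is cosine similarity between image and text embeddings.
# The "RN50" checkpoint and "example.jpg" path are illustrative assumptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("RN50", device=device)

# Freeze every parameter: the visual encoder receives no gradient updates.
for p in model.parameters():
    p.requires_grad = False
model.eval()

# Open-vocabulary classification: the label set is free-form text.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_tokens = clip.tokenize(labels).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(text_tokens)
    # L2-normalize both embeddings, then score by scaled cosine similarity.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze(0).tolist())))
```

The same frozen features can feed heavier heads (e.g., a mask decoder for open-vocabulary segmentation); only the added head is trained, which is where the efficiency gain described above comes from.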
Papers
- October 4, 2024
- June 7, 2024
- May 20, 2024
- January 4, 2024
- August 4, 2023
- March 16, 2023