Contrastive Captioners
Contrastive captioning uses contrastive learning to generate text that describes multimedia data (images, audio, video) both accurately and distinctively. Current research emphasizes three directions: improving temporal understanding of audio and video; strengthening multimodal alignment between text and visual or audio features through architectures such as transformers; and incorporating large language models for caption generation and evaluation. The resulting captions tend to be more robust and informative, with applications ranging from image and video retrieval to evaluation of existing captioning models and areas such as reverse engineering.
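To make the idea of multimodal alignment concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over paired media and text embeddings. It is written in plain Python for readability and is not taken from any specific paper listed here; the function names, the temperature value of 0.07, and the toy two-dimensional embeddings are all assumptions chosen for illustration. A captioning model would normally pair a term like this with a text-generation loss, which is omitted.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_matrix(media_embs, text_embs):
    """Entry [i][j] is the cosine similarity of media item i and caption j."""
    media = [l2_normalize(v) for v in media_embs]
    text = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(m, t)) for t in text] for m in media]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(media_embs, text_embs, temperature=0.07):
    """Average of the media-to-text and text-to-media losses.

    Assumes media_embs[i] and text_embs[i] form a matched pair and that
    every other pairing in the batch is a negative. The temperature is an
    illustrative default, not a recommended setting.
    """
    sims = cosine_similarity_matrix(media_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # transpose
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy usage: matched pairs should score a lower loss than swapped pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is low when each media embedding is closest to its own caption and high when it is closer to another item's caption, which is the sense in which contrastive training pushes captions to be distinctive as well as accurate.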