Caption Editing
Caption editing aims to improve the accuracy, fluency, and informativeness of image and video captions, primarily by leveraging large vision-language models (LVLMs) and diffusion mechanisms. Current research focuses on three goals: mitigating hallucinations (details in a generated caption that the visual content does not support), improving generalization across diverse datasets, and developing explainable editing methods that mimic human revision through explicit edit operations. These advances improve the quality and reliability of multimodal data, which benefits applications such as image retrieval, visual question answering, and accessible multimedia content creation.
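To make the idea of explicit edit operations concrete, the following minimal sketch derives a token-level KEEP / DELETE / ADD sequence that turns a draft caption into a revised one. It is an illustration only, not the method of any particular paper: the operation names, the whitespace tokenization, and the functions `edit_operations` and `apply_operations` are assumptions made for this example, and the alignment uses Python's standard `difflib` rather than a learned model.

```python
from difflib import SequenceMatcher


def edit_operations(source, target):
    """Return a list of (operation, token) pairs that rewrite source into target.

    Operation names (KEEP, DELETE, ADD) are hypothetical labels chosen
    for this sketch; tokens are produced by simple whitespace splitting.
    """
    src = source.split()
    tgt = target.split()
    ops = []
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("KEEP", tok) for tok in src[i1:i2])
        elif tag == "delete":
            ops.extend(("DELETE", tok) for tok in src[i1:i2])
        elif tag == "insert":
            ops.extend(("ADD", tok) for tok in tgt[j1:j2])
        else:  # "replace": express it as deletions followed by additions
            ops.extend(("DELETE", tok) for tok in src[i1:i2])
            ops.extend(("ADD", tok) for tok in tgt[j1:j2])
    return ops


def apply_operations(ops):
    """Rebuild the edited caption by keeping KEEP and ADD tokens in order."""
    return " ".join(tok for op, tok in ops if op in ("KEEP", "ADD"))


if __name__ == "__main__":
    draft = "a dog sits on a red couch"
    revised = "a cat sits on a couch"
    for op, tok in edit_operations(draft, revised):
        print(f"{op:6s} {tok}")
```

Because every change is an inspectable operation on a specific token, a reader can see exactly which words were judged wrong (here, a hallucinated object and attribute) and what replaced them, which is the sense in which such editing is explainable.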