Captioning Benchmark
Image and video captioning benchmarks are crucial for evaluating the ability of vision-language models to generate accurate and detailed textual descriptions of visual content. Current research focuses on developing more comprehensive benchmarks with longer, more structured captions, improving evaluation metrics to better align with human judgment, and exploring novel model architectures, such as transformer-based models and those incorporating streaming or memory mechanisms, to handle longer videos and generate richer descriptions. These advancements are vital for improving the performance and reliability of multimodal AI systems, with applications ranging from automated content description to assistive technologies.
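To make the metric-alignment point concrete, here is a minimal sketch of the clipped (modified) n-gram precision that underlies BLEU-style captioning metrics such as those reported on these benchmarks. The function name and the example captions are illustrative, not taken from any specific paper; real evaluation toolkits additionally apply brevity penalties, multiple references, and consensus weighting (as in CIDEr) to correlate better with human judgment.

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, with per-n-gram counts clipped so a
    repeated candidate word cannot be credited more times than it
    occurs in the reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())

# Unigram overlap is high even though the phrasing differs,
# illustrating why pure n-gram metrics only loosely track human judgment.
print(ngram_precision("a dog runs on grass", "a dog is running on the grass"))
```

Note that the bigram precision for the same pair is much lower (only "a dog" matches), which is one reason longer, more structured reference captions and learned metrics are active research directions.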