Captioning Model
Captioning models automatically generate textual descriptions of images or videos, aiming to convey the visual content accurately and comprehensively. Current research focuses on improving caption quality through techniques such as reinforcement learning, on using large language models (LLMs) for contextualization and better evaluation, and on building models that handle diverse inputs such as live video streams and compressed video formats. These advances matter for applications including journalism, retail analytics, accessibility tools for visually impaired users, and other AI systems that rely on visual understanding.