Captioning Model
Captioning models automatically generate textual descriptions of images or videos, aiming to convey visual content accurately and comprehensively. Current research focuses on improving caption quality through reinforcement learning, on using large language models (LLMs) for contextualization and better evaluation, and on building models that handle diverse inputs such as live video streams and compressed video formats. These advances matter for applications including journalism, retail analytics, accessibility tools for visually impaired users, and other AI systems that rely on visual understanding.