Dense Captioning
Dense captioning aims to generate detailed, localized descriptions for multiple regions within an image or video, going beyond simple object labeling. Current research emphasizes developing unified frameworks that integrate object detection and caption generation, often employing transformer-based architectures and leveraging large-scale vision-language models for improved performance. This field is crucial for applications ranging from infrastructure assessment (e.g., pavement condition analysis) to assistive technologies for the visually impaired, and advancements are driving progress in both open-world object detection and visual question answering.
Papers
November 12, 2024
August 7, 2024
July 9, 2024
April 17, 2024
March 18, 2024
November 5, 2023
March 4, 2023
January 6, 2023
December 1, 2022
July 24, 2022
March 10, 2022
February 10, 2022