Audio Description
Audio description (AD) research focuses on automatically generating textual or spoken descriptions of the visual content in videos, primarily to make that content accessible to visually impaired individuals. Current work combines large language models (LLMs) and vision-language models (VLMs) with architectures such as transformers and convolutional neural networks to generate contextually rich, character-aware descriptions from video data. The work matters because it addresses a critical need for inclusive media access, and advances in AD technology could improve quality of life for many people while also advancing multimodal learning and natural language generation.