Vision and Language Models
Vision-and-language models (VLMs) aim to integrate visual and textual information, enabling machines to understand and reason about the world in a more human-like way. Current research focuses on improving VLMs' semantic understanding, particularly their sensitivity to lexical variation and their limitations in compositional reasoning, often through techniques such as prompt tuning and knowledge distillation across different model architectures. These advances are crucial for applications such as visual question answering, image captioning, and visual navigation, and they raise important considerations around bias mitigation and the development of safer, more robust systems.
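To make the image-text matching behavior concrete, and to show why sensitivity to lexical variation matters, the sketch below scores one image against paraphrased captions using the Hugging Face transformers CLIP interface. The checkpoint, the example image URL, and the candidate captions are illustrative assumptions, not drawn from the papers collected here.

```python
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-style vision-and-language model would work similarly.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model.eval()

# Hypothetical example image (a COCO validation photo of two cats on a couch).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Paraphrased captions: a semantically robust model should rank the first two
# similarly, but lexical sensitivity often yields noticeably different scores.
captions = [
    "a photo of two cats lying on a couch",
    "two felines resting on a sofa",
    "a photo of a dog in a park",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax converts them
# into a distribution over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.3f}  {caption}")
```

Comparing the probabilities assigned to the two paraphrases gives a quick, informal probe of the lexical-sensitivity issue described above.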