Vision and Language
Vision and language research aims to create computational models that understand and generate both visual and textual information, bridging the gap between human perception and language processing. Current research focuses on improving the accuracy and efficiency of vision-language models (VLMs), exploring architectures such as transformers and MLPs, and addressing challenges such as multimodal grounding, bias mitigation, and cross-lingual capability. These advances matter for applications including image captioning and visual question answering and, more broadly, for enabling more robust and nuanced human-computer interaction.
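To make the task concrete, here is a minimal sketch of image captioning with an off-the-shelf VLM. It assumes the Hugging Face transformers library and the publicly released Salesforce/blip-image-captioning-base checkpoint; it is illustrative only and is not tied to any particular paper listed below.

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a pretrained captioning VLM (assumed checkpoint: BLIP base).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Fetch an example image (a COCO validation image, used here purely as a demo input).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# The processor turns pixels into model-ready tensors; generate() decodes a caption.
inputs = processor(image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The same processor/model pairing pattern applies to other VLM tasks such as visual question answering, where a text question is passed to the processor alongside the image.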
Papers
15 entries, dated November 3, 2021 to July 25, 2024.