Visually Grounded
Visually grounded research develops models that understand and interact with the world by integrating visual and linguistic information. Current work emphasizes efficient model architectures, often building on large language models and applying techniques such as contrastive learning and multimodal alignment to improve performance on tasks like visually-situated language understanding and cross-modal retrieval. The field advances AI capabilities in areas such as human-computer interaction and low-resource language processing, in particular by enabling more robust and versatile agents that can handle complex real-world scenarios.
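To make the contrastive learning and multimodal alignment techniques mentioned above concrete, below is a minimal sketch in PyTorch of a CLIP-style symmetric InfoNCE loss that pulls paired image and text embeddings together while pushing mismatched pairs apart. The function name, tensor shapes, and temperature value are illustrative assumptions, not drawn from any specific paper in this collection.

```python
# Illustrative sketch of contrastive image-text alignment (CLIP-style).
# All names and hyperparameters here are hypothetical.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb: torch.Tensor,
                               text_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) tensors from separate encoders;
    matching image/text pairs share the same row index.
    """
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds the true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Usage with random stand-in embeddings:
# img = torch.randn(32, 512); txt = torch.randn(32, 512)
# loss = contrastive_alignment_loss(img, txt)
```

The symmetric formulation treats each modality as both query and target, which is what lets a single aligned embedding space serve cross-modal retrieval in either direction.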