Large-Scale Vision-Language Models
Large-scale vision-language models (LVLMs) integrate image and text processing capabilities, aiming to improve multimodal understanding and generation. Current research focuses on enhancing LVLMs' cross-lingual abilities, adapting them to open-world scenarios (like single-image test-time adaptation), and improving their performance on nuanced tasks such as artwork explanation and image review. These advancements are significant because they enable more robust and versatile applications in areas like image retrieval, semantic segmentation, and visual question answering, particularly in domains with limited labeled data.
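As a concrete illustration of the visual question answering use case mentioned above, the sketch below queries an off-the-shelf vision-language model through the Hugging Face transformers library. The checkpoint name ("Salesforce/blip-vqa-base"), the example image URL, and the question are illustrative assumptions, not models or data from the papers listed here.

```python
# Minimal visual question answering sketch with a pretrained
# vision-language model (BLIP VQA checkpoint, assumed for illustration).
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Load an example image and pose a free-form question about it.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
question = "How many animals are in the picture?"

# The processor preprocesses the image and tokenizes the question into
# one batch of inputs; generate() decodes the answer tokens.
inputs = processor(image, question, return_tensors="pt")
answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```

The same pattern extends to the other applications named above (e.g. image retrieval or captioning) by swapping the checkpoint and head while keeping the processor-then-generate flow.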
Papers
Paper entries dated: September 3, 2024; June 1, 2024; February 29, 2024; February 19, 2024; December 12, 2023; October 3, 2023.