Vision-Language Tasks
Vision-language tasks aim to bridge the gap between visual and textual information, enabling machines to understand and generate descriptions, answer questions, and perform complex reasoning over combined image and text data. Current research focuses on improving model efficiency and robustness, particularly through innovative pre-training strategies, parameter-efficient fine-tuning methods, and the development of more interpretable architectures, including transformer-based models and multimodal large language models (MLLMs). These advances matter for assistive technologies, for making AI systems more accessible and usable across domains, and for deepening our understanding of multimodal learning.
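To make the parameter-efficient fine-tuning idea concrete, below is a minimal, hypothetical sketch of a LoRA-style adapter on a single linear layer: the pretrained weight `W` stays frozen, and only a low-rank update `B @ A` is trained. All names, shapes, and hyperparameters here are illustrative assumptions, not taken from any specific paper on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the source):
d_in, d_out, r, alpha = 512, 512, 8, 16.0

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection (zero init,
                                            # so the adapter starts as a no-op)

def lora_forward(x):
    """Adapted linear layer: W x + (alpha / r) * B (A x)."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y = lora_forward(x)

full_params = W.size              # parameters a full fine-tune would update
lora_params = A.size + B.size     # parameters the adapter actually trains
print(f"trainable: {lora_params} vs full fine-tune: {full_params}")
```

Because `B` is initialized to zero, the adapted layer exactly matches the frozen layer before training, and only `r * (d_in + d_out)` parameters (here 8,192 instead of 262,144) need gradients.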