mPLUG-Owl
mPLUG-Owl is a series of multi-modal large language models (MLLMs) designed to better understand and process long image sequences and complex visual information. Research on the series focuses on improving collaboration between the visual and textual modalities through modular network architectures and new attention mechanisms. The goal is stronger performance on tasks such as image captioning, visual question answering, and video understanding. These advances extend the capabilities of MLLMs and support more robust and versatile applications in areas such as autonomous exploration and information retrieval.
Papers
August 9, 2024
November 7, 2023
April 27, 2023
March 1, 2023
October 7, 2022