Multimodal Instruction
Multimodal instruction focuses on enabling artificial intelligence systems to understand and respond to instructions encompassing multiple modalities, such as text, images, audio, and even 3D data. Current research emphasizes developing models that can effectively align these different modalities, often employing techniques like multimodal encoders, large language models (LLMs), and parameter-efficient fine-tuning methods such as LoRA. This field is significant because it paves the way for more natural and versatile human-computer interaction, with applications ranging from robotic control and augmented reality to improved accessibility for diverse user populations.
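To make the parameter-efficient fine-tuning idea concrete, here is a minimal NumPy sketch of a LoRA-style linear layer: the pretrained weight is frozen, and only a low-rank update is trained. All names are hypothetical; real implementations wrap framework layers (e.g. in PyTorch) rather than raw arrays.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update A @ B,
    following the LoRA recipe. Minimal illustrative sketch only."""

    def __init__(self, d_in, d_out, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) * 0.02  # frozen pretrained weight
        self.A = rng.standard_normal((d_in, rank)) * 0.02   # trainable down-projection
        self.B = np.zeros((rank, d_out))                    # trainable up-projection (zero init)
        self.scale = alpha / rank                           # standard LoRA scaling factor

    def __call__(self, x):
        # y = x W + (x A) B * scale; during fine-tuning only A and B receive gradients
        return x @ self.W + (x @ self.A) @ self.B * self.scale

layer = LoRALinear(d_in=16, d_out=16, rank=4)
x = np.ones((2, 16))
y = layer(x)
print(y.shape)  # (2, 16)
```

Because `B` starts at zero, the layer initially reproduces the frozen model exactly; fine-tuning then learns the update with `rank * (d_in + d_out)` extra parameters instead of `d_in * d_out`.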