Multimodal Prompting
Multimodal prompting combines multiple data modalities (e.g., text, images, video) to improve the performance of machine learning models, particularly in few-shot and continual learning scenarios. Current research focuses on developing effective prompting strategies, often built on transformer-based architectures and contrastive learning methods, that guide models toward desired outputs by integrating modality-specific and cross-modal information. The approach has proven useful in applications such as medical image analysis, visual reasoning, and robotic control, where it improves model accuracy and efficiency while reducing the need for extensive training data.
Papers
December 1, 2024
July 22, 2024
June 17, 2024
April 22, 2024
April 13, 2024
March 28, 2024
March 22, 2024
March 1, 2024
December 26, 2023
December 22, 2023
December 4, 2023
October 28, 2023
September 16, 2023
August 22, 2023
June 15, 2023
May 10, 2023
April 6, 2023
March 6, 2023
April 1, 2022