Large-Scale Vision-Language Models
Large-scale vision-language (V-L) models integrate visual and textual information to improve the understanding and generation of multimodal data. Current research focuses on adapting these pre-trained models to downstream tasks such as robotic control and anomaly detection, typically through parameter-efficient techniques like prompt tuning or the insertion of concept-aware adapters, which improve task performance and help mitigate biases. The field matters because it enables more robust and versatile AI systems that can interact with the world in a more human-like way, with applications ranging from assistive robotics to improved image understanding and generation.
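Prompt tuning, for instance, typically freezes the pre-trained model and optimizes only a small set of learnable context vectors prepended to the class-name embeddings. The sketch below illustrates this CoOp-style setup in PyTorch under simplifying assumptions: FrozenTextEncoder, PromptLearner, and all dimensions are illustrative stand-ins, not the API of any particular V-L model.

```python
import torch
import torch.nn as nn

class FrozenTextEncoder(nn.Module):
    """Stand-in for the frozen text tower of a pre-trained V-L model
    (e.g., CLIP's text encoder); in practice the real model is loaded, not built."""
    def __init__(self, embed_dim=512):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.parameters():          # freeze: no gradients for the backbone
            p.requires_grad = False

    def forward(self, token_embeddings):     # (num_classes, seq_len, embed_dim)
        return self.encoder(token_embeddings).mean(dim=1)  # pooled text features

class PromptLearner(nn.Module):
    """Learnable context vectors prepended to fixed class-name embeddings;
    only these vectors are updated during adaptation."""
    def __init__(self, class_embeddings, n_ctx=4, embed_dim=512):
        super().__init__()
        # Fixed class-name embeddings (would come from the model's tokenizer).
        self.register_buffer("class_embeddings", class_embeddings)
        self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self):
        n_cls = self.class_embeddings.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        return torch.cat([ctx, self.class_embeddings], dim=1)

# One illustrative training step: only the prompt vectors receive gradients,
# which flow through the frozen encoder without updating it.
embed_dim, num_classes = 512, 10
encoder = FrozenTextEncoder(embed_dim).eval()
class_emb = torch.randn(num_classes, 3, embed_dim)   # toy class-name embeddings
prompts = PromptLearner(class_emb, n_ctx=4, embed_dim=embed_dim)
optimizer = torch.optim.Adam(prompts.parameters(), lr=2e-3)

image_feats = torch.randn(8, embed_dim)              # toy image-tower features
labels = torch.randint(0, num_classes, (8,))
text_feats = encoder(prompts())                      # (num_classes, embed_dim)
logits = image_feats @ text_feats.t()                # image-text similarity scores
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

Because the backbone stays frozen, only n_ctx × embed_dim parameters are trained, which is what makes this style of adaptation cheap enough to specialize a large V-L model per task.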
Papers
November 8, 2024 · December 12, 2023 · November 27, 2023 · September 28, 2023 · June 28, 2023 · May 26, 2023 · January 29, 2023 · October 14, 2022