Visual Grounding
Visual grounding is the task of localizing the region of an image or 3D scene that a natural language description refers to. Current research focuses on improving the accuracy and efficiency of grounding models, typically with transformer-based architectures, and increasingly leverages multimodal large language models (MLLMs) for richer cross-modal feature fusion and reasoning. The task is central to embodied AI, since it lets robots and other agents connect language to their surroundings, with applications in robotic manipulation, visual question answering, and medical image analysis.
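At its core, a grounding model scores a set of candidate image regions against an encoding of the query text and returns the best-matching box. The sketch below illustrates just that final scoring step with placeholder tensors standing in for encoder outputs; the dimensions, random features, and boxes are illustrative assumptions, not any particular model's interface.

```python
# Minimal, illustrative sketch of the core grounding step: score candidate
# image regions against a sentence embedding and pick the best box.
# The random "features" below are placeholders; a real model would produce
# them with a vision backbone and a text encoder.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

num_regions, dim = 8, 256

# Stand-ins for encoder outputs: one feature per candidate region (e.g. from
# a detector's proposals) and one pooled feature for the query sentence.
region_feats = torch.randn(num_regions, dim)   # (N, d) visual features
text_feat = torch.randn(dim)                   # (d,)  language feature
boxes = torch.rand(num_regions, 4)             # (N, 4) xyxy, normalized

# Cosine similarity between each region and the query, as in CLIP-style dual
# encoders. Transformer-based grounders fuse the two modalities much earlier
# with cross-attention, but the output is still a per-region score.
scores = F.cosine_similarity(region_feats, text_feat.unsqueeze(0), dim=-1)
probs = scores.softmax(dim=-1)                 # distribution over regions

best = probs.argmax().item()
print(f"grounded region {best}: box={boxes[best].tolist()}, "
      f"p={probs[best].item():.3f}")
```

In practice the region features and text feature come from jointly trained encoders, and models are trained so that the referred region's score is highest; the argmax-over-similarities structure shown here carries over.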