Vision Language Action
Vision-Language-Action (VLA) models integrate computer vision, natural language processing, and robotics so that robots can understand and execute complex tasks specified through natural language commands and visual input. Current research focuses on improving the robustness and generalization of these models, often employing transformer-based architectures and techniques such as chain-of-thought prompting to enhance reasoning, alongside more efficient training methods and dedicated evaluation platforms. The field is significant for advancing embodied AI, with potential applications ranging from surgical assistance and household robotics to autonomous driving and industrial automation.
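To make the basic recipe concrete, the sketch below shows one common way such a model is wired together: image patches and instruction tokens are embedded, fused by a transformer, and decoded into discretized action tokens. This is a minimal, illustrative example, not any specific published architecture; the module names, dimensions, toy convolutional vision stem, mean pooling, and 7-DoF binned action scheme are all assumptions chosen for brevity.

```python
# Minimal sketch of a transformer-based VLA policy (illustrative assumptions only).
import torch
import torch.nn as nn


class ToyVLAPolicy(nn.Module):
    """Maps one camera image and a tokenized instruction to discretized action tokens."""

    def __init__(self, vocab_size=32000, d_model=256, n_action_bins=256, action_dims=7):
        super().__init__()
        # Vision encoder: a small conv patchifier standing in for a ViT/CLIP backbone.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=16, stride=16),  # 224x224 -> 14x14 patches
            nn.Flatten(2),                                      # (B, d_model, 196)
        )
        self.text_embedding = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=4)
        # One classification head per action dimension over discretized bins.
        self.action_head = nn.Linear(d_model, action_dims * n_action_bins)
        self.action_dims, self.n_action_bins = action_dims, n_action_bins

    def forward(self, image, instruction_ids):
        patch_tokens = self.vision_encoder(image).transpose(1, 2)  # (B, 196, d_model)
        text_tokens = self.text_embedding(instruction_ids)         # (B, T, d_model)
        # Joint self-attention over visual and language tokens.
        fused = self.fusion(torch.cat([patch_tokens, text_tokens], dim=1))
        pooled = fused.mean(dim=1)                                  # simple mean pooling
        logits = self.action_head(pooled)
        return logits.view(-1, self.action_dims, self.n_action_bins)


if __name__ == "__main__":
    policy = ToyVLAPolicy()
    image = torch.randn(1, 3, 224, 224)                  # one RGB camera frame
    instruction_ids = torch.randint(0, 32000, (1, 12))   # tokenized "pick up the red block"
    action_logits = policy(image, instruction_ids)        # (1, 7, 256)
    action_bins = action_logits.argmax(dim=-1)             # discretized 7-DoF action
    print(action_bins.shape)
```

In practice the vision and language components are usually initialized from large pretrained backbones and the policy is trained on robot demonstration data; the discretized bins would then be mapped back to continuous end-effector commands.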