Vision-Language-Action
Vision-Language-Action (VLA) models integrate computer vision, natural language processing, and robotics to enable robots to understand and execute complex tasks specified through natural-language commands and visual input. Current research focuses on improving the robustness and generalization of these models, often employing transformer-based architectures and techniques like chain-of-thought prompting to enhance reasoning, as well as developing efficient training methods and evaluation platforms. The field is significant for advancing embodied AI, with potential applications ranging from surgical assistance and household robotics to autonomous driving and industrial automation.
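To make the transformer-based architecture concrete, below is a minimal sketch of a VLA-style policy in PyTorch. All names and dimensions here (TinyVLAPolicy, d_model, the 7-dimensional discretized action space) are hypothetical stand-ins for illustration, not the API of any particular model from the papers listed; real systems typically build on large pretrained vision-language backbones rather than training from scratch.

```python
# A minimal, illustrative VLA policy sketch (assumed architecture, not any
# specific paper's model): encode image patches and instruction tokens,
# fuse them with a shared transformer, and predict discretized actions.
import torch
import torch.nn as nn

class TinyVLAPolicy(nn.Module):
    def __init__(self, vocab_size=32000, num_action_bins=256,
                 action_dims=7, d_model=512):
        super().__init__()
        # Vision encoder stand-in: patchify the image with a strided conv.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Language embedding for the tokenized instruction.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        # Shared transformer over the concatenated multimodal sequence.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # One discretized-action head per action dimension
        # (e.g., x, y, z, roll, pitch, yaw, gripper).
        self.action_head = nn.Linear(d_model, action_dims * num_action_bins)
        self.action_dims = action_dims
        self.num_action_bins = num_action_bins

    def forward(self, image, instruction_ids):
        # image: (B, 3, 224, 224); instruction_ids: (B, T)
        patches = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, P, D)
        words = self.token_embed(instruction_ids)                     # (B, T, D)
        fused = self.transformer(torch.cat([patches, words], dim=1))
        # Pool the fused sequence and predict one bin per action dimension.
        logits = self.action_head(fused.mean(dim=1))
        return logits.view(-1, self.action_dims, self.num_action_bins)

policy = TinyVLAPolicy()
image = torch.randn(1, 3, 224, 224)
instruction_ids = torch.randint(0, 32000, (1, 12))  # e.g., "pick up the red block"
action_bins = policy(image, instruction_ids).argmax(dim=-1)  # (1, 7) bin indices
```

Discretizing each continuous action dimension into bins, so that control reduces to token-style classification, is a common design choice in this line of work; it lets the same transformer machinery used for language also emit robot actions.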
30 papers
Papers - Page 2
March 4, 2025
UAV-VLPA*: A Vision-Language-Path-Action System for Optimal Route Generation on a Large Scale
Oleg Sautenkov, Aibek Akhmetkazy, Yasheerah Yaqoot, Muhammad Ahsan Mustafa, Grik Tadevosyan, Artem Lykov, Dzmitry Tsetserukou
Skoltech

Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
Wenxuan Song, Jiayi Chen, Pengxiang Ding, Han Zhao, Wei Zhao, Zhide Zhong, Zongyuan Ge, Jun Ma, Haoang Li
The Hong Kong University of Science and Technology (Guangzhou) ● Westlake University ● Zhejiang University ● Monash University
February 8, 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li, Yuquan Deng, Jesse Zhang, Joel Jang, Marius Memmel, Raymond Yu, Caelan Reed Garrett, Fabio Ramos, Dieter Fox, Anqi Li, Abhishek Gupta +1

ConRFT: A Reinforced Fine-tuning Method for VLA Models via Consistency Policy
Yuhui Chen, Shuai Tian, Shugao Liu, Yingting Zhou, Haoran Li, Dongbin Zhao