Action Representation
Action representation in computer vision and robotics focuses on creating effective computational models of actions for tasks like action recognition, anticipation, and robot control. Current research emphasizes learning robust and generalizable action representations using deep learning architectures such as transformers and convolutional neural networks, often incorporating contrastive learning and self-supervised techniques to improve efficiency and reduce reliance on large labeled datasets. These advancements are driving progress in various fields, including human-robot interaction, video analysis, and autonomous systems, by enabling more accurate and efficient processing of visual and sensor data. The development of agent-agnostic representations further enhances the potential for generalization across different robotic platforms and tasks.
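One common instance of the contrastive learning mentioned above is an InfoNCE-style objective: embeddings of two views of the same action clip (e.g. an original and an augmented version) are pulled together, while embeddings of other clips in the batch serve as negatives. The sketch below is a minimal, framework-free illustration of that idea using NumPy; the function name, toy embeddings, and noise-based "augmentation" are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss (illustrative sketch).

    Each row of `positives` is the matching view of the same-index row
    in `anchors`; every other row in the batch acts as a negative.
    """
    # L2-normalise so the dot product becomes cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # correct (anchor, positive) pairs lie on the diagonal
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))                      # toy action embeddings
aug = emb + 0.05 * rng.normal(size=emb.shape)       # "augmented" views
loss = info_nce_loss(emb, aug)
```

In a real pipeline the rows of `emb` and `aug` would come from an encoder (e.g. a transformer over video frames or robot states), and the loss would be backpropagated through it; here the embeddings are random so only the loss mechanics are shown. Correctly matched pairs should yield a lower loss than mismatched ones.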
Papers
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, Jun Zhu
Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers
Jianxin Bi, Kelvin Lim, Kaiqi Chen, Yifei Huang, Harold Soh