Motion Understanding
Motion understanding in computer vision and robotics aims to interpret and generate human and robot movements from data sources such as video, motion capture, and sensor readings. Current research relies heavily on large language models (LLMs) and transformer-based architectures, often combining multimodal data (video, text, motion capture) to improve accuracy and to support tasks such as motion generation, prediction, and question answering. The field matters for human-robot interaction, animation, virtual reality, and healthcare, particularly robot-assisted therapy and motion analysis for medical diagnosis.
Papers
Muscles in Time: Learning to Understand Human Motion by Simulating Muscle Activations
David Schneider, Simon Reiß, Marco Kugler, Alexander Jaus, Kunyu Peng, Susanne Sutschet, M. Saquib Sarfraz, Sven Matthiesen, Rainer Stiefelhagen
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning
Penghui Ruan, Pichao Wang, Divya Saxena, Jiannong Cao, Yuhui Shi