Text to Motion

Text-to-motion research aims to generate realistic human or animal movements from textual descriptions, bridging the gap between natural language and complex spatiotemporal motion data. Current work focuses on improving the controllability and robustness of motion generation, often using diffusion models, transformers, and large language models to achieve fine-grained control over pose and action and to handle diverse or ambiguous textual inputs. The field is significant for its potential applications in animation, robotics, virtual reality, and accessibility technologies, and it drives advances in both generative modeling and cross-modal understanding. Ongoing research also addresses challenges such as data scarcity, generalization to unseen actions, and the development of robust, safe systems.
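
As a rough illustration of the diffusion-based approach mentioned above, the sketch below shows a toy text-conditioned denoiser paired with a standard DDPM ancestral sampling loop that turns Gaussian noise into a motion clip. The `MotionDenoiser` module, tensor shapes, and hyperparameters are illustrative assumptions, not the architecture or API of any particular published text-to-motion model.

```python
# Minimal sketch of text-conditioned motion generation with a DDPM-style
# reverse process. Shapes and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

FRAMES, JOINTS = 60, 22          # assumed clip length and skeleton size
MOTION_DIM = JOINTS * 3          # xyz per joint, flattened per frame
TEXT_DIM, HIDDEN = 512, 1024     # assumed text-embedding and hidden widths
T = 1000                         # diffusion steps used to normalize the timestep

class MotionDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a motion clip,
    conditioned on a pooled text embedding and the diffusion timestep."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MOTION_DIM + TEXT_DIM + 1, HIDDEN),
            nn.SiLU(),
            nn.Linear(HIDDEN, MOTION_DIM),
        )

    def forward(self, x, text_emb, t):
        # x: (B, FRAMES, MOTION_DIM); text_emb: (B, TEXT_DIM); t: (B,)
        B, F, _ = x.shape
        cond = torch.cat([text_emb, t.float().unsqueeze(1) / T], dim=1)
        cond = cond.unsqueeze(1).expand(B, F, -1)   # broadcast condition over frames
        return self.net(torch.cat([x, cond], dim=-1))

@torch.no_grad()
def sample_motion(model, text_emb, steps=T):
    """Standard DDPM ancestral sampling, starting from Gaussian noise."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(text_emb.size(0), FRAMES, MOTION_DIM)
    for t in reversed(range(steps)):
        t_batch = torch.full((x.size(0),), t, dtype=torch.long)
        eps = model(x, text_emb, t_batch)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # (B, FRAMES, MOTION_DIM): per-frame joint positions

# Usage: in practice text_emb would come from a pretrained text encoder
# (e.g. a CLIP-style model); here it is random just to run the loop.
model = MotionDenoiser()
text_emb = torch.randn(1, TEXT_DIM)                 # stand-in for "a person waves"
motion = sample_motion(model, text_emb, steps=50)   # few steps to keep the demo fast
print(motion.shape)                                 # torch.Size([1, 60, 66])
```

In real systems the denoiser is typically a transformer over the frame sequence and the text condition comes from a pretrained language or vision-language encoder; the MLP here only stands in to keep the sampling loop self-contained.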

Papers