Text to Motion Generation

Text-to-motion generation aims to create realistic human or camera movements from textual descriptions, with applications in fields such as animation and robotics. Current research relies heavily on diffusion models and transformers, often combined with techniques such as bidirectional autoregression, local action guidance, and hierarchical diffusion. These techniques are used to improve motion coherence, detail, and controllability, and to address challenges such as long-sequence generation and multi-person interactions. Progress in this rapidly advancing field is driven by the need for more efficient and versatile methods, particularly ones that can handle open-vocabulary prompts and generate highly detailed, physically plausible motions.

Papers