Text-to-Motion Generation
Text-to-motion generation aims to synthesize realistic human or camera movements from textual descriptions, with applications in animation and robotics. Current research relies heavily on diffusion models and transformers, often combined with techniques such as bidirectional autoregression, local action guidance, and hierarchical diffusion to improve motion coherence, detail, and controllability. These methods target persistent challenges such as long-sequence generation and multi-person interaction. The field is advancing rapidly, driven by demand for more efficient and versatile approaches, particularly ones that handle open-vocabulary prompts and produce highly detailed, physically plausible motion.
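Most of these pipelines share a common core: a denoising network (typically a transformer) that predicts the noise added to a motion sequence, conditioned on a text embedding and a diffusion timestep. The sketch below illustrates that recipe in PyTorch with a toy text encoder and noise schedule; the class name, dimensions (e.g., 263-D HumanML3D-style pose features), and cosine schedule are illustrative assumptions, not any listed paper's actual architecture.

```python
# Minimal sketch of a text-conditioned motion diffusion step (DDPM-style noise
# prediction). All module names, dimensions, and the toy text encoder are
# assumptions for illustration, not a specific paper's design.
import torch
import torch.nn as nn

class TextToMotionDenoiser(nn.Module):
    def __init__(self, motion_dim=263, d_model=256, n_layers=4, n_heads=4, vocab=1000):
        super().__init__()
        self.text_emb = nn.Embedding(vocab, d_model)   # stand-in for a pretrained text encoder
        self.motion_in = nn.Linear(motion_dim, d_model)
        self.time_emb = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=512, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.motion_out = nn.Linear(d_model, motion_dim)

    def forward(self, noisy_motion, t, text_tokens):
        # noisy_motion: (B, T, motion_dim); t: (B,); text_tokens: (B, L)
        cond = self.text_emb(text_tokens).mean(dim=1, keepdim=True)   # (B, 1, d_model)
        temb = self.time_emb(t.float().unsqueeze(-1)).unsqueeze(1)    # (B, 1, d_model)
        x = self.motion_in(noisy_motion) + cond + temb
        return self.motion_out(self.backbone(x))                      # predicted noise

# One training step: corrupt a motion clip at a random timestep, regress the noise.
model = TextToMotionDenoiser()
motion = torch.randn(8, 60, 263)             # batch of 60-frame motion clips (toy data)
tokens = torch.randint(0, 1000, (8, 16))     # tokenized prompts (toy)
t = torch.randint(0, 1000, (8,))             # diffusion timesteps
alpha_bar = torch.cos(t.float() / 1000 * torch.pi / 2) ** 2           # toy cosine noise schedule
noise = torch.randn_like(motion)
noisy = alpha_bar.sqrt().view(-1, 1, 1) * motion + (1 - alpha_bar).sqrt().view(-1, 1, 1) * noise
loss = nn.functional.mse_loss(model(noisy, t, tokens), noise)
loss.backward()
```

In a full system, the averaged token embedding would be replaced by a pretrained language encoder, and generation would iterate the learned denoiser from pure noise down to a clean motion sequence, optionally with guidance terms for controllability.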
Papers
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li, Weihao Yuan, Yisheng He, Lingteng Qiu, Shenhao Zhu, Xiaodong Gu, Weichao Shen, Yuan Dong, Zilong Dong, Laurence T. Yang
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning
Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma
Learning Generalizable Human Motion Generator with Reinforcement Learning
Yunyao Mao, Xiaoyang Liu, Wengang Zhou, Zhenbo Lu, Houqiang Li