Text-to-Motion
Text-to-motion research aims to generate realistic human or animal movements from textual descriptions, bridging natural language and complex spatiotemporal motion data. Current efforts focus on improving the controllability and robustness of motion generation, often employing diffusion models, transformers, and large language models to achieve fine-grained control over pose and action and to handle diverse or ambiguous textual inputs. The field has potential applications in animation, robotics, virtual reality, and accessibility technologies, and it drives advances in both generative modeling and cross-modal understanding. Ongoing work also addresses challenges such as data scarcity, generalization to unseen actions, and the development of robust, safe systems.
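To make the diffusion-based approach mentioned above more concrete, the sketch below shows a minimal text-conditioned motion diffusion pipeline: a transformer denoiser predicts the clean pose sequence from a noised one, conditioned on a text embedding, and a toy reverse loop samples a motion. This is an illustrative assumption of how such systems are commonly structured, not the method of any paper listed here; all names, dimensions, and hyperparameters (e.g. MotionDenoiser, pose_dim=263) are hypothetical.

```python
# Minimal sketch of text-conditioned motion diffusion (illustrative only).
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Transformer that predicts the clean motion x0 from a noised sequence."""
    def __init__(self, pose_dim=263, text_dim=512, d_model=256, n_layers=4):
        super().__init__()
        self.pose_in = nn.Linear(pose_dim, d_model)
        self.text_in = nn.Linear(text_dim, d_model)
        self.t_embed = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                     nn.Linear(d_model, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.pose_out = nn.Linear(d_model, pose_dim)

    def forward(self, x_t, t, text_emb):
        # x_t: (B, T, pose_dim) noised motion; t: (B,) diffusion step in [0, 1]
        h = self.pose_in(x_t) + self.t_embed(t[:, None, None].float())
        cond = self.text_in(text_emb)[:, None, :]   # prepend text as one token
        h = self.encoder(torch.cat([cond, h], dim=1))
        return self.pose_out(h[:, 1:])              # predicted clean motion x0

@torch.no_grad()
def sample(model, text_emb, frames=60, pose_dim=263, steps=50):
    """Toy DDPM-style reverse loop: repeatedly re-noise the x0 prediction."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_cum = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(text_emb.size(0), frames, pose_dim)
    for i in reversed(range(steps)):
        t = torch.full((x.size(0),), i / steps)
        x0_hat = model(x, t, text_emb)
        if i > 0:
            a = alphas_cum[i - 1]
            x = a.sqrt() * x0_hat + (1 - a).sqrt() * torch.randn_like(x)
        else:
            x = x0_hat
    return x  # (B, frames, pose_dim) generated pose sequence

if __name__ == "__main__":
    model = MotionDenoiser()
    text_emb = torch.randn(2, 512)   # stand-in for a CLIP/LLM text embedding
    motion = sample(model, text_emb, frames=30)
    print(motion.shape)              # torch.Size([2, 30, 263])
```

In a real system the text embedding would come from a pretrained language or vision-language encoder, and the denoiser would be trained on paired text-motion data; the controllability work surveyed here layers finer-grained conditioning (e.g. pose codes or scene context) on top of this basic recipe.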
Papers
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu
Motion Generation from Fine-grained Textual Descriptions
Kunhang Li, Yansong Feng
LaserHuman: Language-guided Scene-aware Human Motion Generation in Free Environment
Peishan Cong, Ziyi Wang, Zhiyang Dou, Yiming Ren, Wei Yin, Kai Cheng, Yujing Sun, Xiaoxiao Long, Xinge Zhu, Yuexin Ma