General World Model
General world models are artificial systems designed to simulate and interact with diverse environments, with the goal of supporting advanced reasoning and decision-making. Current research focuses on multimodal models that often pair large language models with video generation techniques (e.g., autoregressive transformers, diffusion models) to make simulated worlds more realistic and controllable. These models are evaluated with new benchmarks that test how well a model understands and predicts complex real-world dynamics across multiple disciplines. Progress in world modeling is relevant to fields such as autonomous driving, robotics, and interactive content creation.
Papers
Pandora: Towards General World Model with Natural Language Actions and Video States
Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang