Scene Generation
Scene generation focuses on automatically creating realistic and diverse visual scenes, primarily for applications in robotics, gaming, and autonomous driving simulation. Current research emphasizes generating scenes from varied inputs, including text descriptions, 2D images (aerial or ground views), sketches, and even LiDAR data. The dominant architectures are diffusion models, GANs, and transformers, often integrated with large language models for finer control and semantic understanding. The field matters because high-quality, controllable synthetic scenes are crucial for training and evaluating AI systems, particularly in safety-critical domains, and for building immersive virtual environments.
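As a rough illustration of the text-to-scene pipeline described above (not the method of any paper listed below), a pretrained latent diffusion model can be prompted with a scene description in a few lines of Python. This sketch assumes the Hugging Face diffusers library and the runwayml/stable-diffusion-v1-5 checkpoint; the prompt and inference settings are arbitrary examples.

    # Minimal sketch: text-conditioned scene generation with a
    # pretrained latent diffusion model (illustrative, not from the papers below).
    import torch
    from diffusers import StableDiffusionPipeline

    # Assumed checkpoint; any compatible text-to-image diffusion model works.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    # The text encoder conditions the denoising process on this description,
    # so the prompt acts as the scene specification.
    prompt = "a busy urban intersection at dusk, photorealistic, wide angle"
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("scene.png")

In practice, the papers below extend this basic conditioning idea to richer inputs (multi-view images, semantic occupancy, LiDAR) and to 3D or video outputs rather than single 2D frames.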
Papers
OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation
Bohan Li, Xin Jin, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie, Chao Ma, Xiaokang Yang, Wenjun Zeng
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Zehuan Huang, Yuan-Chen Guo, Xingqiao An, Yunhan Yang, Yangguang Li, Zi-Xin Zou, Ding Liang, Xihui Liu, Yan-Pei Cao, Lu Sheng
Seeing Beyond Views: Multi-View Driving Scene Video Generation with Holistic Attention
Hannan Lu, Xiaohe Wu, Shudong Wang, Xiameng Qin, Xinyu Zhang, Junyu Han, Wangmeng Zuo, Ji Tao