Instruction Guided Scene
Instruction-guided scene generation creates or modifies 3D scenes from natural language instructions, with the goals of high visual fidelity and 3D consistency. Current research focuses on making 2D diffusion models more 3D-aware and consistent. Common strategies include incorporating 3D context, using structured noise, and applying self-supervised training. Another strategy treats 4D scenes (3D plus time) as pseudo-3D representations. Progress also depends on large-scale datasets that densely ground language in 3D scenes. Such datasets are needed to train robust models and reduce hallucinations, and they support advances in embodied AI as well as applications in virtual and augmented reality. More controllable and realistic scene generation techniques have significant implications for fields such as robotics, game development, and architectural design.