Compositional Generation

Compositional generation creates complex outputs, such as images or 3D models, by combining simpler components. It targets a common limitation of generative models, which often struggle with nuanced instructions and multi-object scenes. Current research builds on large language models and diffusion models, often through training-free methods, and uses techniques such as chain-of-thought reasoning or coroutine-based constraints to guide generation and improve controllability. The area matters because robust generalization and flexible control over complex generative tasks remain a key challenge in AI, with applications in image and video synthesis, 3D modeling, and program generation.

Papers