Large-Scale Text-to-Image

Large-scale text-to-image generation aims to create high-quality, realistic images from textual descriptions, primarily using diffusion models and masked image models. Current research emphasizes three goals: improving controllability, addressing vulnerabilities such as adversarial attacks, and keeping generated images consistent across different views or edits. These goals are often pursued through techniques such as prompt tuning, conditioning on additional input modalities (for example audio or existing images), and novel attention mechanisms. The field is significant for its potential applications in creative industries and scientific visualization. It also raises important ethical concerns about the generation of harmful content.

Papers