Text-Guided Editing

Text-guided image and video editing uses large language models and diffusion models to manipulate visual content according to textual instructions. Current research focuses on improving the fidelity and precision of edits across modalities, including 2D images, 3D scenes (represented as NeRFs or Gaussian splats), and videos, often employing techniques such as depth-aware conditioning and latent-space manipulation to improve consistency and efficiency. The field matters both for democratizing content creation, letting users with limited expertise perform complex edits, and for advancing generative models and human-computer interaction.
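
As a concrete illustration of the instruction-driven editing described above, the sketch below performs a single text-guided edit on a 2D image with a latent diffusion model. It assumes the Hugging Face diffusers library and the publicly released InstructPix2Pix checkpoint; the model ID, guidance parameters, and file paths are illustrative assumptions rather than details drawn from the papers listed in this section.

```python
# Minimal sketch of text-guided image editing with a diffusion model.
# Assumes the Hugging Face `diffusers` library and the public
# InstructPix2Pix checkpoint ("timbrooks/instruct-pix2pix"); model ID,
# parameter values, and file paths are illustrative, not taken from the
# papers in this section.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Load the image to edit and give a plain-language instruction.
image = Image.open("input.png").convert("RGB")
prompt = "make it look like a watercolor painting"

# image_guidance_scale trades edit strength against faithfulness to the
# source image; guidance_scale controls adherence to the text prompt.
edited = pipe(
    prompt,
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
    guidance_scale=7.5,
).images[0]

edited.save("edited.png")
```

The same pattern generalizes to the other settings surveyed here: video editors typically add temporal attention or cross-frame constraints on top of such a per-frame pipeline, and 3D editors apply text-conditioned updates to the renders of a NeRF or Gaussian-splat scene before propagating them back into the 3D representation.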

Papers