Text-to-Image Models
Text-to-image models generate images from textual descriptions, aiming for high fidelity, creativity, and safety. Current research focuses on improving image-text alignment, mitigating bias and safety issues (such as generating harmful content or being vulnerable to jailbreak attacks), and enhancing generalizability and efficiency through techniques such as diffusion models, fine-tuning strategies, and vector quantization. These advances have significant implications for fields including art, design, and medical imaging, but they also raise ethical concerns around bias, safety, and potential misuse, requiring ongoing investigation and the development of robust mitigation strategies.
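As a rough illustration of the text-to-image workflow described above, below is a minimal sketch of sampling an image from a pretrained latent diffusion model with the Hugging Face diffusers library. The model ID, prompt, and parameter values are illustrative assumptions, not taken from the papers listed here.

```python
# Minimal text-to-image sketch using a pretrained latent diffusion model.
# Assumes the `diffusers` and `torch` packages are installed and a CUDA GPU
# is available; the model ID below is an illustrative choice.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the text encoder, denoising U-Net, and VAE to GPU

prompt = "a watercolor painting of a lighthouse at dawn"
# guidance_scale trades off image-text alignment against sample diversity;
# num_inference_steps sets the number of denoising iterations.
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("lighthouse.png")
```

The `guidance_scale` parameter controls classifier-free guidance, one of the main levers for the image-text alignment this body of work tries to improve: higher values follow the prompt more closely at the cost of diversity.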
Papers
IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni