Text-to-Image Models
Text-to-image models generate images from textual descriptions, aiming for high fidelity, creativity, and safety. Current research focuses on improving image-text alignment, mitigating bias and safety issues (such as generating harmful content or being vulnerable to jailbreaks), and enhancing generalizability and efficiency through techniques such as diffusion models, fine-tuning strategies, and vector quantization. These advances have significant implications for fields including art, design, and medical imaging, but they also raise ethical concerns about bias, safety, and potential misuse, requiring ongoing investigation and the development of robust mitigation strategies.
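Since the overview centers on diffusion models, a minimal sketch of the reverse (denoising) sampling loop that underlies text-to-image diffusion may help fix ideas. The stub denoiser, noise schedule, and embedding shape below are illustrative placeholders, not any specific paper's implementation:

```python
import numpy as np

# Minimal DDPM-style reverse sampling sketch with a text-conditioned denoiser.
# The denoiser is a random stub standing in for a trained network (e.g. a
# U-Net); all hyperparameters and names here are illustrative assumptions.

T = 50                                    # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, text_emb):
    """Stub epsilon-predictor. A trained model would consume text_emb
    (an embedding of the prompt) to condition its noise prediction."""
    rng = np.random.default_rng(t)
    return 0.1 * rng.standard_normal(x_t.shape)

def sample(text_emb, shape=(8, 8, 3)):
    """Reverse diffusion: start from pure Gaussian noise, denoise step by step."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(shape)                  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = denoiser(x, t, text_emb)              # predicted noise at step t
        # DDPM posterior mean (Ho et al., 2020):
        # x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                                   # inject noise except at the last step
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x

# A trained text encoder would map a prompt string to text_emb; a zero vector
# is used here purely as a placeholder.
text_emb = np.zeros(512)
image = sample(text_emb)
print(image.shape)                                  # (8, 8, 3)
```

In practice, the text conditioning typically enters the denoiser through cross-attention over the prompt embedding, and classifier-free guidance blends conditional and unconditional noise predictions to improve image-text alignment.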
Papers
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho, David J. Fleet, Mohammad Norouzi
GR-GAN: Gradual Refinement Text-to-image Generation
Bo Yang, Fangxiang Feng, Xiaojie Wang