Instruction-Based Editing
Instruction-based image editing modifies images according to natural language instructions, aiming for more intuitive and user-friendly manipulation than traditional methods. Current research emphasizes improving the accuracy and controllability of edits, particularly for complex tasks involving actions, reasoning about physical dynamics, and multi-attribute changes, and often leverages diffusion models and multimodal large language models. The field matters because it narrows the gap between human intent and image manipulation, with potential applications in creative design, content creation, and computer-aided design.
Papers
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
Shufan Li, Harkanwar Singh, Aditya Grover