Human Editing
"Human editing" covers the generation and modification of data across modalities (images, audio, text, 3D models) through natural language instructions or other forms of user input, and it is reshaping how people interact with computers. Current research relies heavily on diffusion models and large language models (LLMs), often integrated within multimodal frameworks, to give users precise and flexible control over the editing process while addressing challenges such as hallucination and ambiguity. The field matters for its potential to improve accessibility in creative work, make content creation more efficient, and advance our understanding of how humans interact with and interpret AI-generated content.
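As a concrete illustration of the instruction-driven editing paradigm described above, the sketch below uses Hugging Face's diffusers library with the public InstructPix2Pix checkpoint to edit an image from a natural-language instruction. It is a minimal, generic example of the technique, not the method of any paper listed here; the image URL, instruction text, and parameter values are placeholders.

```python
# Minimal sketch: instruction-based image editing with a diffusion model.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

# Load a pretrained instruction-following editing pipeline.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Placeholder URL; substitute any RGB image you want to edit.
image = load_image("https://example.com/room.png")

# Apply a natural-language edit. `image_guidance_scale` trades faithfulness
# to the input image against adherence to the instruction.
edited = pipe(
    "replace the sofa with a wooden bench",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]
edited.save("edited.png")
```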
Papers
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman
LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model
Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng