Human Editing
Advances in "human editing," the ability to generate and modify data across modalities (images, audio, text, 3D models) using natural language instructions or other forms of user input, are reshaping human-computer interaction. Current research relies heavily on diffusion models and large language models (LLMs), often integrated within multimodal frameworks, to achieve precise and flexible control over the editing process while addressing challenges such as hallucination and ambiguity. The field matters for its potential to improve accessibility in creative work, make content creation more efficient, and deepen our understanding of how people interact with and interpret AI-generated content.
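As a concrete illustration of the instruction-driven editing paradigm described above, the sketch below edits an image from a natural language instruction using the publicly available InstructPix2Pix pipeline in Hugging Face diffusers. It is an illustrative example only, not the method of any paper listed below; the checkpoint name, file paths, and parameter values are assumptions.

```python
# Minimal sketch: instruction-driven image editing with a diffusion model.
# Uses the public InstructPix2Pix pipeline from Hugging Face diffusers;
# not the method of any paper listed in this section.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",  # assumed publicly available checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("portrait.png").convert("RGB")  # hypothetical input image

# The instruction steers the edit; the two guidance scales trade off fidelity
# to the text instruction against fidelity to the original image.
edited = pipe(
    "make the background a snowy mountain landscape",
    image=image,
    num_inference_steps=20,
    guidance_scale=7.5,        # strength of the text instruction
    image_guidance_scale=1.5,  # how closely to preserve the source image
).images[0]

edited.save("portrait_edited.png")
```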
Papers
HyperGANStrument: Instrument Sound Synthesis and Editing with Pitch-Invariant Hypernetworks
Zhe Zhang, Taketo Akama
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
Dingning Liu, Xiaoshui Huang, Yuenan Hou, Zhihui Wang, Zhenfei Yin, Yongshun Gong, Peng Gao, Wanli Ouyang
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu
Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang