Human Instruction
Human instruction following in AI focuses on developing models that accurately and reliably execute complex tasks specified by diverse instructions spanning text, images, and audio. Current research emphasizes improving model alignment through techniques such as instruction tuning and response tuning, often built on large language models (LLMs) and diffusion transformers, and explores novel evaluation metrics for multi-modal, multi-turn interactions. This work is crucial for advancing human-computer interaction, enabling more intuitive and effective collaboration between humans and AI systems across domains ranging from robotics and manufacturing to healthcare and education.
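To make the central technique concrete, here is a minimal sketch of how instruction-tuning training data is commonly prepared: an instruction and its target response are concatenated into one token sequence, and the instruction tokens are masked out of the loss (by convention with the ignore label -100) so the model is only trained to produce the response. The prompt template, the toy tokenizer, and the `build_example` helper are illustrative assumptions, not the method of any specific paper listed below.

```python
# Minimal sketch of instruction-tuning data preparation.
# Assumptions (not from the papers above): a "### Instruction / ### Response"
# prompt template, a toy whitespace tokenizer, and the common convention of
# labeling masked-out positions with -100.

IGNORE_INDEX = -100  # loss is not computed on positions with this label

def build_example(instruction, response, tokenize):
    """Concatenate instruction and response into one training sequence,
    masking the instruction portion so loss falls only on response tokens."""
    prompt_ids = tokenize(f"### Instruction:\n{instruction}\n### Response:\n")
    response_ids = tokenize(response)
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy whitespace "tokenizer" purely for illustration.
def toy_tokenize(text):
    return [hash(tok) % 50000 for tok in text.split()]

input_ids, labels = build_example(
    "Summarize the paper.", "It studies planning.", toy_tokenize
)
```

The masked labels mean gradient updates teach the model to generate responses conditioned on instructions, rather than to reproduce the instructions themselves; response tuning, by contrast, trains on responses without instruction conditioning.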
Papers
Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following
Yuxiao Yang, Shenao Zhang, Zhihan Liu, Huaxiu Yao, Zhaoran Wang
Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models
Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park, Heuiseok Lim
Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback
Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang
Autoware.Flex: Human-Instructed Dynamically Reconfigurable Autonomous Driving Systems
Ziwei Song, Mingsong Lv, Tianchi Ren, Chun Jason Xue, Jen-Ming Wu, Nan Guan
InstructOCR: Instruction Boosting Scene Text Spotting
Chen Duan, Qianyi Jiang, Pei Fu, Jiamin Chen, Shengxi Li, Zining Wang, Shan Guo, Junfeng Luo
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong, David Fan, Jiachen Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann LeCun, Saining Xie, Zhuang Liu
DragScene: Interactive 3D Scene Editing with Single-view Drag Instructions
Chenghao Gu, Zhenzhe Li, Zhengqi Zhang, Yunpeng Bai, Shuzhao Xie, Zhi Wang