Digital Action
Digital action research focuses on enabling computers to understand and carry out actions in response to multimodal inputs, bridging the physical and digital worlds. Current work emphasizes models that combine large language models (LLMs) with other modalities such as vision and audio, typically using transformer architectures and reinforcement learning to improve interaction and decision-making in settings such as human-robot collaboration and multimodal information processing. This line of work advances artificial intelligence in areas such as personalized interfaces, automated task completion, and human-computer interaction, with applications ranging from healthcare to robotics.
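As a rough illustration of the perceive-decide-act loop such models implement, the sketch below shows a toy digital-action agent in Python. All names here (Observation, Action, KeywordPolicy, run_step) are hypothetical, and the keyword-matching policy is only a stand-in for the LLM-based multimodal policies described above; this is a minimal sketch under those assumptions, not an implementation of any listed paper.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Observation:
    """A multimodal observation: a user instruction plus a description of the screen."""
    text: str                   # transcribed or typed user instruction
    screen_elements: list[str]  # labels of detected UI elements

@dataclass
class Action:
    """A digital action the agent proposes to execute."""
    kind: str    # e.g. "click", "type", "answer"
    target: str  # UI element label or free-form text

class PolicyModel(Protocol):
    """Hypothetical interface for an action-selection policy (e.g. an LLM-based model)."""
    def propose(self, obs: Observation) -> Action: ...

class KeywordPolicy:
    """Toy policy: picks the screen element whose words overlap most with the instruction."""
    def propose(self, obs: Observation) -> Action:
        words = set(obs.text.lower().split())
        best = max(
            obs.screen_elements,
            key=lambda el: len(words & set(el.lower().split())),
            default="",
        )
        if best:
            return Action(kind="click", target=best)
        return Action(kind="answer", target="No matching element found.")

def run_step(policy: PolicyModel, obs: Observation) -> Action:
    """One perceive -> decide -> act step; a real system would dispatch the action to an executor."""
    return policy.propose(obs)

if __name__ == "__main__":
    obs = Observation(
        text="Open the settings menu",
        screen_elements=["Home button", "Settings menu", "Search bar"],
    )
    print(run_step(KeywordPolicy(), obs))  # Action(kind='click', target='Settings menu')
```

In a full system, the keyword policy would be replaced by a multimodal model that scores candidate actions from pixels and text, and the loop would feed executed actions back as new observations.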
Papers
InterAct: Capture and Modelling of Realistic, Expressive and Interactive Activities between Two Persons in Daily Scenarios
Yinghao Huang, Leo Ho, Dafei Qin, Mingyi Shi, Taku Komura
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning
Zishan Gu, Fenglin Liu, Changchang Yin, Ping Zhang