Digital Action

Digital action research aims to enable computers to understand multimodal inputs and act on them, bridging the physical and digital worlds. Current work centers on models that combine large language models (LLMs) with other modalities such as vision and audio. These models often employ transformer architectures and reinforcement learning to improve interaction and decision-making in settings such as human-robot collaboration and multimodal information processing. This research advances AI capabilities in personalized interfaces, automated task completion, and human-computer interaction, with applications in fields ranging from healthcare to robotics.

Papers