MLLM Agent
MLLM agents are AI systems built on multimodal large language models (MLLMs), which extend large language models with the ability to process non-text inputs such as images. These agents carry out complex tasks by interacting with real-world environments, for example operating mobile devices or navigating graphical user interfaces (GUIs). Current research aims to improve how agents navigate and make decisions, using multi-agent architectures and enhanced cognitive abilities. It also addresses challenges such as long operation sequences and correcting errors made along the way. This work is significant because it extends AI's ability to understand and act on dynamic, real-world information, with applications ranging from automated assistance to improved security in complex AI systems.
Papers
November 3, 2024
June 3, 2024
April 8, 2024
February 20, 2024
February 19, 2024
January 19, 2024