Web Agent
Web agents are autonomous software programs that interact with websites and carry out tasks on them, with the goal of automating complex online workflows. Current research aims to improve their accuracy and robustness through techniques such as hierarchical architectures, multimodal validation, and reinforcement learning, often building on large language models (LLMs) and combining visual information with text. These advances matter for productivity in domains ranging from business process automation to digital assistants, but challenges remain in reliable web navigation, handling dynamic web content, and ensuring agent security and privacy.
Papers
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents
Ke Yang, Yao Liu, Sapana Chaudhary, Rasool Fakoor, Pratik Chaudhari, George Karypis, Huzefa Rangwala
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, Jinyoung Yeo