Graphical User Interface Automation

Graphical user interface (GUI) automation aims to let computers carry out tasks inside software applications autonomously, improving human productivity. Current research relies heavily on large language models (LLMs) and multimodal models, often combined with reinforcement learning and planning methods such as dynamic planning, to interpret user instructions and execute multi-step action sequences in changing GUI environments. The field matters because it promises to automate tedious, repetitive tasks across diverse software, from simple mobile apps to professional design tools. It still faces challenges in handling complex, visually centric tasks and in achieving high accuracy across diverse settings.

Papers