WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point

Title: WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point
Authors: Zhao, Henry Hengyuan; Yang, Kaiming; Yu, Wendi; Gao, Difei; Shou, Mike Zheng
Publication Year: 2025
Subject Terms: Artificial Intelligence; Multiagent Systems
Description: Recent progress in GUI agents has substantially improved visual grounding, yet robust planning remains challenging, particularly when the environment deviates from a canonical initial state. In real applications, users often invoke assistance mid-workflow, where software may be partially configured, steps may have been executed in different orders, or the interface may differ from its default setup. Such task-state variability is pervasive but insufficiently evaluated in existing GUI benchmarks. To address this gap, we introduce WorldGUI, a benchmark covering ten widely used desktop and web applications with tasks instantiated under diverse, systematically constructed initial states. These variations capture realistic human-computer interaction settings and enable diagnostic evaluation of an agent's ability to recover, adapt plans, and handle non-default contexts. We further present WorldGUI-Agent, a simple and model-agnostic framework that organizes planning and execution around three critique stages, improving reliability in dynamic environments. Experiments demonstrate that state-of-the-art GUI agents exhibit substantial performance degradation under non-default initial conditions, revealing limited robustness and fragile planning behaviors. Our benchmark and framework provide a foundation for developing more adaptable and reliable GUI agents. The code and data are available at https://github.com/showlab/WorldGUI; Technical Report
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2502.08047
Accession Number: edsarx.2502.08047
Database: arXiv