Automate desktop app interactions
gui-agent | skill | setup L2 | ★33
Fzkuji/GUI-Agent-Harness
What it does
Executes GUI tasks using a vision model and autonomous clicking
Best for
Automating repetitive GUI workflows and desktop testing when no API is available
Inputs
- Natural language task description
- Optional VM URL
- Optional model/provider override
Outputs
- Task completion status
- Screenshot evidence of final state
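The inputs and outputs listed above could be modeled roughly as follows. This is a minimal sketch, not the harness's actual interface: the names `TaskRequest`, `TaskResult`, `validate`, and every field name are hypothetical, and the HTTP scheme check is an assumption based on the "remote VM HTTP" requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRequest:
    """Hypothetical container for the three listed inputs."""
    task: str                      # natural language task description
    vm_url: Optional[str] = None   # optional VM URL
    model: Optional[str] = None    # optional model/provider override


@dataclass
class TaskResult:
    """Hypothetical container for the two listed outputs."""
    completed: bool                         # task completion status
    screenshot_path: Optional[str] = None   # screenshot of the final state


def validate(request: TaskRequest) -> TaskRequest:
    """Reject obviously unusable requests before any GUI action is taken."""
    if not request.task.strip():
        raise ValueError("task description must not be empty")
    # Assumption: a remote VM, when given, is addressed over HTTP(S).
    if request.vm_url is not None and not request.vm_url.startswith(
        ("http://", "https://")
    ):
        raise ValueError("vm_url must start with http:// or https://")
    return request
```

Validating up front keeps a malformed request from consuming any of the step budget.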
Requires
- Vision model
- Browser/desktop control API
- Optional: remote VM reachable over HTTP
Preconditions
- Screen/GUI accessible
- Target app available
- Permissions for input automation
Failure modes
- Max steps exceeded (default 15)
- UI component not recognized
- App crash/unresponsive
- Permission denied for automation
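The failure modes above can be illustrated with a step-budgeted loop. This is a sketch under stated assumptions, not the harness's implementation: `run_task`, `Outcome`, `ComponentNotRecognized`, and `AppUnresponsive` are hypothetical names, and only the default of 15 steps comes from the listing.

```python
from enum import Enum
from typing import Callable


class ComponentNotRecognized(Exception):
    """Hypothetical: the vision model could not locate a UI component."""


class AppUnresponsive(Exception):
    """Hypothetical: the target app crashed or stopped responding."""


class Outcome(Enum):
    COMPLETED = "completed"
    MAX_STEPS_EXCEEDED = "max_steps_exceeded"
    COMPONENT_NOT_RECOGNIZED = "component_not_recognized"
    APP_UNRESPONSIVE = "app_unresponsive"
    PERMISSION_DENIED = "permission_denied"


def run_task(step: Callable[[int], bool], max_steps: int = 15) -> Outcome:
    """Call step(i) until it returns True or the step budget runs out.

    `step` stands in for one observe-and-act cycle (screenshot, decide,
    click); it returns True once the task is finished.
    """
    for i in range(max_steps):
        try:
            done = step(i)
        except ComponentNotRecognized:
            return Outcome.COMPONENT_NOT_RECOGNIZED
        except AppUnresponsive:
            return Outcome.APP_UNRESPONSIVE
        except PermissionError:
            # Built-in exception, reused here for denied input automation.
            return Outcome.PERMISSION_DENIED
        if done:
            return Outcome.COMPLETED
    # The guardrail: stop instead of looping forever on a stuck task.
    return Outcome.MAX_STEPS_EXCEEDED
```

Returning a distinct outcome per failure mode lets a caller decide whether to retry (for example, with a larger `max_steps`) or to stop and report.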
Trust signals
- Published on GitHub/Hugging Face
- Supports multiple LLM providers
- Example commands documented
- Max-steps safety guardrail included