cyberneticlibrary

Automate desktop app interactions

gui-agent · skill · setup · L233
Fzkuji/GUI-Agent-Harness
What it does

Executes GUI tasks by reading the screen with a vision model and clicking autonomously.

Best for

Automating repetitive GUI workflows and desktop testing when no API is available

Inputs
  • Natural-language task description
  • Optional VM URL
  • Optional model/provider override
Outputs
  • Task completion status
  • Screenshot evidence of the final state
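The inputs and outputs above can be pictured as a small request/result contract. The sketch below is a hypothetical illustration only: the class and field names (`TaskRequest`, `TaskResult`, `vm_url`, `model`, `screenshot_png`) are assumptions, not the GUI-Agent-Harness API. It also assumes the optional VM URL is an http(s) endpoint.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical request shape; field names are illustrative only."""
    task: str                     # natural-language task description
    vm_url: Optional[str] = None  # optional remote VM endpoint (assumed http/https)
    model: Optional[str] = None   # optional model/provider override

    def __post_init__(self) -> None:
        # Reject empty task descriptions up front.
        if not self.task.strip():
            raise ValueError("task description must not be empty")
        # Only accept http(s) URLs for the optional remote VM.
        if self.vm_url is not None:
            scheme = urlparse(self.vm_url).scheme
            if scheme not in ("http", "https"):
                raise ValueError(f"vm_url must be http(s), got {self.vm_url!r}")


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical result shape: completion status plus final-state screenshot."""
    completed: bool
    steps_used: int
    screenshot_png: bytes  # evidence of the final screen state


req = TaskRequest(task="Open Settings and enable dark mode", vm_url="http://localhost:5000")
print(req.model)  # None: no override, so a default model/provider would be used
```

Keeping the screenshot in the result mirrors the "screenshot evidence" output, so a caller can audit what the agent actually saw at the end.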
Requires
  • Vision model
  • Browser/desktop control API
  • Optional remote VM reachable over HTTP
Preconditions
  • Screen/GUI accessible
  • Target app available
  • Permissions for input automation
Failure modes
  • Max steps exceeded (default 15)
  • UI component not recognized
  • App crashes or becomes unresponsive
  • Permission denied for automation
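The max-steps failure mode implies a bounded observe, decide, act loop. The following is a minimal sketch under stated assumptions, not the harness's actual implementation: `run_task`, `Action`, and the injected `screenshot`, `decide`, and `execute` callables are hypothetical stand-ins for the screen capture, the vision model, and the desktop control API. The default of 15 steps matches the documented default.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Action:
    kind: str       # hypothetical action set: "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


def run_task(
    task: str,
    screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 15,
) -> Tuple[bool, int, bytes]:
    """Observe, decide, act until the model reports "done" or max_steps runs out.

    Returns (completed, actions_executed, final_screenshot).
    """
    for step in range(max_steps):
        frame = screenshot()          # observe the current screen
        action = decide(task, frame)  # vision model picks the next action
        if action.kind == "done":
            return True, step, frame  # step == number of actions executed so far
        execute(action)               # click or type on the desktop
    # Guardrail tripped: stop rather than clicking indefinitely.
    return False, max_steps, screenshot()


# Usage with stand-in callables (no real screen or model involved):
clicks = []

def fake_decide(task: str, frame: bytes) -> Action:
    # Pretend the model needs three clicks before it judges the task done.
    return Action("done") if len(clicks) >= 3 else Action("click", 100, 200)

done, steps, _ = run_task("demo", lambda: b"png", fake_decide, clicks.append)
print(done, steps)  # True 3
```

Injecting the three callables keeps the loop testable without a real display, and the hard `max_steps` cap turns an unrecognized UI element or a hung app into a reported failure instead of an endless run.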
Trust signals
  • Published on GitHub/Hugging Face
  • Supports multiple LLM providers
  • Example commands documented
  • Max-steps safety guardrail included