Benchmark skill effectiveness with controlled variables

skill-arena · skill · setup · L2 · ★0

What it does

Benchmarks AI agent skills with controlled-variable A/B testing.

Best for

A/B testing AI agent skills or deck configurations with controlled variables, native judge scoring, and parallel isolated execution.

Inputs

Outputs

Requires

Preconditions

Failure modes

Trust signals

· Agent-orchestrated by default (parallel subagents); cross-player mode runs via Bun.spawn
· Isolated workdirs with deck link + skill preparation
· Native judge scoring (no external service)
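
As a rough illustration of what a controlled-variable A/B comparison involves, the sketch below checks that two arms differ in exactly one configuration key and then compares their mean judge scores. Everything in it is an assumption made for illustration: the `ArmConfig` and `ArmResult` shapes, the `differingKeys` and `compareArms` helpers, and the sample scores are hypothetical and do not reflect skill-arena's actual API or output.

```typescript
// Hypothetical shapes for illustration only; not skill-arena's real types.
type ArmConfig = Record<string, string>;

interface ArmResult {
  name: string;
  config: ArmConfig;
  judgeScores: number[]; // one judge score per isolated run
}

// Return the sorted list of keys whose values differ between two configs.
function differingKeys(a: ArmConfig, b: ArmConfig): string[] {
  const all = Object.keys(a).concat(Object.keys(b));
  const unique = all.filter((k, i) => all.indexOf(k) === i);
  return unique.filter((k) => a[k] !== b[k]).sort();
}

// Arithmetic mean; refuses an empty list rather than returning NaN.
function mean(xs: number[]): number {
  if (xs.length === 0) {
    throw new Error("cannot average an empty score list");
  }
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Compare two arms, rejecting the comparison unless exactly one
// configuration key differs (the controlled-variable requirement).
function compareArms(control: ArmResult, treatment: ArmResult) {
  const changed = differingKeys(control.config, treatment.config);
  if (changed.length !== 1) {
    throw new Error(
      "expected exactly one differing variable, found " + changed.length
    );
  }
  const controlMean = mean(control.judgeScores);
  const treatmentMean = mean(treatment.judgeScores);
  return {
    variable: changed[0],
    controlMean: controlMean,
    treatmentMean: treatmentMean,
    delta: treatmentMean - controlMean,
  };
}

// Made-up example data: only the "skill" key differs between the arms.
const control: ArmResult = {
  name: "baseline",
  config: { skill: "none", model: "model-a", task: "task-1" },
  judgeScores: [6, 7, 8],
};
const treatment: ArmResult = {
  name: "with-skill",
  config: { skill: "skill-x", model: "model-a", task: "task-1" },
  judgeScores: [8, 9, 10],
};

const result = compareArms(control, treatment);
// result.variable is "skill"; result.delta is 2
```

Rejecting comparisons where more than one key differs is the point of the design: if the model or task also changed between arms, a score difference could not be attributed to the skill alone.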