Compare specialist agents head-to-head
«family»-agent-headtohead · workflow · setup · L3 · ★2
samjmarshall/rekurve
What it does
Runs every agent variant in a family on the same fixed task set and compares their scores head-to-head.
Best for
Benchmarking agent variants when you need head-to-head evaluation on a fixed task set.
Inputs
- family (agent variants)
- task_set (benchmark tasks)
Outputs
- scores per agent per task; ranked results
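The output shape can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the nested-dict layout `{agent: {task: score}}`, the use of mean score for ranking, and the function names `rank_agents` and `head_to_head` are all assumptions.

```python
from statistics import mean


def rank_agents(scores):
    """Rank agents by mean score, best first.

    scores: {agent: {task: float}} (hypothetical layout).
    Returns a list of (agent, mean_score) tuples.
    """
    means = {agent: mean(per_task.values()) for agent, per_task in scores.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def head_to_head(scores, a, b):
    """Count per-task (wins, losses, ties) of agent a against agent b."""
    shared = sorted(set(scores[a]) & set(scores[b]))  # only tasks both agents ran
    wins = sum(scores[a][t] > scores[b][t] for t in shared)
    losses = sum(scores[a][t] < scores[b][t] for t in shared)
    return wins, losses, len(shared) - wins - losses
```

Ranking by mean is only one option; a win/loss record per pair of agents is often easier to read when score scales differ between tasks.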
Requires
- parallel execution
- scoring agent
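One way the two requirements could fit together is sketched below, assuming a thread pool is acceptable for parallelism. `run_agent` and `score` are hypothetical callables standing in for the agent runner and the scoring agent; neither name comes from the workflow itself.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(family, task_set, run_agent, score, max_workers=4):
    """Run every (agent, task) pair in parallel.

    run_agent(agent, task) -> output   (hypothetical agent runner)
    score(task, output)    -> number   (hypothetical scoring agent)
    Returns {agent: {task: score or None}}; None marks a failed run.
    """
    def run_one(pair):
        agent, task = pair
        try:
            return agent, task, score(task, run_agent(agent, task))
        except Exception:
            # Record the failure instead of dropping the pair silently.
            return agent, task, None

    pairs = [(agent, task) for agent in family for task in task_set]
    results = {agent: {} for agent in family}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for agent, task, value in pool.map(run_one, pairs):
            results[agent][task] = value
    return results
```

Recording `None` for a failed run keeps the score table rectangular, so a missing result is visible rather than silently skewing an agent's average.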
Preconditions
Agents must be deployable; task set must be fixed and reproducible.
Failure modes
Agent fails to complete task; scoring inconsistent; task set too small for statistical significance.
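To make the "task set too small" failure mode concrete, here is a sketch of an exact two-sided sign test on per-task wins and losses between two agents. The choice of a sign test and the name `sign_test_p` are assumptions for illustration; the workflow does not specify a significance test.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact sign test p-value for a win/loss record.

    Ties must be excluded before calling. Under the null hypothesis
    each non-tied task is a fair coin flip between the two agents.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no decisive tasks, no evidence either way
    k = min(wins, losses)
    # Probability of a result at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With five tasks, even a clean sweep (`sign_test_p(5, 0)`) gives p = 0.0625, which misses the conventional 0.05 threshold; ten tasks won 10 to 0 gives roughly 0.002.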