Audit skill impact with paired testing
skill-counterfactual-audit · skill · setup · L3 · ★64
Tibsfox/gsd-skill-creator
What it does
Audits a skill's behavior with paired probes: each probe task is run once with the skill loaded and once without it, and the two runs are compared.
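The paired-probe idea can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `paired_probe`, `PairedResult`, and the stand-in `fake_run` runner are hypothetical names, and a real run would invoke an agent rather than return canned phases.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical runner signature: (task, skill name or None) -> phases observed in the run.
Runner = Callable[[str, Optional[str]], List[str]]


@dataclass
class PairedResult:
    task: str
    with_skill: List[str]
    without_skill: List[str]

    @property
    def changed(self) -> bool:
        # A behavioral change is any difference in the phase sequence,
        # regardless of whether the task still passes.
        return self.with_skill != self.without_skill


def paired_probe(tasks: List[str], skill_name: str, run: Runner) -> List[PairedResult]:
    """Run every probe task twice: once with the skill loaded, once without."""
    return [
        PairedResult(task, run(task, skill_name), run(task, None))
        for task in tasks
    ]


def fake_run(task: str, skill: Optional[str]) -> List[str]:
    # Stand-in for a real agent run (assumption): the skill adds extra
    # planning phases on refactoring tasks only.
    phases = ["read", "edit", "test"]
    if skill is not None and "refactor" in task:
        phases = ["plan", "plan"] + phases
    return phases


results = paired_probe(["refactor parser", "fix typo"], "demo-skill", fake_run)
for r in results:
    print(r.task, "changed" if r.changed else "same")
```

Comparing phase sequences rather than pass/fail is what lets a paired run surface changes that leave the pass rate untouched.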
Best for
Detecting when a skill changes behavior in ways too small to move the pass rate: surface anchoring, template copying, excess planning, off-task drift.
Inputs
- · probe_task_bank
- · skill_name
- · task_descriptions with phases
Outputs
- · SIP report markdown
- · phase_comparison table
- · retire/refine/keep recommendation
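As a rough sketch of how the phase_comparison table and the retire/refine/keep recommendation might be derived from one paired run: the function names, the `drift_limit` threshold, and the decision rule below are illustrative assumptions, not values taken from the skill or the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# phase -> (count without skill, count with skill)
PhaseTable = Dict[str, Tuple[int, int]]


def phase_comparison(with_skill: List[str], without_skill: List[str]) -> PhaseTable:
    """Count how often each phase occurs in the two runs."""
    w, wo = Counter(with_skill), Counter(without_skill)
    return {phase: (wo[phase], w[phase]) for phase in sorted(set(w) | set(wo))}


def recommend(table: PhaseTable, pass_delta: float, drift_limit: float = 0.5) -> str:
    """Hypothetical rule: retire on a pass-rate regression, refine on large phase drift."""
    baseline = sum(wo for wo, _ in table.values()) or 1
    drift = sum(abs(w - wo) for wo, w in table.values()) / baseline
    if pass_delta < 0:
        return "retire"
    if drift > drift_limit:
        return "refine"
    return "keep"


def to_markdown(table: PhaseTable) -> str:
    """Render the comparison as a markdown table for the report."""
    lines = ["| phase | without skill | with skill | delta |", "|---|---|---|---|"]
    for phase, (wo, w) in table.items():
        lines.append(f"| {phase} | {wo} | {w} | {w - wo:+d} |")
    return "\n".join(lines)


table = phase_comparison(
    with_skill=["plan", "plan", "read", "edit", "test"],
    without_skill=["read", "edit", "test"],
)
print(to_markdown(table))
print(recommend(table, pass_delta=0.003))
```

Here the pass rate barely moves, yet two extra planning phases push the drift above the assumed limit, so the sketch recommends refining the skill rather than keeping it as-is.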
Preconditions
- · Skill has been active for ≥3 sessions
- · Probe-task bank curated (3-5 tasks)
- · Phase decomposition rules defined
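The preconditions above could be checked before an audit starts with something like the following. This is a sketch under assumptions: `check_preconditions` and its parameters are hypothetical, and only the thresholds (at least 3 sessions, 3 to 5 probe tasks) come from the list.

```python
from typing import Dict, List, Sequence


def check_preconditions(
    sessions_active: int,
    probe_tasks: Sequence[str],
    phase_rules: Dict[str, str],
) -> List[str]:
    """Return a list of unmet preconditions; an empty list means the audit can run."""
    problems = []
    if sessions_active < 3:
        problems.append("skill active for fewer than 3 sessions")
    if not 3 <= len(probe_tasks) <= 5:
        problems.append("probe-task bank must contain 3-5 tasks")
    if not phase_rules:
        problems.append("no phase decomposition rules defined")
    return problems


# Hypothetical inputs: a skill used in 4 sessions with a 3-task bank.
problems = check_preconditions(
    sessions_active=4,
    probe_tasks=["refactor parser", "fix typo", "add test"],
    phase_rules={"plan": "any step that only proposes work"},
)
print(problems or "ready to audit")
```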
Failure modes
- · Workflow guides (loop, schedule) trivially show task-recovery
- · No baseline exists if the skill was just created
Trust signals
- · Based on arXiv 2605.11946v1 (CTA)
- · Detects 522 behavioral changes while the pass rate moves only +0.3%