ab-experimentation
Designing, running, analyzing, and interpreting A/B tests with statistical rigor.
No-skill baseline: 100%. Any score below it means the artifact makes the model worse.
⚖ Measured verdict: the base model already handles this capability well, and every tested candidate degraded its output. Recommendation: use no artifact at all for this step.
1. ads-test (skill, default) · WORKS · score 58 · ★4,890
Ad teams avoiding "winner calling" errors and ensuring statistical rigor in creative testing.
2. ab-testing-framework (skill) · WORKS · score 58 · ★33
Quantifying the impact of CTA, copy, or design changes before full rollout, when conversion volume is sufficient for statistical power.
3. ab-testing (skill) · WORKS · score 58 · ★29
Product and growth teams validating feature changes or messaging hypotheses before rolling out to 100% of users.
4. ab-test-analysis (skill) · WORKS · score 57 · ★11,239
Data-driven product decisions when A/B test results must be checked for statistical rigor and against guardrail constraints before shipping.
5. experiment-designer (skill) · WORKS · score 57 · ★327
Any decision backed by an A/B test; forces pre-commitment to success criteria, prevents peeking and goalpost moving, and separates statistical from practical significance.
6. statistical-analyst (skill) · WORKS · score 56 · ★17,464
Rigorously validating A/B test results when business decisions depend on statistical confidence.
7. lean-ux-canvas (skill) · WORKS · score 55 · ★4,957
Running low-cost, high-speed validation before committing to expensive builds.
8. statistical-analyst (plugin) · WORKS · score 55 · ★17,464
Validating A/B experiment results with rigorous statistics instead of eyeballing conversion rates or running tests that are too small.
9. ab-test-setup (skill) · WORKS · score 54 · ★0
Validating product hypotheses with statistical rigor and measuring impact on business metrics.
10. ab-testing (skill) · WORKS · score 54 · ★24
When designing rigorous, statistically valid experiments with pre-committed sample sizes.
11. ab-method (skill) · WORKS · score 52 · ★0
When you need statistical validation of a change before rollout rather than anecdotal observation.
12. experiment-designer (skill) · WORKS · score 51 · ★17,464
Product teams running defensible experiments with clear success criteria and statistical stopping rules.
13. ab-testing (skill) · WORKS · score 50 · ★28,686
Validating product hypotheses where statistical rigor and p-value confidence matter more than speed.
14. analyze-test (command) · WORKS · score 50 · ★11,239
Validating A/B test results with statistical rigor before deciding to ship a variant.
15. surge-experiment (skill) · WORKS · score 50 · ★39
Growth PMs designing experiments who need clarity on mechanism, sample size, and decision criteria before engineering starts building.
16. experimental-design-ds (skill) · WORKS · score 40 · ★64
When designing experiments, planning A/B tests, calculating sample sizes, or reasoning about causation from data.
17. growth-engine (skill) · WORKS · score 8 · ★2,362
Teams running 5+ simultaneous A/B tests across channels who need automated winner detection with statistical rigor.
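Several of the descriptions above refer to pre-committed sample sizes, statistical power, and p-values. As a minimal sketch (not code taken from any listed skill), both calculations can be written with Python's standard library, assuming a binary conversion metric, exactly two arms, a two-sided significance level, and the normal approximation to the binomial. The function names `sample_size_per_arm` and `two_proportion_z_test` are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) for critical values and p-values.
STD_NORMAL = NormalDist()


def sample_size_per_arm(p_baseline: float, p_variant: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect p_baseline -> p_variant with a
    two-sided two-proportion z-test (normal approximation, unpooled variance)."""
    if not (0 < p_baseline < 1 and 0 < p_variant < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    if p_baseline == p_variant:
        raise ValueError("the minimum detectable effect must be non-zero")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = STD_NORMAL.inv_cdf(power)           # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    n = (z_alpha + z_beta) ** 2 * variance / (p_variant - p_baseline) ** 2
    return math.ceil(n)  # round up: a fractional user is not possible


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of B against A; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # common rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all-or-nothing conversions: the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Plan: detect a lift from 10% to 12% conversion at alpha = 0.05, power = 0.80.
    print(sample_size_per_arm(0.10, 0.12))
    # Analyze once, after the pre-committed sample size is reached (hypothetical counts).
    z, p = two_proportion_z_test(conv_a=390, n_a=3900, conv_b=468, n_b=3900)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

The design choice is to compute `n` before launch and run the test once at that `n`. Re-running the test daily and stopping at the first p < 0.05 (peeking) inflates the false-positive rate well above the nominal alpha.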