Design statistically valid A/B tests
ab-testing · skill · setup · L1 · ★ 28,686
coreyhaines31/marketingskills
Causal-lift measurements
ab-experimentation: 0pp vs no-skill baseline (with-skill 100% · baseline 100%)
Measured by running the task with and without this artifact, K=5, graded by deterministic checks — no LLM judging.
What it does
Primes the model to act as an expert in experimentation and A/B testing.
Best for
Validating product hypotheses where statistical rigor and a pre-set significance threshold matter more than speed.
Inputs
- · Baseline conversion metric and traffic volume
- · Hypothesis with expected lift percentage
- · Single variable to test (headline, design, CTA, copy)
Outputs
- · Required sample size per variant, calculated before launch for the chosen significance level and power
- · Hypothesis statement aligned to test framework
- · Metrics dashboard (primary, secondary, guardrail)
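As a rough illustration of the sample-size output, the sketch below applies the standard normal-approximation formula for comparing two proportions. It is not taken from the skill itself: the function name `sample_size_per_variant` and the defaults (two-sided alpha of 0.05, power of 0.80, lift expressed relative to the baseline) are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.05 for 5%.
    relative_lift: expected relative improvement, e.g. 0.10 for +10%.
    alpha, power: assumed defaults, not values prescribed by the skill.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # 5% baseline, hoping to detect a 10% relative lift (5.0% -> 5.5%).
    print(sample_size_per_variant(0.05, 0.10))
```

Smaller expected lifts drive the requirement up quickly, because the difference between the two rates is squared in the denominator.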
Preconditions
- · Enough traffic to reach the required sample size in a reasonable time (the requirement depends on the baseline rate and the target lift)
- · Single-variable isolation
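Whether traffic is adequate can be checked with simple arithmetic once a per-variant sample size is known. The sketch below is hypothetical: `days_to_reach_sample`, its parameters, and the choice to round up to whole weeks (so each weekday is equally represented) are assumptions for illustration, not part of the skill.

```python
import math


def days_to_reach_sample(n_per_variant, daily_visitors, num_variants=2, traffic_share=1.0):
    """Estimate how long a test must run to collect the required sample.

    n_per_variant: required users in each variant (from a prior power calculation).
    daily_visitors: eligible visitors per day on the tested page.
    traffic_share: fraction of visitors enrolled in the test (assumed 100% by default).
    Returns (days, days_rounded_up_to_full_weeks).
    """
    if daily_visitors <= 0 or not 0 < traffic_share <= 1:
        raise ValueError("daily_visitors must be positive and traffic_share in (0, 1]")

    total_needed = n_per_variant * num_variants
    days = math.ceil(total_needed / (daily_visitors * traffic_share))
    full_weeks = math.ceil(days / 7) * 7  # run whole weeks to cover weekly cycles
    return days, full_weeks


if __name__ == "__main__":
    days, rounded = days_to_reach_sample(31231, daily_visitors=4000)
    print(f"{days} days minimum, {rounded} days if run in full weeks")
```

If the estimate runs to many months, the usual options are to test a larger change or to pick a metric with a higher baseline rate.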
Failure modes
- · Peeking at results early and stopping on the first significant reading inflates the false-positive rate
- · Multiple simultaneous tests introduce interaction effects
- · Guardrail metric decline can be masked by primary metric wins
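The peeking failure mode can be made concrete with a small A/A simulation, in which both arms share the same true conversion rate, so every "significant" result is a false positive. This is a minimal sketch under assumed settings (500 simulated experiments, 10 interim looks, a pooled two-proportion z-test); all names and parameters are illustrative rather than drawn from the skill.

```python
import random
from statistics import NormalDist


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """P-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_false_positive_rates(n_experiments=500, looks=10, users_per_look=200,
                                  rate=0.10, alpha=0.05, seed=0):
    """Compare 'stop at any significant look' with 'test once at the end'.

    Both arms convert at the same true rate, so any rejection is a false positive.
    Returns (peeking_rate, fixed_horizon_rate).
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _experiment in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _look in range(looks):
            # Add one batch of users to each arm, then re-run the test.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # only the final look counts here
    return peeking_hits / n_experiments, fixed_hits / n_experiments


if __name__ == "__main__":
    peeking, fixed = simulate_false_positive_rates()
    print(f"false positives with peeking: {peeking:.1%}, single final test: {fixed:.1%}")
```

The single final test should stay near the nominal alpha, while the peeking rate climbs well above it as the number of looks grows. Sequential testing methods exist to correct for this, but they are outside the scope of this sketch.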
Trust signals
- · Named examples
- · Quantified metrics
- · Structured frameworks
- · Named methodologies