cyberneticlibrary

Design statistically valid A/B tests

ab-testing · skill · setup · L1 · ★ 28,686

coreyhaines31/marketingskills

Causal-lift measurements

ab-experimentation · 0pp vs no-skill baseline · with-skill 100% · baseline 100%

Measured by running the task with and without this artifact, K=5, graded by deterministic checks — no LLM judging.

What it does

You are an expert in experimentation and A/B testing.

Best for

Validating product hypotheses where statistical rigor and a predefined significance threshold matter more than speed.

Inputs

· Baseline conversion metric and traffic volume
· Hypothesis with expected lift percentage
· Single variable to test (headline, design, CTA, copy)

Outputs

· Pre-calculated sample size for statistical significance
· Hypothesis statement aligned to test framework
· Metrics dashboard (primary, secondary, guardrail)
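
The sample-size output can be approximated from the listed inputs (baseline conversion rate, expected lift, traffic volume). The sketch below is one common way to do it, not necessarily the calculation this skill performs. It assumes a two-sided two-proportion z-test with normal approximation, an even traffic split, a relative lift, and defaults of alpha = 0.05 and power = 0.80. The function names `sample_size_per_variant` and `days_to_run` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    Assumes an even split between control and variant and treats
    relative_lift as a fraction of the baseline (0.20 means +20%).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the lift must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def days_to_run(n_per_variant, daily_visitors, variants=2):
    """Days needed to reach the sample size given total daily traffic."""
    return math.ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    # Illustrative inputs only: 5% baseline, +20% relative lift, 2,000 visitors/day.
    n = sample_size_per_variant(0.05, 0.20)
    print(n, "users per variant,", days_to_run(n, 2000), "days")
```

Smaller expected lifts drive the required sample up sharply, because the sample size scales with the inverse square of the absolute difference between the two rates.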

Preconditions

· Adequate traffic volume (varies by baseline and lift target)
· Single-variable isolation

Failure modes

· Peeking at results early and stopping on the first significant reading inflates the false-positive rate
· Multiple simultaneous tests introduce interaction effects
· Guardrail metric decline can be masked by primary metric wins
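
The peeking failure mode can be illustrated with a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares checking at every interim look against checking once at the planned end. All parameters (10 looks, 200 users per arm per look, 10% conversion rate, 400 simulated tests) are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z-statistic for two arms of n users each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se


def simulate_aa_tests(n_sims=400, looks=10, users_per_look=200,
                      rate=0.10, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, false-positive rate at fixed horizon)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = fixed_hits = 0

    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        flagged_at_any_look = False
        significant = False
        for _ in range(looks):
            # Both arms draw from the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            significant = abs(z_stat(conv_a, conv_b, n)) > z_crit
            flagged_at_any_look = flagged_at_any_look or significant
        peeking_hits += flagged_at_any_look  # would have stopped at some look
        fixed_hits += significant            # result at the final look only

    return peeking_hits / n_sims, fixed_hits / n_sims


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"false positives with peeking: {peeking:.1%}, fixed horizon: {fixed:.1%}")
```

The fixed-horizon rate should land near the nominal alpha, while the peeking rate comes out noticeably higher, which is why the sample size is committed to before the test starts.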

Trust signals

· Named examples
· Quantified metrics
· Structured frameworks
· Named methodologies

Capability

ab-experimentation → compare alternatives