Design statistically valid A/B tests

ab-testing · skill · setup · L2 · ★29

What it does

Design and validate statistically sound A/B tests

Best for

Product and growth teams validating feature changes or messaging hypotheses before rolling out to 100% of users.

Inputs

Outputs

Requires

Preconditions

Failure modes

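· Underpowered tests (sample size too small) miss real effects, and the significant results they do produce are more likely to be false positives or overestimates
· Peeking at results early and stopping at the first significant reading inflates the false positive rate and breaks statistical validity
· Testing multiple variables at once in a single variant makes it impossible to attribute the result to any one change
· Novelty effects (uplift driven by the change being new) fade, so early conclusions may not hold as the test runs longer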
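
The peeking failure mode can be illustrated with a small simulation. The sketch below is not taken from the skill itself. It assumes a two-sided two-proportion z-test and runs A/A tests (both arms share the same true conversion rate), so every "significant" result is a false positive. The function names and default parameters are hypothetical, chosen only for illustration.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests=400, n_per_arm=1000, rate=0.10,
                      looks=10, alpha=0.05, seed=0):
    """Return (false positive rate with peeking, rate with one final look).

    Both arms convert at the same true rate, so any significant
    result is a false positive by construction.
    """
    rng = random.Random(seed)
    # Evenly spaced interim looks; the last one is the planned sample size.
    checkpoints = [n_per_arm * (i + 1) // looks for i in range(looks)]
    peeking_hits = 0
    fixed_hits = 0
    for _test in range(n_tests):
        conv_a = conv_b = 0
        seen = 0
        significant_at_any_look = False
        p = 1.0
        for checkpoint in checkpoints:
            for _user in range(checkpoint - seen):
                conv_a += rng.random() < rate
                conv_b += rng.random() < rate
            seen = checkpoint
            p = z_test_p_value(conv_a, seen, conv_b, seen)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # p now holds the final look only
    return peeking_hits / n_tests, fixed_hits / n_tests


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"false positive rate, stop at any significant look: {peeking:.3f}")
    print(f"false positive rate, single look at the end:       {fixed:.3f}")
```

Because the final look is one of the interim looks, the peeking rate can never be lower than the fixed-horizon rate, and in practice it tends to be noticeably higher than the nominal alpha.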

Trust signals

· Hypothesis framework: "Because [observation], we believe [change] will cause [outcome] for [audience]"
· Weak vs strong hypothesis examples with specificity
· Four test types (A/B split, A/B/n, multivariate (MVT), split URL) with trade-offs
· Sample size quick reference table (by baseline rate and lift target)
· Duration formula and references to the Evan Miller and Optimizely calculators
· Metrics selection guidance (primary tied to hypothesis, secondary for interpretation, guardrail for safety)
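
The skill's own sample size table and duration formula are not reproduced here. As a rough sketch of the kind of calculation they typically rest on, the code below uses the standard normal-approximation formula for comparing two proportions, then divides the total required sample by daily traffic. The function names, the defaults (two-sided alpha of 0.05, power of 0.80), and the even traffic split are assumptions; the referenced calculators may use slightly different formulas and give somewhat different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift,
                            alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect a relative lift.

    Uses the unpooled normal approximation for a two-sided test
    of two proportions (an assumed, textbook formula).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


def duration_days(n_per_variant, variants, daily_visitors):
    """Days needed if all daily visitors are split evenly across variants."""
    return math.ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline, 20% relative lift, 1,000 visitors/day.
    n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.20)
    days = duration_days(n, variants=2, daily_visitors=1000)
    print(f"users per variant: {n}")
    print(f"estimated duration: {days} days")
```

A smaller baseline rate or a smaller target lift drives the required sample up quickly, which is why a test sized by intuition often ends up underpowered. In practice the computed duration is usually rounded up to whole weeks so that each day of the week is represented equally.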