Design statistically valid A/B tests

ab-testing · skill · setup · L229
dirnbauer/webconsulting-skills
What it does

Design and validate statistically sound A/B tests

Best for

Product and growth teams validating feature changes or messaging hypotheses before rolling out to 100% of users.

Inputs
  • Hypothesis statement with observation, change, and expected outcome
  • Baseline conversion rate
  • Current traffic volume
  • Minimum detectable effect (MDE) target
Outputs
  • Sample size calculation per variant
  • Test duration estimate
  • Metrics plan (primary, secondary, guardrail)
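
The first two outputs can be approximated from the listed inputs. The sketch below assumes a two-sided two-proportion z-test, a relative MDE, and evenly split traffic; the function names, the default alpha of 0.05, and the default power of 0.80 are illustrative assumptions, not values taken from the skill itself.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # expected rate if the variant hits the MDE
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def duration_days(n_per_variant, variants, daily_visitors):
    """Days needed to fill every variant, assuming traffic is split evenly."""
    return ceil(n_per_variant * variants / daily_visitors)


if __name__ == "__main__":
    # Hypothetical inputs: 5% baseline, 20% relative lift, 1,000 eligible visitors/day.
    n = sample_size_per_variant(baseline=0.05, relative_mde=0.20)
    print(n, "users per variant,", duration_days(n, 2, 1000), "days")
```

With these hypothetical inputs the requirement lands on the order of 8,000 users per variant. Halving the MDE roughly quadruples the sample size, which is why the traffic precondition matters.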
Requires
  • A/B testing platform (Optimizely, VWO, etc.)
  • Analytics integration (GA4)
Preconditions
  • Baseline conversion rate known
  • Traffic sufficient for sample size
  • Hypothesis is specific (not just 'let's see what happens')
Failure modes
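  • Underpowered tests (sample size too small) miss real effects, and the significant results they do produce are more likely to be false positives or inflated estimates
  • Peeking at results and stopping as soon as they look significant inflates the false-positive rate and breaks statistical validity
  • Testing multiple variables at once makes it impossible to attribute the effect to any single change
  • Novelty effects (uplift driven by the change simply being new) fade over time, so early conclusions may not hold up in a longer-running test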
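
The peeking failure mode can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate and every significant result is therefore a false positive. This is a minimal sketch under assumed parameters (10% conversion rate, 10 interim looks, 100 users per arm per look); the function names are hypothetical and the pooled z-test stands in for whatever test a given platform actually uses.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation in the data, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(rate=0.10, looks=10, users_per_look=100,
                      runs=500, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, false-positive rate without)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _run in range(runs):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        p = 1.0
        for _look in range(looks):
            # Both arms draw from the same rate: there is no real effect to find.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = z_test_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look  # would have stopped early
        fixed_hits += p < alpha                  # tested once, at the planned end
    return peeking_hits / runs, fixed_hits / runs


if __name__ == "__main__":
    with_peeking, without_peeking = simulate_aa_tests()
    print(f"peeking: {with_peeking:.1%}  fixed horizon: {without_peeking:.1%}")
```

The fixed-horizon rate should sit near the nominal alpha, while the peeking rate can never be lower and is typically several times higher, because each extra look is another chance for noise to cross the threshold.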
Trust signals
  • Hypothesis framework: "Because [observation], we believe [change] will cause [outcome] for [audience]"
  • Weak vs. strong hypothesis examples that show the required specificity
  • Four test types (A/B split, A/B/n, MVT, split URL) with trade-offs
  • Sample size quick reference table (by baseline rate and lift target)
  • Duration formula and references to the Evan Miller and Optimizely calculators
  • Metrics selection guidance (primary tied to hypothesis, secondary for interpretation, guardrail for safety)
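
The hypothesis framework in the first trust signal can be captured as a small template that refuses incomplete hypotheses. This is an illustrative sketch only: the `Hypothesis` class and its field names are assumptions derived from the bracketed placeholders, not part of the skill, and the example values are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Hypothesis:
    observation: str
    change: str
    outcome: str
    audience: str

    def __post_init__(self):
        # Reject vague hypotheses that leave any part of the framework blank.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"hypothesis is missing its {f.name}")

    def statement(self):
        return (f"Because {self.observation}, we believe {self.change} "
                f"will cause {self.outcome} for {self.audience}.")


if __name__ == "__main__":
    # Hypothetical example values.
    h = Hypothesis(
        observation="many visitors abandon the pricing page",
        change="showing annual pricing first",
        outcome="a higher trial sign-up rate",
        audience="new visitors on desktop",
    )
    print(h.statement())
```

Forcing all four fields to be filled in is one way to enforce the precondition that the hypothesis is specific before any sample size is calculated.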