cyberneticlibrary

Design statistically valid A/B tests

ab-testing · skill · setup · L128,686
coreyhaines31/marketingskills

Causal-lift measurements

ab-experimentation · 0pp vs no-skill baseline · with-skill 100% · baseline 100%

Measured by running the task with and without this artifact, K=5, graded by deterministic checks — no LLM judging.

What it does

You are an expert in experimentation and A/B testing.

Best for

Validating product hypotheses where statistical rigor and a pre-set significance threshold matter more than speed.

Inputs
  • Baseline conversion metric and traffic volume
  • Hypothesis with expected lift percentage
  • Single variable to test (headline, design, CTA, copy)
Outputs
  • Pre-calculated sample size for statistical significance
  • Hypothesis statement aligned to test framework
  • Metrics dashboard (primary, secondary, guardrail)
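The pre-calculated sample size can be sketched as follows. This is a minimal illustration using the standard normal-approximation formula for comparing two proportions; it is not necessarily the calculation the skill itself performs. The function name `sample_size_per_variant` and the defaults (two-sided alpha of 0.05, power of 0.80) are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline_rate: current conversion rate, e.g. 0.05 for 5%
    relative_lift: expected relative improvement, e.g. 0.20 for +20%
    (Hypothetical helper; alpha and power defaults are assumptions.)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and differ")

    # Critical values for the significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # Sum of the two Bernoulli variances, divided by the squared effect size.
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # A 5% baseline with a hoped-for +20% relative lift.
    print(sample_size_per_variant(0.05, 0.20))
```

Smaller expected lifts drive the required sample up quickly, because the effect size appears squared in the denominator.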
Preconditions
  • Adequate traffic volume (varies by baseline and lift target)
  • Single-variable isolation
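Whether traffic is adequate can be checked by converting a required sample size into a run time. The sketch below assumes traffic is split evenly across variants; `estimated_test_days` and its parameters are hypothetical names introduced for illustration.

```python
import math


def estimated_test_days(sample_size_per_variant, daily_visitors,
                        num_variants=2, traffic_share=1.0):
    """Estimate how many days a test needs to reach its required sample.

    traffic_share: fraction of daily visitors enrolled in the test (0 < share <= 1).
    (Hypothetical helper; assumes an even split across variants.)
    """
    if daily_visitors <= 0 or not 0 < traffic_share <= 1:
        raise ValueError("daily_visitors must be positive and 0 < traffic_share <= 1")

    total_needed = sample_size_per_variant * num_variants
    daily_in_test = daily_visitors * traffic_share
    return math.ceil(total_needed / daily_in_test)


if __name__ == "__main__":
    # 8,155 users per variant, two variants, 2,000 visitors per day.
    print(estimated_test_days(8155, 2000))
```

If the estimate runs to many months, the traffic precondition is probably not met for that baseline and lift target.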
Failure modes
  • Peeking at results early and stopping on an interim "significant" reading inflates the false-positive rate
  • Multiple simultaneous tests introduce interaction effects
  • Guardrail metric decline can be masked by primary metric wins
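The peeking failure mode can be illustrated with a small A/A simulation, in which both arms share the same true conversion rate, so every "significant" result is a false positive. This is a sketch under assumed settings (10 interim looks, 200 users per arm per look, 10% conversion rate); the function names are hypothetical and the exact rates depend on the random seed.

```python
import math
import random
from statistics import NormalDist


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(num_experiments=400, looks=10, users_per_look=200,
                         rate=0.10, alpha=0.05, seed=0):
    """Return (rate when stopping at any significant look, rate at final look only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(num_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        last_p = 1.0
        for _ in range(looks):
            # Both arms draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            last_p = two_sided_p_value(conv_a, n, conv_b, n)
            if last_p < alpha:
                significant_at_any_look = True
        peeking_hits += int(significant_at_any_look)
        final_hits += int(last_p < alpha)
    return peeking_hits / num_experiments, final_hits / num_experiments


if __name__ == "__main__":
    peeking, final_only = false_positive_rates()
    print(f"stop at first significant look: {peeking:.1%}")
    print(f"evaluate once at the end:       {final_only:.1%}")
```

Evaluating only at the planned sample size should keep the false-positive rate near alpha, while checking at every look and stopping on the first significant reading pushes it well above that.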
Trust signals
  • Named examples
  • Quantified metrics
  • Structured frameworks
  • Named methodologies