Design statistically valid experiments
experiment-designer · skill · setup L2 · ★327
mohitagw15856/pm-claude-skills
What it does
Design rigorous experiments from hypotheses and interpret results for both statistical and practical significance.
Best for
Any decision backed by an A/B test. Forces pre-commitment to success criteria, prevents peeking and goalpost-moving, and separates statistical from practical significance.
Inputs
- · Hypothesis (change, metric, expected lift, reason)
- · Baseline metric value and current sample size
- · Minimum detectable effect (MDE) and acceptable sample size
Outputs
- · Experiment design (sample size, run duration, pre-defined success criteria)
- · Results interpretation (statistical + practical significance, recommendation: ship/iterate/kill/follow-up)
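The sample size and run duration in the design output can be approximated with the standard two-proportion power calculation. The sketch below is illustrative only and is not taken from the skill itself: the function names, the 5% significance level, the 80% power target, and the equal-split two-arm layout are all assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline     -- control conversion rate, e.g. 0.10
    relative_mde -- minimum detectable effect as a relative lift, e.g. 0.10 for +10%
    alpha, power -- assumed defaults; choose them before the test starts
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def run_duration_days(n_per_arm, daily_eligible_users, arms=2):
    """Days needed to fill every arm, assuming steady traffic split evenly."""
    return ceil(n_per_arm * arms / daily_eligible_users)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline=0.10, relative_mde=0.10)
    print("users per arm:", n)
    print("days at 2,000 eligible users/day:", run_duration_days(n, 2000))
```

Halving the MDE roughly quadruples the required sample, which is why the MDE and the acceptable sample size need to be agreed together. In practice the duration is usually rounded up to whole weeks to cover weekly seasonality.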
Requires
- · Optional: A/B testing tool (Amplitude, LaunchDarkly, VWO, Optimizely)
Preconditions
- · Hypothesis stated as 'if we [change], we expect [metric] to [move by X%]'
- · Baseline metric and sample size available
- · Control and variant clearly defined
Failure modes
- · Test stopped early (peeking problem: repeatedly checking results and stopping at the first significant one inflates the false-positive rate)
- · Success criteria moved after the test runs (HARKing: hypothesizing after the results are known)
- · Practical and statistical significance conflated (e.g., a 2% lift can be statistically significant yet too small to justify shipping)
- · Sample ratio mismatch (observed control/variant split deviates from the planned allocation, signalling broken assignment)
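Of the failure modes above, sample ratio mismatch is the easiest to check mechanically: a chi-square goodness-of-fit test compares the observed arm sizes with the planned split. The following is a minimal sketch, not the skill's own check. The function names are hypothetical, and the 0.001 alarm threshold is a common convention assumed here rather than something the source specifies.

```python
from math import erfc, sqrt


def srm_p_value(control_n, variant_n, expected_control_share=0.5):
    """P-value of a chi-square goodness-of-fit test on the two arm sizes."""
    total = control_n + variant_n
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (variant_n - expected_variant) ** 2 / expected_variant
    )
    # With one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def has_sample_ratio_mismatch(control_n, variant_n, threshold=0.001):
    """True when the split is unlikely under the planned allocation.

    The threshold is an assumed convention; a stricter or looser value
    may suit a given team.
    """
    return srm_p_value(control_n, variant_n) < threshold


if __name__ == "__main__":
    print(has_sample_ratio_mismatch(5000, 5050))  # small imbalance
    print(has_sample_ratio_mismatch(5000, 5400))  # large imbalance
```

When the check fires, the usual response is to investigate assignment and logging before reading any metric, since the comparison between arms can no longer be trusted.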
Trust signals
- · Pre-defined success threshold before test runs (no moving goalposts)
- · Design risk section flags: novelty effects, seasonal confounds, multiple testing, network effects, sample ratio mismatch
- · Interpretation separates statistical significance (p < 0.05) from practical significance (is the lift worth shipping)
- · Peeking check explicit: 'confirm test was not stopped early'
- · Recommendation logic: Ship / Iterate / Kill / Follow-up with rationale
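One way to keep statistical and practical significance apart is to compute the p-value and the confidence interval for the lift separately, then compare the interval with a practical threshold fixed before the test. The sketch below assumes a conversion-rate metric and a two-proportion z-test. The `interpret` function, the `Readout` type, and especially the mapping onto Ship / Iterate / Kill / Follow-up are hypothetical illustrations, not the skill's actual decision rules.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Readout:
    lift: float      # absolute difference, variant minus control
    ci_low: float
    ci_high: float
    p_value: float
    recommendation: str


def interpret(control_conv, control_n, variant_conv, variant_n,
              min_practical_lift, alpha=0.05):
    """Two-proportion z-test plus a pre-registered practical threshold.

    min_practical_lift is the smallest absolute lift worth shipping and
    must be chosen before the test runs. Assumes non-degenerate counts
    (pooled rate strictly between 0 and 1).
    """
    p_c = control_conv / control_n
    p_v = variant_conv / variant_n
    lift = p_v - p_c

    # Statistical significance: pooled standard error under the null.
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    se_null = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    p_value = 2 * (1 - NormalDist().cdf(abs(lift / se_null)))

    # Practical significance: unpooled confidence interval for the lift.
    se = sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se
    ci_low, ci_high = lift - margin, lift + margin

    # Hypothetical decision mapping, for illustration only.
    if ci_low >= min_practical_lift:
        rec = "Ship"        # whole interval clears the practical bar
    elif p_value < alpha and lift > 0:
        rec = "Iterate"     # real effect, not confidently large enough
    elif ci_high < min_practical_lift:
        rec = "Kill"        # a worthwhile lift is ruled out
    else:
        rec = "Follow-up"   # inconclusive; needs more data
    return Readout(lift, ci_low, ci_high, p_value, rec)


if __name__ == "__main__":
    print(interpret(1000, 10000, 1300, 10000, min_practical_lift=0.01))
```

This readout is only meaningful if the test ran to its planned sample size; applying it at every interim look reintroduces the peeking problem.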