
Design SLOs and incident response

reliability-engineering-cloudskillsetup L164
Tibsfox/gsd-skill-creator
What it does

Design SLO-driven observability, SRE runbooks, and blameless postmortems

Best for

Building infrastructure that humans can reliably operate and improve with data.

Inputs
  • Service metrics, SLO targets, incident logs, root-cause analyses
Outputs
  • Observability dashboards, runbooks, postmortem templates, SLI→SLO mappings
Preconditions
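  • Can distinguish an SLO (internal reliability target) from an SLA (contractual commitment)
  • Know the Four Golden Signals (latency, traffic, errors, saturation)
  • Understand error budgets (the unreliability an SLO permits, i.e. 1 minus the target)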
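
As a quick check on the error-budget precondition, the arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the function name `error_budget` and its parameters are hypothetical, and it assumes a simple request-based SLI (good events over total events).

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict:
    """Summarize error-budget usage for one SLO window.

    Assumes a request-based SLI; all names here are illustrative.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events < 0 or not 0 <= bad_events <= total_events:
        raise ValueError("event counts are inconsistent")

    # The budget is the share of events the SLO allows to fail.
    allowed_bad = (1.0 - slo_target) * total_events
    consumed = bad_events / allowed_bad if allowed_bad > 0 else 0.0
    return {
        "allowed_bad_events": allowed_bad,
        "consumed_fraction": consumed,
        "remaining_bad_events": allowed_bad - bad_events,
    }


if __name__ == "__main__":
    # A 99.9% target over 1,000,000 requests, 250 of which failed.
    print(error_budget(0.999, 1_000_000, 250))
```

With a 99.9% target over 1,000,000 requests, roughly 1,000 failures are allowed, so 250 failures consume about a quarter of the budget.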
Failure modes
  • SLO too loose (doesn't drive action)
  • Blameful postmortems (culture rot)
  • Alert fatigue from bad thresholds
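
One common way to reduce alert fatigue is to page on error-budget burn rate rather than on raw error counts, and to require both a long and a short window to agree. The sketch below is an assumption about how that could look, not the skill's own implementation: `burn_rate` and `should_page` are hypothetical names, and the default threshold of 14.4 assumes a 30-day SLO window (it corresponds to spending 2% of the budget in one hour).

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / (1.0 - slo_target)


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window (e.g. 1 hour) shows the burn is significant;
    the short window (e.g. 5 minutes) shows it is still happening.
    Window lengths and the threshold are assumptions for this sketch.
    """
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    # Sustained 2% errors against a 99.9% SLO: both windows burn hot.
    print(should_page(0.02, 0.03, 0.999))
    # The incident has already recovered in the short window.
    print(should_page(0.02, 0.001, 0.999))
```

Requiring the short window to agree keeps an already-resolved spike from paging anyone, which is the kind of threshold tuning this failure mode is about.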
Trust signals
  • Tibsfox, stable 2026-04-12
  • Covers SLO crafting, blameless postmortem format, error-budget exhaustion policy
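
An error-budget exhaustion policy can be expressed as a small, reviewable mapping from budget consumption to an agreed response. The following is only a sketch under stated assumptions: the function `budget_policy`, the 75% cutoff, and the three action labels are hypothetical and would be set by each team, not taken from the skill.

```python
def budget_policy(consumed_fraction: float) -> str:
    """Map error-budget consumption to an agreed response.

    Thresholds and action labels are illustrative assumptions.
    """
    if consumed_fraction < 0.0:
        raise ValueError("consumed_fraction cannot be negative")
    if consumed_fraction >= 1.0:
        # Budget exhausted: stop risky changes, prioritize reliability work.
        return "freeze"
    if consumed_fraction >= 0.75:
        # Budget nearly spent: slow down and require extra review.
        return "restrict"
    return "normal"


if __name__ == "__main__":
    for consumed in (0.10, 0.80, 1.20):
        print(f"{consumed:.0%} consumed -> {budget_policy(consumed)}")
```

Writing the policy down as code or configuration makes the response to an exhausted budget a pre-agreed decision rather than a negotiation during an incident.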