Design SLOs and incident response
reliability-engineering-cloud · skillsetup · L1 · ★64
Tibsfox/gsd-skill-creator
What it does
Design SLO-driven observability, SRE runbooks, and blameless postmortems
Best for
Building infrastructure that humans can reliably operate and improve with data.
Inputs
- Service metrics, SLO targets, incident logs, root-cause analyses
Outputs
- Observability dashboards, runbooks, postmortem templates, SLI→SLO mappings
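An SLI→SLO mapping of the kind listed in the outputs can be sketched in Python as below. This is a minimal illustration, not the skill's own implementation. It assumes ratio-style SLIs (good events divided by total events). The `SLO` class, the `sli` and `evaluate` functions, and the `checkout-availability` example are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLI->SLO mapping: one ratio SLI checked against one target."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of events must be good


def sli(good_events: int, total_events: int) -> float:
    """Ratio-style SLI: fraction of good events (assumed 1.0 when there is no traffic)."""
    if total_events == 0:
        return 1.0
    return good_events / total_events


def evaluate(slo: SLO, good_events: int, total_events: int) -> dict:
    """Compare the measured SLI with the SLO target for one window."""
    measured = sli(good_events, total_events)
    return {
        "slo": slo.name,
        "sli": measured,
        "target": slo.target,
        "met": measured >= slo.target,
    }


if __name__ == "__main__":
    availability = SLO("checkout-availability", 0.999)
    print(evaluate(availability, good_events=999_500, total_events=1_000_000))
```

Keeping the target in the SLO object and computing the SLI separately lets one SLI feed several SLOs, for example a strict internal objective and a looser one.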
Preconditions
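- Distinguish an SLO (internal reliability target) from an SLA (contractual commitment to customers)
- Know the Four Golden Signals (latency, traffic, errors, saturation)
- Understand error budgets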
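Error-budget arithmetic can be shown as a minimal Python sketch. The budget is the complement of the SLO target. A 99.9% target over 1,000,000 requests tolerates about 1,000 bad requests. Over a 30-day window, the same target allows about 43.2 minutes of full downtime (43,200 minutes × 0.001). The function names, the 30-day default window, and the freeze-when-spent rule in `releases_frozen` are illustrative assumptions, not a prescribed policy.

```python
def error_budget(target: float, total_events: int) -> float:
    """Bad events the SLO tolerates over its window: (1 - target) * total."""
    return (1.0 - target) * total_events


def downtime_budget_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full outage the SLO tolerates over a time window."""
    return (1.0 - target) * window_days * 24 * 60


def budget_remaining(target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left: 1.0 untouched, 0.0 spent, < 0 overspent."""
    allowed = error_budget(target, total_events)
    bad = total_events - good_events
    if allowed == 0:
        # No traffic or a 100% target: any bad event exhausts the budget.
        return 1.0 if bad == 0 else 0.0
    return 1.0 - bad / allowed


def releases_frozen(remaining: float) -> bool:
    """Hypothetical exhaustion policy: pause feature releases once the budget is spent."""
    return remaining <= 0.0


if __name__ == "__main__":
    remaining = budget_remaining(0.999, good_events=999_600, total_events=1_000_000)
    print(f"{remaining:.0%} of budget left; frozen={releases_frozen(remaining)}")
```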
Failure modes
- SLO set too loose to drive any action
- Blameful postmortems, which corrode team culture
- Alert fatigue from poorly tuned thresholds
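One common remedy for alert fatigue is to page on error-budget burn rate instead of raw error thresholds. The sketch below follows the multiwindow, multi-burn-rate pattern described in Google's SRE Workbook. The 14.4 threshold and the 1-hour/5-minute window pair are that book's example values, assumed here rather than taken from this skill. A burn rate of 14.4 sustained for one hour spends about 2% of a 30-day budget (14.4 / 720 hours). The names `burn_rate` and `should_page` are hypothetical.

```python
def burn_rate(bad_events: int, total_events: int, target: float) -> float:
    """Budget consumption speed: 1.0 spends the budget exactly over the SLO window."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1 (exclusive)")
    if total_events == 0:
        return 0.0
    error_rate = bad_events / total_events
    return error_rate / (1.0 - target)


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                target: float, threshold: float = 14.4) -> bool:
    """Page only if BOTH windows, given as (bad, total), burn at or above the threshold."""
    long_bad, long_total = long_window
    short_bad, short_total = short_window
    return (burn_rate(long_bad, long_total, target) >= threshold
            and burn_rate(short_bad, short_total, target) >= threshold)


if __name__ == "__main__":
    # Hypothetical counts: 1-hour window, then 5-minute window, for a 99.9% SLO.
    print(should_page((2_000, 100_000), (200, 10_000), target=0.999))
```

Requiring both windows has two effects. A brief spike does not page, because the long window stays low. The alert also clears within minutes of recovery, because the short window drops quickly.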
Trust signals
- Tibsfox, stable 2026-04-12
- Covers SLO crafting, blameless postmortem format, error-budget exhaustion policy