Define service reliability and reduce toil
sre-patterns skill setup L2 ★64
Tibsfox/gsd-skill-creator
What it does
Implement SLOs, error budgets, and toil reduction
Best for
When service reliability is managed ad hoc and alert fatigue hides the service's true availability.
Inputs
- · service metrics
- · SLO targets
Outputs
- · SLO definitions
- · alert rules
- · toil audit
Requires
- · Prometheus
- · PagerDuty
Preconditions
- · Observability data
- · team alignment on SLOs
Failure modes
- · Unrealistic SLOs
- · alert fatigue
- · toil underestimated
Trust signals
- · SLI/SLO/SLA distinction
- · error budget burn rate
- · toil taxonomy
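Example
The error budget burn rate listed above can be sketched in a few lines. This is a minimal illustration, not the skill's own implementation: the `SLO` class, the `burn_rate` and `days_to_exhaustion` helpers, and the sample numbers are all hypothetical, and it assumes a request-based availability SLI (failed requests over total requests).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability objective over a rolling window."""
    target: float      # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int   # length of the SLO window, e.g. 30

    @property
    def error_budget(self) -> float:
        """Fraction of requests allowed to fail within the window."""
        return 1.0 - self.target


def burn_rate(slo: SLO, failed: int, total: int) -> float:
    """Observed error ratio divided by the error budget.

    A value of 1.0 spends the budget exactly over the full window;
    higher values exhaust it proportionally sooner.
    """
    if total == 0:
        return 0.0  # no traffic observed, so nothing was burned
    return (failed / total) / slo.error_budget


def days_to_exhaustion(slo: SLO, rate: float) -> float:
    """Days until the whole budget is gone if the burn rate holds."""
    if rate <= 0:
        return float("inf")
    return slo.window_days / rate


if __name__ == "__main__":
    slo = SLO(target=0.999, window_days=30)
    # Assumed sample: 144 failures out of 10,000 requests in the lookback.
    rate = burn_rate(slo, failed=144, total=10_000)
    print(f"burn rate: {rate:.1f}")
    print(f"budget exhausted in about {days_to_exhaustion(slo, rate):.1f} days")
```

With these sample numbers the burn rate is roughly 14.4, so a 30-day budget would last about two days. Alerting on burn rate rather than on raw error counts is one way to tie pages to the SLO itself, which may help with the alert fatigue failure mode.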