cyberneticlibrary

Define service reliability and reduce toil

sre-patterns · skill · setup · L264
Tibsfox/gsd-skill-creator
What it does

Implement SLOs, error budgets, and toil reduction

Best for

Teams whose reliability work is ad hoc and whose alert fatigue hides the service's true availability.

Inputs
  • service metrics
  • SLO targets
Outputs
  • SLO definitions
  • alert rules
  • toil audit
Requires
  • Prometheus
  • PagerDuty
Preconditions
  • observability data
  • team alignment on SLOs
Failure modes
  • unrealistic SLOs
  • alert fatigue
  • toil underestimated
Trust signals
  • SLI/SLO/SLA distinction
  • error budget burn rate
  • toil taxonomy
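
Example: error budget burn rate

The card names error budget burn rate without showing the arithmetic. The sketch below uses the standard definitions: the error budget is 1 minus the SLO target, and the burn rate is the observed error rate divided by that budget. The names SLO, burn_rate, and days_to_exhaustion are hypothetical and are not taken from the skill itself. The 30-day window is an assumed default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """An availability objective, e.g. target=0.999 over a rolling window.

    Hypothetical type for illustration; the window length is an assumption.
    """
    target: float
    window_days: int = 30

    @property
    def error_budget(self) -> float:
        # Fraction of events allowed to fail within the window.
        return 1.0 - self.target


def burn_rate(slo: SLO, bad_events: int, total_events: int) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is spent exactly at the end of the
    window; higher values mean it runs out sooner.
    """
    if total_events == 0:
        return 0.0  # no traffic, so no budget consumed
    error_rate = bad_events / total_events
    return error_rate / slo.error_budget


def days_to_exhaustion(slo: SLO, rate: float) -> float:
    """Days until a full budget is spent if the burn rate stays constant."""
    if rate <= 0:
        return float("inf")
    return slo.window_days / rate


if __name__ == "__main__":
    slo = SLO(target=0.999)
    rate = burn_rate(slo, bad_events=10, total_events=1000)
    print(f"burn rate: {rate:.1f}")
    print(f"days to exhaustion: {days_to_exhaustion(slo, rate):.1f}")
```

With a 99.9% target, 10 failures in 1,000 requests is a 1% error rate against a 0.1% budget, so the burn rate is about 10 and a full 30-day budget would last roughly 3 days at that pace. An alert rule can page on this ratio instead of on raw error counts, which is one way to cut down on the alert fatigue listed under failure modes.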