cyberneticlibrary

Define SLOs and incident response

sresubagentsetup L30
artemislab/kerios
What it does

Defines SLOs and designs monitoring, alerting, and incident response runbooks.

Best for

Organizations that need reliability engineering across multiple services, backed by error budgets and a blameless incident response culture.

Inputs
  • service architecture
  • incident history
  • business impact
  • golden signals targets
Outputs
  • SLO definitions
  • monitoring configs
  • alerting rules
  • operational runbooks
  • postmortem templates
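
To make the "SLO definitions" output concrete, here is a minimal Python sketch of what one such definition might carry. The `SLO` class, its field names, and the `checkout-api` service are hypothetical and are not taken from this subagent; only the arithmetic (error budget = 1 − target) is the standard definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO record; field names are illustrative only."""
    service: str
    sli: str          # which indicator is measured, e.g. "availability"
    target: float     # required fraction of good events, e.g. 0.999
    window_days: int  # rolling compliance window

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no error budget to track.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    def error_budget(self) -> float:
        """Fraction of events allowed to fail within the window."""
        return 1.0 - self.target

    def budget_remaining(self, good: int, total: int) -> float:
        """Share of the error budget still unspent (negative if overspent)."""
        if total == 0:
            return 1.0  # no traffic, so nothing has been spent
        allowed_bad = self.error_budget() * total
        actual_bad = total - good
        return 1.0 - actual_bad / allowed_bad


# Assumed example values, not from the source.
slo = SLO(service="checkout-api", sli="availability", target=0.999, window_days=30)
remaining = slo.budget_remaining(good=999_500, total=1_000_000)
```

With a 99.9% target and 500 failures in 1,000,000 requests, roughly half of the budget is left. Tracking this number over the window is what guards against the "no error budget tracking" failure mode.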
Requires
  • Prometheus/Grafana
  • alerting system
  • incident management platform
Preconditions

The service exposes measurable metrics, the team has an on-call rotation, and a postmortem process is defined.

Failure modes
  • SLOs without measurement
  • alert fatigue (noisy alerts)
  • missing runbooks
  • no error budget tracking
Trust signals
  • uses golden signals framework
  • includes error budget burn rate alerts
  • defines rollback/graceful degradation strategies
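
As a rough illustration of an error budget burn rate alert, the Python sketch below computes the burn rate and applies a two-window check. The function names are hypothetical. The 14.4 threshold is an assumption borrowed from common multiwindow practice (it corresponds to spending 2% of a 30-day budget in one hour) and is not stated by this subagent.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than budget-neutral the budget is being spent.

    A burn rate of 1.0 uses up exactly the whole budget over the SLO window.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_errors: float,
                short_window_errors: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both the long and the short window exceed the threshold.

    The short window confirms the problem is still happening, so the alert
    stops firing soon after recovery.
    """
    return (burn_rate(long_window_errors, slo_target) > threshold
            and burn_rate(short_window_errors, slo_target) > threshold)


# Assumed error ratios for a 99.9% SLO (e.g. 1h and 5m windows).
ongoing = should_page(0.02, 0.02, slo_target=0.999)     # still burning fast
recovered = should_page(0.02, 0.001, slo_target=0.999)  # short window is healthy
```

Requiring both windows to agree is one way to address the "alert fatigue" failure mode: a brief spike that has already subsided does not page anyone.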