cyberneticlibrary

Define SLOs and incident response

sre · subagent · setup · L3 · ★0

artemislab/kerios ↗

What it does

Defines SLOs and designs monitoring, alerting rules, and incident response runbooks.

Best for

Organizations that need reliability engineering across multiple services, backed by error budgets and a blameless incident response culture.

Inputs

· service architecture
· incident history
· business impact
· golden signals targets

Outputs

· SLO definitions
· monitoring configs
· alerting rules
· operational runbooks
· postmortem templates
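
As an illustration of what an SLO definition might contain, here is a minimal Python sketch. The `SLO` class, its field names, and the `checkout-availability` example are hypothetical and are not taken from the artemislab/kerios repository; the sketch only shows how a target and a window translate into an error budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical SLO record: a target ratio over a rolling window."""
    name: str
    target: float      # e.g. 0.999 means 99.9% of the window must be "good"
    window_days: int   # length of the rolling window in days

    def error_budget_fraction(self) -> float:
        # The error budget is whatever the target leaves over.
        return 1.0 - self.target

    def allowed_downtime_minutes(self) -> float:
        # Time-based view of the budget: budget fraction times window length.
        window_minutes = self.window_days * 24 * 60
        return self.error_budget_fraction() * window_minutes


slo = SLO(name="checkout-availability", target=0.999, window_days=30)
print(round(slo.allowed_downtime_minutes(), 1))  # 43.2
```

A 99.9% target over 30 days leaves roughly 43 minutes of allowed unavailability, which is the number the alerting rules and runbooks would then be built around.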

Requires

· prometheus/grafana
· alerting system
· incident management platform

Preconditions

The service exposes measurable metrics, the team has an on-call rotation, and a postmortem process is defined.

Failure modes

· SLOs without measurement
· alert fatigue (noisy alerts)
· missing runbooks
· no error budget tracking
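
To make the last failure mode concrete, the sketch below shows one simple way to track error budget from request counts. The function name and the example numbers are assumptions chosen for illustration; in practice the counts would come from the monitoring system.

```python
def error_budget_remaining(total: int, failed: int, target: float) -> float:
    """Return the fraction of error budget left for a request-based SLO.

    1.0 means the budget is untouched; 0.0 or below means it is exhausted.
    """
    if total <= 0:
        # No traffic observed, so no budget has been spent.
        return 1.0
    allowed_failures = (1.0 - target) * total
    if allowed_failures == 0:
        # A 100% target leaves no budget at all.
        return 0.0 if failed else 1.0
    return 1.0 - failed / allowed_failures


# Hypothetical month: 1,000,000 requests, 250 failures, 99.9% target.
remaining = error_budget_remaining(total=1_000_000, failed=250, target=0.999)
print(f"{remaining:.0%} of the error budget remains")
```

With these numbers the target allows about 1,000 failed requests, so 250 failures consume a quarter of the budget.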

Trust signals

· uses golden signals framework
· includes error budget burn rate alerts
· defines rollback/graceful degradation strategies
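
A burn rate alert compares how fast errors are arriving against the rate that would exactly use up the budget over the SLO window. The following is a minimal sketch of a two-window check, assuming a 99.9% target and a paging threshold of 14.4 (a value commonly used for a one-hour window paired with a short confirmation window); the function names and sample ratios are hypothetical.

```python
def burn_rate(error_ratio: float, target: float) -> float:
    """Observed error ratio divided by the ratio the SLO allows.

    A burn rate of 1.0 spends the budget exactly over the SLO window.
    """
    budget = 1.0 - target
    if budget <= 0:
        raise ValueError("target must be below 1.0 to leave an error budget")
    return error_ratio / budget


def should_page(long_window_ratio: float, short_window_ratio: float,
                target: float, threshold: float = 14.4) -> bool:
    """Page only if both windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, which helps suppress stale or noisy alerts.
    """
    return (burn_rate(long_window_ratio, target) >= threshold
            and burn_rate(short_window_ratio, target) >= threshold)


# Sustained incident: 2% errors over the long window, 3% over the short one.
print(should_page(0.02, 0.03, target=0.999))   # True
# Already recovered: the short window is back to a normal error ratio.
print(should_page(0.02, 0.001, target=0.999))  # False
```

Requiring both windows to exceed the threshold is one way to address the alert fatigue failure mode, since an incident that has already recovered stops paging quickly.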