Define SLOs and incident response
sresubagentsetup L3
artemislab/kerios
What it does
Defines SLOs, designs monitoring and alerting, and writes incident response runbooks.
Best for
Organizations that need reliability engineering across multiple services, backed by error budgets and a blameless incident response culture.
Inputs
- service architecture
- incident history
- business impact
- golden-signal targets (latency, traffic, errors, saturation)
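One way the golden-signal targets input could be captured and checked against observed values is sketched below. The `GoldenSignalTargets` class, its field names, and the threshold values are hypothetical illustrations. They are not a format defined by this skill. The traffic check treats a drop below an expected floor as a violation, on the assumption that a sudden loss of traffic often signals an outage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenSignalTargets:
    """Hypothetical per-service targets for the four golden signals."""
    p99_latency_ms: float   # latency: ceiling for 99th-percentile response time
    min_rps: float          # traffic: floor for expected requests per second
    max_error_ratio: float  # errors: largest acceptable fraction of failed requests
    max_saturation: float   # saturation: max utilization (0-1) of the tightest resource


def violations(targets: GoldenSignalTargets, observed: dict[str, float]) -> list[str]:
    """Return the names of golden signals whose observed value misses its target."""
    failed = []
    if observed["p99_latency_ms"] > targets.p99_latency_ms:
        failed.append("latency")
    if observed["rps"] < targets.min_rps:  # assumed: a traffic drop is a warning sign
        failed.append("traffic")
    if observed["error_ratio"] > targets.max_error_ratio:
        failed.append("errors")
    if observed["saturation"] > targets.max_saturation:
        failed.append("saturation")
    return failed


checkout = GoldenSignalTargets(p99_latency_ms=300, min_rps=50,
                               max_error_ratio=0.001, max_saturation=0.8)
print(violations(checkout, {"p99_latency_ms": 420, "rps": 120,
                            "error_ratio": 0.0004, "saturation": 0.85}))
# → ['latency', 'saturation']
```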
Outputs
- SLO definitions
- monitoring configs
- alerting rules
- operational runbooks
- postmortem templates
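A request-based SLO definition and its error budget can be sketched as follows. The `SLO` class and the `budget_remaining` helper are hypothetical. They are not part of Prometheus, Grafana, or this skill's actual output format, and the 99.9% target and 30-day window are illustrative choices. A 99.9% target leaves a 0.1% error budget, which over 30 days is equivalent to 43.2 minutes of full outage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical request-based SLO: `target` fraction of requests must be good."""
    name: str
    target: float          # e.g. 0.999 means "99.9% of requests succeed"
    window_days: int = 30  # rolling compliance window

    def error_budget(self) -> float:
        """Fraction of requests allowed to fail within the window."""
        return 1.0 - self.target

    def allowed_downtime_minutes(self) -> float:
        """Full-outage time the budget would permit over the whole window."""
        return self.error_budget() * self.window_days * 24 * 60


def budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Fraction of the error budget left, given good/total request counts so far.

    1.0 means untouched, 0.0 means exhausted, and a negative value means
    the SLO is already breached for this window.
    """
    if total == 0:
        return 1.0
    bad_ratio = 1.0 - good / total
    return 1.0 - bad_ratio / slo.error_budget()


slo = SLO("checkout-availability", target=0.999)
print(round(slo.allowed_downtime_minutes(), 1))                # → 43.2
print(round(budget_remaining(slo, 999_500, 1_000_000), 3))     # → 0.5
```

Tracking the remaining budget rather than raw availability makes the release decision explicit. While budget remains, the team can ship changes. When it reaches zero, the error-budget policy applies.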
Requires
- Prometheus/Grafana
- alerting system
- incident management platform
Preconditions
The service exposes measurable metrics, the team has an on-call rotation, and a postmortem process is defined.
Failure modes
- SLOs defined without measurement
- alert fatigue from noisy alerts
- missing runbooks
- no error budget tracking
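A common guard against alert fatigue is to fire only when a condition stays breached across several consecutive evaluations. Prometheus alerting rules express the same idea with their `for` clause. The sketch below shows the debouncing logic in isolation. The function name and the per-minute evaluation interval are hypothetical.

```python
from collections.abc import Iterable


def pending_then_firing(samples: Iterable[bool], required: int) -> list[bool]:
    """Debounce an alert condition: fire only after `required` consecutive True samples.

    A single noisy evaluation never fires, and the streak resets as soon as
    the condition clears.
    """
    if required < 1:
        raise ValueError("required must be >= 1")
    streak = 0
    firing = []
    for breached in samples:
        streak = streak + 1 if breached else 0
        firing.append(streak >= required)
    return firing


# Assumed one evaluation per minute: a one-minute blip does not fire, a sustained breach does.
print(pending_then_firing([True, False, True, True, True, False], required=3))
# → [False, False, False, False, True, False]
```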
Trust signals
- uses the golden signals framework
- includes error budget burn-rate alerts
- defines rollback and graceful-degradation strategies
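The error budget burn-rate alerts listed above can be sketched as a multiwindow check. Burn rate is the observed error ratio divided by the error budget (1 - target). A burn rate of 1 spends the budget exactly over the SLO window. The threshold of 14.4, paired with a 1-hour long window and a 5-minute short window, follows the paging example in Google's SRE Workbook for a 30-day SLO. At that rate, 2% of the budget is consumed in one hour (0.02 × 720 h / 1 h = 14.4). The function names are hypothetical. In practice the error ratios would come from monitoring queries rather than literals.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being spent relative to the sustainable rate."""
    return error_ratio / (1.0 - slo_target)


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Multiwindow burn-rate check: both windows must exceed the threshold.

    The long window (e.g. 1h) shows the burn is significant; the short window
    (e.g. 5m) shows it is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_window_error_ratio, slo_target) > threshold
            and burn_rate(short_window_error_ratio, slo_target) > threshold)


# 2% errors over the last hour and 3% over the last 5 minutes (burn rates ~20 and ~30):
print(should_page(0.02, 0.03))   # → True
# Same hour, but the last 5 minutes have recovered to 0.1% errors (burn rate ~1):
print(should_page(0.02, 0.001))  # → False
```

Requiring both windows trades a few minutes of detection delay for far fewer pages that fire after an incident has already ended.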