Plan resilience testing and game days

chaos-engineering · skill · setup · L264
Tibsfox/gsd-skill-creator
What it does

Injects failures systematically to test a service's resilience.

Best for

Validating infrastructure resilience before production relies on a service.

Inputs
  • Steady-state metrics
  • Hypothesis
  • Failure scenarios
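
The three inputs above can be held in one small structure. The following is a minimal sketch, not the skill's own format: the class names, metric names, thresholds, and the "pod-delete" scenario label are all assumptions chosen for illustration. It shows a hypothesis expressed as healthy ranges for steady-state metrics, plus a check that reports which metrics fall outside those ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteadyStateMetric:
    """One steady-state metric and the range considered healthy (hypothetical)."""
    name: str
    lower: float
    upper: float

    def holds(self, observed: float) -> bool:
        return self.lower <= observed <= self.upper


@dataclass(frozen=True)
class Experiment:
    """Bundles the three inputs: hypothesis, steady-state metrics, scenarios."""
    hypothesis: str
    metrics: tuple[SteadyStateMetric, ...]
    failure_scenarios: tuple[str, ...]

    def violations(self, observed: dict[str, float]) -> list[str]:
        """Return names of metrics that are missing or outside their range."""
        bad = []
        for metric in self.metrics:
            value = observed.get(metric.name)
            if value is None or not metric.holds(value):
                bad.append(metric.name)
        return bad


# Illustrative values only; real thresholds come from measured baselines.
experiment = Experiment(
    hypothesis="Checkout stays available when one pod is deleted",
    metrics=(
        SteadyStateMetric("success_rate", 0.99, 1.0),
        SteadyStateMetric("p99_latency_ms", 0.0, 500.0),
    ),
    failure_scenarios=("pod-delete",),
)

print(experiment.violations({"success_rate": 0.995, "p99_latency_ms": 820.0}))
# prints ['p99_latency_ms']
```

A missing metric counts as a violation here, on the assumption that an experiment should not be judged healthy when its evidence is absent.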
Outputs
  • Experiment report
  • Chaos metrics
  • Findings
Requires
  • Prometheus
  • LitmusChaos
  • Kubernetes
Preconditions
  • Production access
  • Baseline metrics
  • Kill switches
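
The preconditions above lend themselves to a preflight gate that refuses to start an experiment until all three are met. This is a rough sketch under stated assumptions: the `Preflight` fields, the `blocking_reasons` helper, and the default of 100 baseline samples are hypothetical and not taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preflight:
    """Snapshot of the three preconditions (field names are hypothetical)."""
    has_production_access: bool
    baseline_sample_count: int
    kill_switch_tested: bool


def blocking_reasons(p: Preflight, min_baseline_samples: int = 100) -> list[str]:
    """Return every unmet precondition; an empty list means safe to proceed."""
    reasons = []
    if not p.has_production_access:
        reasons.append("no production access")
    if p.baseline_sample_count < min_baseline_samples:
        reasons.append("baseline metrics too sparse")
    if not p.kill_switch_tested:
        reasons.append("kill switch not tested")
    return reasons


print(blocking_reasons(Preflight(True, 40, False)))
# prints ['baseline metrics too sparse', 'kill switch not tested']
```

Returning all unmet preconditions at once, rather than stopping at the first, lets a game-day planner fix everything in one pass.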
Failure modes
  • Uncontrolled blast radius
  • Cascading failures
  • Incomplete rollback
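
One way to guard against an uncontrolled blast radius is to cap how many replicas a single experiment may target. The sketch below is an assumption-laden illustration, not part of the skill: `max_targets`, its parameters, and the rule of always leaving at least one survivor are hypothetical choices.

```python
import math


def max_targets(replicas: int, max_fraction: float, min_survivors: int = 1) -> int:
    """Upper bound on replicas an experiment may disrupt at once.

    The bound is the smaller of two limits: a fraction of all replicas,
    and whatever leaves at least `min_survivors` replicas untouched.
    """
    if replicas < 0 or min_survivors < 0:
        raise ValueError("replicas and min_survivors must be non-negative")
    if not 0.0 <= max_fraction <= 1.0:
        raise ValueError("max_fraction must be between 0 and 1")
    by_fraction = math.floor(replicas * max_fraction)
    by_survivors = max(replicas - min_survivors, 0)
    return min(by_fraction, by_survivors)


print(max_targets(10, 0.25))  # prints 2
print(max_targets(1, 1.0))    # prints 0
```

With a single replica the cap is zero, so under these assumptions the experiment would be skipped rather than take the whole service down.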
Trust signals
  • CNCF certification (Litmus)
  • Steady-state hypothesis template
  • Abort conditions
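
An abort condition can be sketched as a loop that watches a metric during fault injection and triggers the kill switch after a run of consecutive breaches. Everything here is an assumption for illustration: `run_with_abort`, the threshold, the sample values, and the callback are hypothetical, and in practice the samples would come from a monitoring system such as Prometheus rather than a list.

```python
from typing import Callable, Iterable


def run_with_abort(
    samples: Iterable[float],
    abort_below: float,
    consecutive: int,
    kill_switch: Callable[[], None],
) -> bool:
    """Return True if the experiment ran to completion, False if aborted."""
    breaches = 0
    for value in samples:
        # Count consecutive breaches; any healthy sample resets the count.
        breaches = breaches + 1 if value < abort_below else 0
        if breaches >= consecutive:
            kill_switch()
            return False
    return True


stopped = []
completed = run_with_abort(
    samples=[0.999, 0.97, 0.96, 0.95],
    abort_below=0.98,
    consecutive=2,
    kill_switch=lambda: stopped.append("fault injection halted"),
)
print(completed, stopped)
# prints False ['fault injection halted']
```

Requiring consecutive breaches avoids aborting on a single noisy sample, at the cost of reacting one sampling interval later.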