cyberneticlibrary

Write incident postmortems that teach

blameless-postmortem · skill · setup · L164
Tibsfox/gsd-skill-creator
What it does

Author blameless postmortems that surface learning without retribution

Best for

Converting incidents into organization-wide learning while protecting the psychological safety required for honest incident reporting, especially in distributed systems and on-call rotations.

Inputs
  • · Incident timeline with UTC timestamps and actor names
  • · Impact quantification (users affected, duration, error budget, financial cost)
  • · Proximate cause and contributing factors (from investigation)
  • · Action items with owners and deadlines
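The timeline and impact inputs above are worth validating before drafting begins. The sketch below is a minimal illustration that assumes timezone-aware timestamps and a time-based availability SLO. The names `TimelineEvent`, `normalize_timeline`, and `error_budget_consumed` are hypothetical and are not part of the skill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TimelineEvent:
    at: datetime        # must be timezone-aware
    actor: str          # role or name of whoever acted or observed
    description: str


def normalize_timeline(events: list[TimelineEvent]) -> list[TimelineEvent]:
    """Return a copy of the timeline converted to UTC and sorted chronologically."""
    for event in events:
        if event.at.tzinfo is None:
            # Naive timestamps are ambiguous; refuse rather than guess a zone.
            raise ValueError(f"naive timestamp on event {event.description!r}")
    converted = [
        TimelineEvent(e.at.astimezone(timezone.utc), e.actor, e.description)
        for e in events
    ]
    return sorted(converted, key=lambda e: e.at)


def error_budget_consumed(slo: float, window: timedelta, outage_minutes: float) -> float:
    """Fraction of the window's error budget used by a full outage (hypothetical helper).

    Assumes a time-based SLO: budget = (1 - slo) * window length.
    """
    budget_minutes = (1 - slo) * window.total_seconds() / 60
    return outage_minutes / budget_minutes
```

With a 99.9% SLO over a 30-day window, the budget is 43.2 minutes, so a 20-minute full outage consumes roughly 46% of it. Request-based SLOs would instead count failed requests against the allowed failure fraction.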
Outputs
  • · Postmortem document with nine sections (title, status, summary, impact, timeline, root-cause analysis, contributing factors, latent conditions, action items)
  • · Postmortem tracked through the Draft → In Review → Finalized → Closed lifecycle
  • · Action item tracking with ownership and accountability
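The lifecycle states listed in the Outputs can be enforced as a small state machine. This is a minimal sketch. The nine section names and the four states come from this card. The transition rules are assumptions: a reviewer may return a document to Draft, and Closed requires every action item to be done. The Markdown rendering and all function names are also hypothetical.

```python
from enum import Enum

# The nine sections listed in the Outputs, in document order.
SECTIONS = (
    "Title", "Status", "Summary", "Impact", "Timeline",
    "Root-Cause Analysis", "Contributing Factors", "Latent Conditions", "Action Items",
)


class Status(Enum):
    DRAFT = "Draft"
    IN_REVIEW = "In Review"
    FINALIZED = "Finalized"
    CLOSED = "Closed"


# Assumed rules: reviewers may send a document back to Draft,
# and nothing leaves Closed.
ALLOWED = {
    Status.DRAFT: {Status.IN_REVIEW},
    Status.IN_REVIEW: {Status.DRAFT, Status.FINALIZED},
    Status.FINALIZED: {Status.CLOSED},
    Status.CLOSED: set(),
}


def transition(current: Status, target: Status, open_action_items: int = 0) -> Status:
    """Validate a lifecycle move; Closed additionally requires zero open action items."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    if target is Status.CLOSED and open_action_items > 0:
        raise ValueError(f"{open_action_items} action item(s) still open")
    return target


def skeleton(title: str, status: Status = Status.DRAFT) -> str:
    """Render an empty Markdown document with one heading per section."""
    filled = {"Title": title, "Status": status.value}
    return "\n".join(f"## {name}\n{filled.get(name, 'TODO')}\n" for name in SECTIONS)
```

Tying Closed to open action items links the lifecycle state to ownership and accountability, so a postmortem cannot be closed while its follow-up work is still pending.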
Preconditions
  • · Incident has concluded or stabilized (not ongoing)
  • · Sufficient investigation has been performed to identify contributing factors
  • · Organizational commitment to Just Culture and the blameless principle (otherwise the postmortem becomes theater)
  • · Access to timeline data (logs, metrics, alert history)
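Three of the four preconditions above can be checked mechanically before a draft is opened. The Just Culture commitment is an organizational condition that no code can confirm. This is a minimal sketch, assuming a hypothetical `IncidentRecord` shape and a hypothetical `unmet_preconditions` helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IncidentRecord:
    """Hypothetical incident record; field names are illustrative."""
    stabilized: bool
    contributing_factors: list[str] = field(default_factory=list)
    timeline_sources: list[str] = field(default_factory=list)  # e.g. "logs", "metrics"


def unmet_preconditions(incident: IncidentRecord) -> list[str]:
    """List the checkable preconditions that block drafting; empty means ready.

    Organizational commitment to Just Culture cannot be verified here.
    """
    problems = []
    if not incident.stabilized:
        problems.append("incident is still ongoing")
    if not incident.contributing_factors:
        problems.append("no contributing factors identified yet")
    if not incident.timeline_sources:
        problems.append("no timeline data (logs, metrics, alert history) available")
    return problems
```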
Failure modes
  • · Stopping at 'operator error' and not surfacing system design flaws
  • · Using the postmortem document against its author or participants in personnel actions (destroys reporting culture)
  • · Focusing on blame rather than learning (defeats blameless purpose)
  • · Action items without clear owners → never executed
  • · Assuming a single root cause when the incident was multi-causal
  • · Inflating or deflating impact estimates (reduces credibility)
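Several of these failure modes leave visible traces in a draft, so a tool can prompt a reviewer to look for them. The linter below is a hypothetical sketch. The `ActionItem` and `ContributingFactor` shapes, the active/latent labels, and the blame-phrase list are all assumptions. Keyword matching is only a heuristic: it flags text for human review rather than judging it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Illustrative phrases that often signal the analysis stopped at a person.
BLAME_PHRASES = ("operator error", "human error")


@dataclass
class ActionItem:
    description: str
    owner: str | None = None
    due: date | None = None


@dataclass
class ContributingFactor:
    description: str
    kind: str  # "active" (sharp-end action) or "latent" (system or design condition)


@dataclass
class PostmortemDraft:
    factors: list[ContributingFactor] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)


def lint(draft: PostmortemDraft) -> list[str]:
    """Return warnings for failure modes that can be spotted in the text."""
    warnings = []
    for item in draft.action_items:
        if not item.owner or item.due is None:
            warnings.append(f"action item lacks an owner or deadline: {item.description!r}")
    if len(draft.factors) < 2:
        warnings.append("fewer than two contributing factors; was the incident multi-causal?")
    if draft.factors and all(f.kind == "active" for f in draft.factors):
        warnings.append("no latent conditions recorded; analysis may stop at the sharp end")
    for f in draft.factors:
        if any(phrase in f.description.lower() for phrase in BLAME_PHRASES):
            warnings.append(f"factor blames a person, not a system condition: {f.description!r}")
    return warnings
```

No linter can catch distorted impact figures or retaliatory use of the document. Those failure modes remain cultural and depend on review, not tooling.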
Trust signals
  • · Based on Google's Site Reliability Engineering book (Beyer et al., 2016), which draws on postmortem practice in large-scale production operations
  • · Explicitly designed to work with Just Culture (David Marx's Just Culture algorithm and the GAIN Just Culture roadmap)
  • · Nine-section structure covers both learning and accountability requirements
  • · Distinction between active failures (sharp-end) and latent failures (system design) prevents scapegoating
  • · Mirrors practice in mission-critical environments (published postmortems from Google, AWS, and Stripe follow a broadly similar structure)