skills/postmortem/SKILL.md
Write a blameless incident postmortem — gather timeline, root cause, impact, action items
npx skillsauth add mattdurham/bob bob:postmortemInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are writing a blameless incident postmortem. The goal is to understand what happened, why it happened, and what concrete actions will prevent recurrence — not to assign blame.
Ask the user for the following if not provided. Ask one question at a time, not all at once:
Produce a postmortem in this structure:
# Postmortem: [Title]
**Date:** [incident date]
**Status:** [Draft / In Review / Complete]
**Severity:** [P1 / P2 / P3 / P4]
**Authors:** [names]
---
## Summary
[2–3 sentence description of what happened, impact, and how it was resolved]
## Impact
| Dimension | Detail |
| ------------------- | -------------------------------------------- |
| Duration | [start] → [resolution] ([X hours Y minutes]) |
| Detected | [detection time] — [TTD: X min] |
| Mitigated | [mitigation time] — [TTM: X min] |
| Resolved | [resolution time] — [TTR: X min] |
| Users affected | [count or percentage] |
| Services affected | [list] |
| Error budget impact | [if known] |
## Timeline
| Time (TZ) | Event |
| --------- | ---------------- |
| HH:MM | [event] |
| HH:MM | [action by whom] |
| ... | ... |
## Root Cause
[The underlying technical reason. Not the trigger — the systemic issue that allowed this to happen.]
## Trigger
[The specific event that initiated the incident]
## Detection
[How the incident was detected. Include alert name if applicable.]
## Resolution
[What was done to resolve it. Step by step if relevant.]
## What Went Well
- [thing that worked — monitoring, runbook, communication, etc.]
- ...
## What Went Wrong
- [gap or failure — missing alert, unclear runbook, toil, etc.]
- ...
## Where We Got Lucky
- [things that could have been much worse]
- ...
## Action Items
| Action | Owner | Priority | Due |
| ----------------------------- | ----------- | ---------- | ------ |
| [specific, measurable action] | [name/team] | [P1/P2/P3] | [date] |
| ... | ... | ... | ... |
## Supporting Information
[Links to dashboards, logs, alert history, Slack threads, runbooks]
Root cause vs trigger: The trigger is "the deploy at 14:32 introduced a nil pointer". The root cause is "we have no integration tests for nil config values" or "the deploy pipeline doesn't run smoke tests". Dig deeper than the trigger.
Action items must be:
Severity guide:
TTD / TTM / TTR:
development
Team-based development workflow using experimental agent teams - INIT → WORKTREE → BRAINSTORM → PLAN → EXECUTE → REVIEW → COMPLETE
development
Implements code changes following plans and specifications
data-ai
Self-directed reviewer that claims completed tasks and reviews them incrementally
data-ai
Self-directed planner that claims a plan task (blocked by brainstorm), creates the implementation plan, and stays alive to answer questions from teammates