qa-aman/postmortem

Name: postmortem
Author: qa-aman

skills/by-role/devops/postmortem/SKILL.md

npx skillsauth add qa-aman/claude-skills postmortem

Overview

This skill is based on "The Field Guide to Understanding Human Error" by Sidney Dekker and the Google SRE Book. Dekker's fundamental insight: human error is never the cause of an incident - it is a symptom of a system that put people in a position where mistakes were likely. A blameless postmortem asks not "who made the mistake?" but "what conditions made this mistake likely, and how do we change those conditions?"

Google SRE's postmortem culture: incidents are opportunities to improve systems, not to assign blame. Engineers who feel safe reporting mistakes surface problems before they become incidents.

Workflow

Step 1: Schedule within 48 hours (SEV1) or 1 week (SEV2)

Postmortems lose value quickly: memories fade and context is lost. Set the meeting within 48 hours of resolution for a SEV1, or within 1 week for a SEV2.

Attendees: IC, Operations Lead, on-call engineer(s), and anyone who can contribute to understanding what happened. No executives in the room - their presence changes behavior.
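The scheduling rule above can be turned into a small deadline calculation. The following is a minimal sketch, not part of the skill itself: the function name `postmortem_deadline` and the `POSTMORTEM_WINDOW` table are hypothetical, and only the 48-hour and 1-week windows come from Step 1.

```python
from datetime import datetime, timedelta

# Scheduling windows taken from Step 1; the table itself is illustrative.
POSTMORTEM_WINDOW = {
    "SEV1": timedelta(hours=48),
    "SEV2": timedelta(weeks=1),
}


def postmortem_deadline(severity: str, resolved_at: datetime) -> datetime:
    """Return the latest time the postmortem meeting should be held."""
    try:
        return resolved_at + POSTMORTEM_WINDOW[severity]
    except KeyError:
        raise ValueError(f"no postmortem window defined for {severity!r}") from None


# Hypothetical resolution time.
resolved = datetime(2026, 4, 20, 14, 30)
print(postmortem_deadline("SEV1", resolved))  # 2026-04-22 14:30:00
print(postmortem_deadline("SEV2", resolved))  # 2026-04-27 14:30:00
```

Severities outside SEV1 and SEV2 raise an error rather than silently getting a default, since the skill does not define a window for them.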

Step 2: Build the timeline

Reconstruct what happened in chronological order. Be precise about times.

Timeline:
[HH:MM] - [what happened - system event, human action, or observation]
[HH:MM] - [alert fired: name]
[HH:MM] - [on-call engineer paged]
[HH:MM] - [first action taken]
[HH:MM] - [mitigation applied]
[HH:MM] - [service restored]

Include: system events, alerts, human decisions, and the reasoning people had at the time.
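One way to keep the timeline chronological and consistently formatted is to collect entries as data and render them afterwards. This is a rough sketch under stated assumptions: `TimelineEntry` and `render_timeline` are hypothetical names, and the incident data is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    description: str  # system event, human action, observation, or reasoning


def render_timeline(entries: list[TimelineEntry]) -> str:
    """Sort entries chronologically and format them as '[HH:MM] - text' lines."""
    ordered = sorted(entries, key=lambda e: e.at)
    lines = ["Timeline:"]
    lines += [f"[{e.at:%H:%M}] - {e.description}" for e in ordered]
    return "\n".join(lines)


# Hypothetical incident data, entered out of order on purpose.
entries = [
    TimelineEntry(datetime(2026, 4, 20, 13, 12), "on-call engineer paged"),
    TimelineEntry(datetime(2026, 4, 20, 13, 10), "alert fired: checkout-5xx-rate"),
    TimelineEntry(datetime(2026, 4, 20, 13, 41), "service restored"),
    TimelineEntry(datetime(2026, 4, 20, 13, 25), "mitigation applied: rolled back deploy"),
]
print(render_timeline(entries))
```

Storing full datetimes and formatting to HH:MM only at render time keeps the ordering correct; an incident that spans midnight would need the date shown as well.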

Step 3: Identify contributing factors (not root cause)

Dekker's principle: complex systems rarely have a single root cause. They have contributing factors that aligned to create conditions for failure.

For each contributing factor, ask: "If this had been different, would the incident have been less likely or less severe?"

Common contributing factor categories:

Technical: insufficient alerting, missing circuit breakers, inadequate testing
Process: no deployment checklist, unclear escalation path, missing runbook
Knowledge: alert with no runbook, undocumented dependency, new team member
Capacity: system at limits, team understaffed, alert fatigue

Step 4: Apply the 5 Whys

For the primary contributing factor, drill down:

Why did X happen? - Because Y
Why did Y happen? - Because Z
Why did Z happen? - Because...

Stop when you reach a level where an action item can change the system. "Because the engineer made a mistake" is never the stopping point - that's where the analysis begins.
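The stopping rule can be approximated with a simple check on the last answer in the chain. The sketch below is a crude heuristic, not a substitute for judgment: the `Why` type, the `stops_at_person` function, and the phrase list are all hypothetical and would need tuning by the team.

```python
from dataclasses import dataclass

# Hypothetical phrases that suggest the chain stopped at a person rather
# than at the system; a real list would be tuned by the team.
PERSON_LEVEL_PHRASES = ("made a mistake", "forgot", "was careless", "human error")


@dataclass(frozen=True)
class Why:
    question: str
    answer: str


def stops_at_person(chain: list[Why]) -> bool:
    """Return True if the last answer in the chain reads as a person-level cause."""
    if not chain:
        raise ValueError("the chain needs at least one why")
    last = chain[-1].answer.lower()
    return any(phrase in last for phrase in PERSON_LEVEL_PHRASES)


chain = [
    Why("Why did untested code reach production?",
        "Because the engineer forgot to run the staging checks"),
]
print(stops_at_person(chain))  # True: keep asking why

chain.append(
    Why("Why was it possible to skip the staging checks?",
        "Because the deploy pipeline does not require them to pass")
)
print(stops_at_person(chain))  # False: the chain now ends at the system
```

A phrase match only flags candidates for another round of "why"; a chain that passes the check can still stop short of a level where an action item changes the system.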

Step 5: Write action items

Every postmortem must produce action items. Without them, it's just storytelling.

Each action item:

Action: [specific thing to do]
Owner: [named person, not "the team"]
Due: [specific date]
Priority: [P1 - fix before next deploy / P2 - this sprint / P3 - this quarter]
Type: [Prevention / Detection / Mitigation / Process]

Action types:

Prevention - stops this failure mode from occurring
Detection - makes the problem visible sooner
Mitigation - reduces impact when it does occur
Process - changes how the team responds
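The action item fields above map naturally onto a small validated record. This is a minimal sketch, assuming a Python representation the skill does not prescribe: `ActionItem`, `Priority`, `ActionType`, and the `VAGUE_OWNERS` set are hypothetical names, while the priority and type values are taken from Step 5.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Priority(Enum):
    P1 = "fix before next deploy"
    P2 = "this sprint"
    P3 = "this quarter"


class ActionType(Enum):
    PREVENTION = "stops this failure mode from occurring"
    DETECTION = "makes the problem visible sooner"
    MITIGATION = "reduces impact when it does occur"
    PROCESS = "changes how the team responds"


# Hypothetical placeholders that do not name a person.
VAGUE_OWNERS = {"", "the team", "team", "tbd", "everyone"}


@dataclass(frozen=True)
class ActionItem:
    action: str
    owner: str
    due: date
    priority: Priority
    kind: ActionType

    def __post_init__(self) -> None:
        if not self.action.strip():
            raise ValueError("action must describe a specific thing to do")
        if self.owner.strip().lower() in VAGUE_OWNERS:
            raise ValueError("owner must be a named person")


item = ActionItem(
    action="Add a required staging check to the deploy pipeline",
    owner="Priya N.",  # hypothetical name
    due=date(2026, 5, 8),
    priority=Priority.P1,
    kind=ActionType.PREVENTION,
)
print(f"{item.priority.name} {item.kind.name.title()}: {item.action} ({item.owner}, due {item.due})")
```

Requiring a `date` for `due` and an enum for priority and type means an item without a specific date or with an unknown type cannot be constructed in the first place.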

Step 6: Write the postmortem document

Postmortem: [Incident title]
Date: [incident date]
Authors: [IC + contributors]
Severity: [SEV1/SEV2]
Duration: [start to resolution]
Impact: [users affected, features degraded]

Summary
[2-3 sentences: what happened, why it mattered, how it was resolved]

Timeline
[full timeline from Step 2]

Contributing Factors
[bullet list from Step 3 - no blame language]

5 Whys Analysis
[drill-down from Step 4]

What Went Well
[what helped contain or resolve the incident faster]

Action Items
[action items from Step 5, one entry per item]

Lessons Learned
[2-3 insights that apply beyond this specific incident]
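The template above can be filled in programmatically so that no section is silently dropped. The following sketch assumes plain-text output and uses only the standard library; `render_postmortem`, `SECTIONS`, and `HEADER` are hypothetical names, and the sample values are invented.

```python
from string import Template

# Section order follows the Step 6 template.
SECTIONS = (
    "Summary", "Timeline", "Contributing Factors", "5 Whys Analysis",
    "What Went Well", "Action Items", "Lessons Learned",
)

HEADER = Template(
    "Postmortem: $title\nDate: $date\nAuthors: $authors\n"
    "Severity: $severity\nDuration: $duration\nImpact: $impact"
)


def render_postmortem(header: dict[str, str], sections: dict[str, str]) -> str:
    """Render the document, refusing to emit it if any section is missing or empty."""
    missing = [name for name in SECTIONS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    parts = [HEADER.substitute(header)]  # raises KeyError on a missing header field
    parts += [f"{name}\n{sections[name].strip()}" for name in SECTIONS]
    return "\n\n".join(parts)


# Hypothetical incident details and placeholder section bodies.
header = {
    "title": "Checkout 5xx spike",
    "date": "2026-04-20",
    "authors": "IC + contributors",
    "severity": "SEV1",
    "duration": "31 minutes",
    "impact": "checkout degraded for a subset of users",
}
sections = {name: f"[{name.lower()} goes here]" for name in SECTIONS}
print(render_postmortem(header, sections))
```

Failing on an empty "What Went Well" or "Action Items" section enforces two of the quality checks at the point where the document is produced.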

Step 7: Share broadly

Postmortems are most valuable when shared. Publish to an internal wiki, engineering all-hands, or team newsletter. Other teams often have the same latent failure modes.

Anti-Patterns

1. Blame language
Bad: "The engineer deployed without testing."
Good: "The deployment process did not have a required staging environment check, making it possible to deploy untested code."
Dekker: blame stops investigation. "Human error" as a cause explains nothing and changes nothing.

2. No action items
Bad: Postmortem documents what happened but ends without concrete next steps.
Good: Every postmortem produces at least 2 action items with owners and dates. Track them to completion.

3. Postmortem after the deadline
Bad: "We'll get to it when things calm down." (more than 1 week for SEV1)
Good: Timebox it. A 60-minute postmortem within 48 hours is worth more than a 2-hour postmortem 3 weeks later.

4. Only focusing on what went wrong
Bad: Pure failure analysis, nothing about what was done right.
Good: Include "What went well." The practices that helped contain the incident should be reinforced, not just the failures fixed.

Quality Checklist

[ ] Scheduled within 48h (SEV1) or 1 week (SEV2)
[ ] Timeline is precise (specific times, not "around noon")
[ ] Contributing factors identified with no blame language
[ ] 5 Whys drilled to a system-level cause, not a person
[ ] Every action item has: owner (named), due date, priority, type
[ ] "What went well" section included
[ ] Document shared with broader engineering team
[ ] Action items tracked to completion in next retrospective

Write a blameless postmortem after an incident. Use when the user says "postmortem", "post-incident review", "PIR", "what happened in that incident", "incident review", "blameless postmortem", "5 whys", "how do we prevent this again", or needs to document learnings from a production incident - even if they don't explicitly say "postmortem".

13 stars

testing

Updated Apr 23, 2026

npx skillsauth add qa-aman/claude-skills postmortem

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 23, 2026, 2:01 PM - 214.6s - 1 file scanned

Install manually

# Clone the repo
git clone https://github.com/qa-aman/claude-skills.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r claude-skills/skills/by-role/devops/postmortem ~/.claude/skills/

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT