src/main/resources/targets/claude/skills/core/ops/x-handle-incident/SKILL.md
Guides incident response with severity-based checklists, communication templates, and postmortem triggers. Interactive guide for SEV1-SEV4 incidents covering classification, response coordination, and action item tracking.
Install this skill globally with one command: `npx skillsauth add edercnj/ia-dev-environment x-handle-incident`. Works with Claude Code, Cursor, and Windsurf.
Provides an interactive incident response guide for {{PROJECT_NAME}} that walks the team through the complete process from detection to resolution. Classifies severity, loads severity-specific checklists, coordinates communication, triggers postmortems, and tracks action items.
- `/x-handle-incident`: start interactive severity classification
- `/x-handle-incident SEV1`: start SEV1 critical incident response
- `/x-handle-incident SEV2 --postmortem`: SEV2 incident with postmortem generation
- `/x-handle-incident SEV3 --notify`: SEV3 incident with communication templates
- `/x-handle-incident SEV1 --postmortem --notify`: full incident response workflow

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| severity | positional | (interactive) | Severity level: SEV1, SEV2, SEV3, or SEV4 |
| --postmortem | boolean | false | Generate postmortem document (auto-enabled for SEV1/SEV2) |
| --notify | boolean | false | Generate communication templates for all channels |
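The parameter rules above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not how the skill itself is implemented: the function name `parse_invocation` and the returned dictionary keys are hypothetical, and the sketch assumes severities are matched exactly as written (uppercase).

```python
import argparse

VALID_SEVERITIES = ("SEV1", "SEV2", "SEV3", "SEV4")


def parse_invocation(argv):
    """Parse the arguments that follow /x-handle-incident (illustrative only)."""
    parser = argparse.ArgumentParser(prog="/x-handle-incident")
    parser.add_argument("severity", nargs="?", default=None)
    parser.add_argument("--postmortem", action="store_true")
    parser.add_argument("--notify", action="store_true")
    args = parser.parse_args(argv)

    # Reject anything outside SEV1-SEV4 with the documented message.
    if args.severity is not None and args.severity not in VALID_SEVERITIES:
        raise ValueError("Invalid severity. Use SEV1, SEV2, SEV3, or SEV4")

    return {
        "severity": args.severity,
        # No severity given: fall back to interactive classification.
        "interactive": args.severity is None,
        # Postmortem is auto-enabled for SEV1/SEV2, opt-in otherwise.
        "postmortem": args.postmortem or args.severity in ("SEV1", "SEV2"),
        "notify": args.notify,
    }
```

For example, `parse_invocation(["SEV2"])` enables the postmortem even though the flag was not passed, while `parse_invocation(["SEV3"])` leaves it off.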
1. CLASSIFY -> Analyze impact description and classify severity (SEV1-SEV4)
2. LOAD -> Load severity-specific checklist from SRE Practices KP
3. GUIDE -> Conduct team through Detection -> Triage -> Mitigation -> Resolution
4. COMMUNICATE -> Generate communication templates (status page, Slack, email)
5. POSTMORTEM -> Generate postmortem document from template (SEV1/SEV2 or --postmortem)
6. TRACK -> Register action items with owners and deadlines
If severity is provided as argument, validate and use directly. If omitted, ask the user about the impact and suggest classification.
| Severity | Label | Criteria | Response Time | Update Frequency |
|----------|-------|----------|---------------|------------------|
| SEV1 | Critical | Total service outage, significant financial loss, data breach affecting all users | 15 min | Every 30 min |
| SEV2 | High | Major feature unavailable, severe performance degradation, large user impact | 30 min | Every 1 hour |
| SEV3 | Medium | Minor feature impacted, workaround available, limited user impact | 4 hours | Every 4 hours |
| SEV4 | Low | Cosmetic issue, minimal user impact, no business impact | Next business day | Daily |
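As a rough sketch, the response times in the table can be turned into a concrete deadline. The function name `response_deadline` is hypothetical. The table does not define "next business day", so the code assumes it means the end of the next Monday-to-Friday day and ignores public holidays.

```python
from datetime import datetime, time, timedelta

# Response windows taken from the severity table.
RESPONSE_WINDOW = {
    "SEV1": timedelta(minutes=15),
    "SEV2": timedelta(minutes=30),
    "SEV3": timedelta(hours=4),
}


def response_deadline(severity, detected_at):
    """Return the latest time by which a response is expected."""
    if severity in RESPONSE_WINDOW:
        return detected_at + RESPONSE_WINDOW[severity]
    if severity == "SEV4":
        # Assumption: "next business day" = end of the next weekday,
        # with no holiday calendar applied.
        day = detected_at.date() + timedelta(days=1)
        while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            day += timedelta(days=1)
        return datetime.combine(day, time(23, 59, 59), tzinfo=detected_at.tzinfo)
    raise ValueError("Invalid severity. Use SEV1, SEV2, SEV3, or SEV4")
```

Under these assumptions, a SEV4 incident detected on a Friday gets a deadline at the end of the following Monday.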
Severity Decision Criteria:
Load the severity-specific checklist from the SRE Practices knowledge pack (knowledge/sre-practices/). Each severity level has different response requirements:
SEV1 — Critical Checklist:
SEV2 — High Checklist:
SEV3 — Medium Checklist:
SEV4 — Low Checklist:
Conduct the team through the incident response flow. Use the sre-engineer agent via Agent tool for reliability expertise and validation.
Generate communication templates based on severity and channels. Frequency varies by severity level.
**[INVESTIGATING/IDENTIFIED/MONITORING/RESOLVED]** — {{PROJECT_NAME}}
**Impact:** [Description of user-facing impact]
**Affected Services:** [List of affected services]
**Current Status:** [What we know and what we are doing]
**Next Update:** [Time of next update based on severity]
:rotating_light: **Incident Update — [SEV level]**
**Service:** [affected service]
**Impact:** [description]
**Status:** [Investigating/Mitigating/Resolved]
**IC:** [Incident Commander name]
**Timeline:**
- [HH:MM] [Event description]
**Next Steps:** [what happens next]
**Next Update:** [time]
Subject: [SEV level] Incident — [Brief description]
Dear Stakeholders,
We are currently experiencing [description of the issue]
affecting [scope of impact].
**Severity:** [SEV1/SEV2/SEV3/SEV4]
**Impact:** [User-facing impact description]
**Status:** [Current status]
**ETA for Resolution:** [Estimate or "Under investigation"]
We will provide updates every [frequency based on severity].
Best regards,
[Team Name]
Update Frequency by Severity:
| Severity | Update Frequency | Channels |
|----------|-----------------|----------|
| SEV1 | Every 30 min | Status Page, Slack, Email |
| SEV2 | Every 1 hour | Status Page, Slack, Email |
| SEV3 | Every 4 hours | Status Page, Slack |
| SEV4 | Daily | Slack |
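The update schedule can be expressed as a small lookup, which helps when filling the "Next Update" field in the templates. This is a minimal sketch: the names `UPDATE_POLICY` and `next_update` are hypothetical, and "Daily" is assumed to mean 24 hours after the previous update.

```python
from datetime import datetime, timedelta

# (update interval, channels) per severity, from the table above.
UPDATE_POLICY = {
    "SEV1": (timedelta(minutes=30), ("Status Page", "Slack", "Email")),
    "SEV2": (timedelta(hours=1), ("Status Page", "Slack", "Email")),
    "SEV3": (timedelta(hours=4), ("Status Page", "Slack")),
    "SEV4": (timedelta(days=1), ("Slack",)),  # assumption: Daily = 24 h
}


def next_update(severity, last_update):
    """Return (time of next update, channels to post it to)."""
    interval, channels = UPDATE_POLICY[severity]
    return last_update + interval, channels
```

Calling `next_update("SEV3", datetime(2024, 3, 1, 10, 0))` would schedule the next update for 14:00 on the status page and Slack only.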
If --postmortem flag is provided, or severity is SEV1 or SEV2, generate a postmortem document from the _TEMPLATE-POSTMORTEM.md template.
Postmortem is triggered when:
- `--postmortem` flag is explicitly provided (any severity)

Postmortem document is pre-filled with:
If the postmortem template is not available, generate an inline postmortem with the basic structure:
# Postmortem — [Incident Title]
## Incident Summary
| Field | Value |
|-------|-------|
| Severity | [SEV level] |
| Date | [Date] |
| Duration | [Duration] |
| Affected Services | [Services] |
## Timeline
[Pre-filled from incident response]
## Root Cause Analysis
[To be completed]
## Action Items
[Pre-filled from tracked items]
## Lessons Learned
[To be completed]
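The template-or-fallback behaviour described above could look roughly like the following. The source does not say where `_TEMPLATE-POSTMORTEM.md` lives, so the directory is passed in as a parameter. The helper names `load_postmortem_template` and `prefill` are hypothetical, and the sketch assumes placeholders are written in square brackets as in the inline structure.

```python
from pathlib import Path

TEMPLATE_NAME = "_TEMPLATE-POSTMORTEM.md"


def load_postmortem_template(template_dir, fallback_text):
    """Return (text, source); source is "template" or "inline"."""
    path = Path(template_dir) / TEMPLATE_NAME
    if path.is_file():
        return path.read_text(encoding="utf-8"), "template"
    # Template absent: use the inline basic structure instead.
    return fallback_text, "inline"


def prefill(text, fields):
    """Replace [Placeholder] markers with known incident values."""
    for placeholder, value in fields.items():
        text = text.replace(f"[{placeholder}]", value)
    return text
```

A caller would pass the inline structure as `fallback_text` and then run something like `prefill(text, {"SEV level": "SEV1"})`, leaving sections such as Root Cause Analysis untouched for later completion.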
Register all action items identified during the incident response. Each action item includes:
| Field | Description |
|-------|-------------|
| ID | Sequential identifier (AI-001, AI-002, ...) |
| Description | What needs to be done |
| Owner | Person responsible |
| Deadline | Target completion date |
| Priority | HIGH / MEDIUM / LOW |
| Status | OPEN / IN_PROGRESS / DONE |
Output format:
## Action Items
| ID | Description | Owner | Deadline | Priority | Status |
|----|-------------|-------|----------|----------|--------|
| AI-001 | [description] | [owner] | [date] | HIGH | OPEN |
| AI-002 | [description] | [owner] | [date] | MEDIUM | OPEN |
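To make the action item format concrete, here is a small sketch that assigns sequential IDs and renders the table shown above. The `ActionItem` class and `render_action_items` function are hypothetical names for illustration. The default values (MEDIUM priority, OPEN status) are assumptions, not something the skill specifies.

```python
from dataclasses import dataclass

PRIORITIES = ("HIGH", "MEDIUM", "LOW")
STATUSES = ("OPEN", "IN_PROGRESS", "DONE")


@dataclass
class ActionItem:
    description: str
    owner: str
    deadline: str
    priority: str = "MEDIUM"  # assumed default
    status: str = "OPEN"      # assumed default


def render_action_items(items):
    """Render action items as a Markdown table with AI-001, AI-002, ... IDs."""
    lines = [
        "## Action Items",
        "| ID | Description | Owner | Deadline | Priority | Status |",
        "|----|-------------|-------|----------|----------|--------|",
    ]
    for number, item in enumerate(items, start=1):
        if item.priority not in PRIORITIES or item.status not in STATUSES:
            raise ValueError(f"Invalid priority or status for item {number}")
        lines.append(
            f"| AI-{number:03d} | {item.description} | {item.owner} "
            f"| {item.deadline} | {item.priority} | {item.status} |"
        )
    return "\n".join(lines)
```

IDs are derived from list position here, so reordering the list would renumber the items; a real tracker would more likely store the ID with each item.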
| Severity | Label | Description | Response Time | Update Frequency | Postmortem Required |
|----------|-------|-------------|---------------|------------------|---------------------|
| SEV1 | Critical | Total service outage, significant financial loss, data breach | 15 min | 30 min | Yes (always) |
| SEV2 | High | Major feature unavailable, degraded experience for many users | 30 min | 1 hour | Yes (always) |
| SEV3 | Medium | Minor feature impacted, workaround available | 4 hours | 4 hours | Only if --postmortem |
| SEV4 | Low | Cosmetic issue, minimal impact | Next business day | Daily | Only if --postmortem |
See Step 4 above for complete templates per channel (Status Page, Slack/Teams, Email) and update frequency per severity level.
| Scenario | Action |
|----------|--------|
| Severity not provided | Ask the user about the impact and suggest classification based on description |
| Invalid severity (e.g., SEV5) | Reject with message: "Invalid severity. Use SEV1, SEV2, SEV3, or SEV4" |
| --postmortem without template | Generate inline postmortem with basic structure |
| No Incident Commander available | Assign the requesting user as IC and recommend finding a replacement |
| Incomplete information | Proceed with available data, note gaps in postmortem |
| Skill | Relationship | Context |
|-------|-------------|---------|
| x-troubleshoot-operations | called-by | Escalates to this skill when an issue becomes a production incident |
| sre-engineer (agent) | calls | Delegates reliability expertise and checklist validation via Agent tool |
| sre-practices (KP) | reads | References knowledge/sre-practices/ for incident management processes |
- `_TEMPLATE-POSTMORTEM.md` for postmortem document generation. Fallback: inline postmortem with basic structure when template is absent.
- `_TEMPLATE-INCIDENT-RESPONSE.md` for severity classification reference