
iainmcl/incident-response

Name: incident-response
Author: iainmcl

claude/skills/incident-response/SKILL.md

npx skillsauth add iainmcl/dotfiles incident-response


Incident Response

End-to-end incident response: gather context → investigate root cause → measure blast radius → implement fix → document everything.

Slack update pattern

The incident Slack channel is the live feed of the investigation. Check it:

- Once at the start (Step 1) to get a full picture before touching anything
- Before each subsequent step to catch new findings from other investigators

At each re-check, only surface messages newer than the previous check. If a new finding is directly relevant to the current step (e.g. someone just identified the root cause while you're about to investigate), incorporate it and note that you did so.

Track a last_checked timestamp after each read and use it to filter the next fetch.
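The last_checked pattern above can be sketched as follows. This is a minimal illustration, not the Slack MCP's actual interface: it assumes each fetched message is a dict with a Slack-style `ts` string (epoch seconds) and a `text` field, and `new_messages` is a hypothetical helper name.

```python
from typing import Iterable


def new_messages(messages: Iterable[dict], last_checked: float) -> tuple[list[dict], float]:
    """Return messages newer than last_checked, plus the updated timestamp.

    Assumption: each message carries a Slack-style "ts" string holding
    epoch seconds, e.g. "1716100000.000100".
    """
    fresh = [m for m in messages if float(m["ts"]) > last_checked]
    fresh.sort(key=lambda m: float(m["ts"]))
    # Advance the marker only as far as the newest message actually seen,
    # so nothing posted between the fetch and "now" is skipped next time.
    newest = max((float(m["ts"]) for m in fresh), default=last_checked)
    return fresh, newest


# Hypothetical channel contents, for illustration only.
channel = [
    {"ts": "1716100000.000100", "text": "Invoices failing since 09:00"},
    {"ts": "1716100300.000200", "text": "Suspect the rounding change"},
]

first_read, last_checked = new_messages(channel, last_checked=0.0)
channel.append({"ts": "1716100600.000300", "text": "Root cause confirmed"})
second_read, last_checked = new_messages(channel, last_checked)
print([m["text"] for m in second_read])
```

Taking the marker from the newest message seen, rather than from the wall clock, avoids dropping a message that lands between the fetch and the timestamp update.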

Step 0 — Gather inputs

At least one of the following entry points must be provided:

| Input | Example | Notes |
|---|---|---|
| Sentry issue URL | https://sentry.io/organizations/.../issues/123/ | Primary entry point |
| Datadog monitor/alert URL | https://app.datadoghq.com/monitors/456 | Use if no Sentry issue exists yet |
| Slack incident channel | #inc-2026-01-invoice-failures | Use if alerted via Slack without a direct link |

If none is provided, ask for one before proceeding.

Also ask for anything not already provided:

- Repo: full GitHub path (e.g. travelperk/billing-service)
- Notion debrief page: URL to update, or "new" to create one (can be skipped if not yet created)

Entry point routing:

- Sentry URL given → proceed from Step 2 (Sentry fetch)
- Datadog URL given, no Sentry → fetch the Datadog alert in Step 2 to find the affected service/error, then search Sentry for related issues before investigating
- Slack channel only → read the channel in Step 1 and extract any Sentry/Datadog links posted there; if none found, treat the channel messages as the primary error signal
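The routing rules above amount to a simple precedence check. The sketch below is one way to express it; `first_step` and its parameter names are hypothetical, and the returned strings are only labels for the steps described in this skill.

```python
from typing import Optional


def first_step(sentry_url: Optional[str] = None,
               datadog_url: Optional[str] = None,
               slack_channel: Optional[str] = None) -> str:
    """Map the provided entry points to where the workflow starts."""
    if sentry_url:
        return "Step 2: fetch the Sentry issue"
    if datadog_url:
        # Datadog only applies when no Sentry URL was given.
        return "Step 2: fetch the Datadog alert, then search Sentry for related issues"
    if slack_channel:
        return "Step 1: read the channel and extract any Sentry/Datadog links"
    # No entry point at all: the skill says to ask before proceeding.
    raise ValueError("At least one entry point is required")


print(first_step(datadog_url="https://app.datadoghq.com/monitors/456"))
```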

Tell the user: "Running incident response — I'll confirm each step as I go."

Step 1 — Read the Slack incident channel

Before touching the codebase, read the incident channel. Other investigators may have already found the root cause, affected customers, or a mitigation path.

Use the Slack MCP (slack_read_channel) to fetch recent messages from the incident channel.

Extract and summarise:

- Timeline of when the issue was first noticed
- Any root cause theories already posted
- Customer names or IDs mentioned as affected
- Any mitigations already applied or attempted
- Names of who is investigating what (avoid duplicating their work)

Confirm to user: "Read #channel — here's what the team has found so far: ..."

Step 2 — Fetch the Sentry issue

Use the Sentry MCP to fetch the issue at the provided URL.

Capture:

- Issue title and ID
- Event count and affected users
- First seen / last seen timestamps
- Full stack trace from the most recent event
- Any linked releases or tags

Confirm to user: "Fetched Sentry issue: <title> — <N> events, <N> users, first seen <date>."

Step 3 — Investigate root cause

Read the stack trace. Identify the exact file, line, and condition triggering the error.

Clone or navigate to the repo and read the relevant source files. Do not guess — confirm the bug in the code before proposing a fix.

gh repo clone <repo> /tmp/incident-<issue-id> 2>/dev/null || git -C /tmp/incident-<issue-id> pull

Cross-reference with Slack findings from Step 1. If a team member already identified the root cause, validate it in the code rather than starting from scratch.

Confirm to user: "Root cause identified: <file>:<line> — <one sentence explanation>."

Step 4 — Measure blast radius via Snowflake

Query Snowflake to determine how many records and customers are affected.

Tailor the query to the root cause — e.g. if the bug corrupts invoices created after a certain date, query for invoices in that state.

Save both the query and results. Include:

- Number of affected records
- Number of affected customers / organisations
- Date range of impact
- Whether impact is ongoing or bounded
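As an illustration of the shape such a query might take, the sketch below computes the figures listed above for the invoice example. Everything specific here is an assumption: the `invoices` table, its columns, the `corrupted` status, and the dates are invented, and an in-memory SQLite database stands in for Snowflake so the example runs locally. Real Snowflake table names and parameter binding will differ.

```python
import sqlite3

# Hypothetical schema and data; real table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER, customer_id TEXT, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [
        (1, "acme", "ok", "2026-01-02"),
        (2, "acme", "corrupted", "2026-01-10"),
        (3, "globex", "corrupted", "2026-01-11"),
        (4, "acme", "corrupted", "2026-01-12"),
    ],
)

# One pass gives record count, distinct customers, and the date range.
BLAST_RADIUS_QUERY = """
SELECT COUNT(*)                    AS affected_records,
       COUNT(DISTINCT customer_id) AS affected_customers,
       MIN(created_at)             AS first_affected,
       MAX(created_at)             AS last_affected
FROM invoices
WHERE status = 'corrupted'
  AND created_at >= ?
"""

records, customers, first, last = conn.execute(
    BLAST_RADIUS_QUERY, ("2026-01-05",)
).fetchone()
print(f"Blast radius: {records} records, {customers} customers affected since {first}.")
```

Comparing `last_affected` with the current date is one rough way to judge whether the impact is ongoing or bounded.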

If Snowflake MCP is unavailable, tell the user: "Snowflake not connected — please run this query manually: <query>" and continue.

Confirm to user: "Blast radius: <N> records, <N> customers affected since <date>."

Step 5 — Check Datadog

Search Datadog for monitors or dashboards related to the affected service.

Look for:

- Any monitors that should have fired but didn't (detection gap)
- Any monitors that need threshold updates based on what you now know
- Error rate or latency spikes correlated with the incident timeline

If Datadog MCP is unavailable, note which service/metric to check manually and continue.

Confirm to user: "Datadog checked — <summary of monitor state / any gaps identified>."

Step 6 — Create the Jira ticket

Use the Atlassian MCP (createJiraIssue) to create a bug ticket:

- Project: APP
- Issue type: Bug
- Priority: >1000 events = High, >100 = Medium, else Low
- Summary: [Incident] <concise description>

Description:

## Sentry issue
<sentry URL>
Frequency: <event count> events, <N> users affected
First seen: <date> | Last seen: <date>

## Root cause
<file>:<line> — <explanation>

## Blast radius
<N> records, <N> customers affected
<Snowflake query used>

## Slack incident channel
<link to channel>

## Stack trace (excerpt)
<most relevant frames>

Note the ticket number for the branch name and PR.
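The priority rule for the ticket is a pair of strict thresholds on the Sentry event count. A small sketch, with `jira_priority` as a hypothetical helper name:

```python
def jira_priority(event_count: int) -> str:
    """Map a Sentry event count to the Jira priority used by this skill."""
    if event_count > 1000:
        return "High"
    if event_count > 100:
        return "Medium"
    return "Low"


for count in (50, 100, 101, 1000, 1001):
    print(count, jira_priority(count))
```

Both comparisons are strict, so exactly 1000 events maps to Medium and exactly 100 maps to Low.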

Confirm to user: "Jira ticket created: APP-<N> — <link>."

Step 7 — Implement the fix

On a new branch APP-<ticket>-<short-description>:

- Make the minimal fix; do not refactor surrounding code
- Follow existing service abstractions and patterns (check similar files first)
- Add a "why" comment only if the fix is non-obvious
- Run tests if the environment supports it
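The branch name follows the APP-<ticket>-<short-description> pattern. The sketch below shows one way to derive it; the slug rules (lowercase, hyphen-separated, at most five words) are assumptions made for illustration, not something the skill specifies, and `branch_name` is a hypothetical helper.

```python
import re


def branch_name(ticket_number: int, description: str, max_words: int = 5) -> str:
    """Build APP-<ticket>-<short-description> from a free-text description."""
    # Keep only alphanumeric runs so the result is safe as a git ref.
    words = re.findall(r"[a-z0-9]+", description.lower())
    if not words:
        raise ValueError("description must contain at least one word")
    slug = "-".join(words[:max_words])
    return f"APP-{ticket_number}-{slug}"


print(branch_name(1234, "Fix rounding in invoice totals!"))
```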

If uncertain about the fix, implement the most likely approach and flag uncertainty in the PR.

Confirm to user: "Fix implemented on branch <branch-name>."

Step 8 — Create the draft PR

gh pr create --draft \
  --title "fix: <description> (APP-<ticket>)" \
  --body "<body>"

PR body:

## What
<one sentence: what bug this fixes>

## Why
<root cause in plain English>

## Blast radius
<N> records, <N> customers affected since <date>

## Sentry
<sentry URL> — <N> events, <N> users

## Jira
https://travelperk.atlassian.net/browse/APP-<ticket>

## Fix
<what changed and why it works>

## Risks
<anything uncertain, edge cases, areas for reviewer attention>
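A multi-line markdown body is awkward to pass inline through shell quoting. One option is to render the body to a file and hand it to `gh pr create` with its `--body-file` flag. The sketch below only builds the body and the argument list; it does not run `gh`. The helper names and placeholder values are hypothetical.

```python
import tempfile
from pathlib import Path

SECTIONS = ["What", "Why", "Blast radius", "Sentry", "Jira", "Fix", "Risks"]


def render_pr_body(fields: dict[str, str]) -> str:
    """Render the PR body, failing loudly if a section is missing or empty."""
    missing = [s for s in SECTIONS if not fields.get(s)]
    if missing:
        raise ValueError(f"Missing PR sections: {', '.join(missing)}")
    return "\n\n".join(f"## {s}\n{fields[s]}" for s in SECTIONS) + "\n"


def gh_command(title: str, body_path: Path) -> list[str]:
    """Build the argv for `gh pr create`; the caller decides whether to run it."""
    return ["gh", "pr", "create", "--draft",
            "--title", title, "--body-file", str(body_path)]


# Placeholder values for illustration only.
fields = {s: f"<{s.lower()} goes here>" for s in SECTIONS}
body = render_pr_body(fields)
with tempfile.TemporaryDirectory() as tmp:
    body_path = Path(tmp) / "pr-body.md"
    body_path.write_text(body, encoding="utf-8")
    cmd = gh_command("fix: <description> (APP-<ticket>)", body_path)
    print(cmd)
```

Passing the command as a list (for example to `subprocess.run`) also sidesteps shell interpretation of characters such as `<` and `>` in the title.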

Confirm to user: "Draft PR opened: <link>."

Step 9 — Update the Notion incident debrief

Use the Notion MCP to update the debrief page at the provided URL (or create one if "new" was specified).

Include:

- Timeline: when the issue started, when it was detected, when the fix was deployed
- Root cause: plain-language explanation
- Affected data: blast radius figures from Snowflake
- Remediation: what the fix does, PR link, Jira link
- Detection gap: did monitors catch this? If not, what should be added?
- Follow-up actions: anything that needs to happen post-fix (data backfill, monitor updates, etc.)

Write in blameless, conversational language. Only make claims that are supported by evidence gathered in the earlier steps (the codebase, Sentry, Slack, Snowflake, or Datadog).

If Notion MCP is unavailable, output the debrief content as markdown so the user can paste it manually.

Confirm to user: "Notion debrief updated: <link>."

Final summary

Once all steps are complete, output:

## Incident response complete

- Root cause: <one sentence>
- Blast radius: <N> records, <N> customers
- Jira: APP-<N> — <link>
- PR: <link>
- Notion debrief: <link>

Remaining actions:
- <any manual steps needed (e.g. Snowflake query to run, monitor to update)>

Run a full incident response workflow for an active incident. Covers investigation, blast radius via Snowflake, Slack channel triage, fix implementation, Jira ticket, draft PR, Notion debrief, and Datadog monitor review. Use when asked to "run incident response", "we have an incident", "investigate this error", or given a Sentry URL with urgency context.

testing

Updated May 19, 2026

npx skillsauth add iainmcl/dotfiles incident-response

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 19, 2026, 7:53 AM (187.6s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/iainmcl/dotfiles.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into it
cp -r dotfiles/claude/skills/incident-response ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository: iainmcl/dotfiles

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT