skills/fix-sentry-issues/SKILL.md
Use Sentry MCP to discover, triage, and fix production issues with root-cause analysis. Use when asked to fix Sentry issues, triage production errors, investigate error spikes, or clean up Sentry noise. Requires Sentry MCP server. Triggers on "fix sentry", "triage errors", "production bugs", "sentry issues".
Systematically discover, triage, investigate, and fix production issues using Sentry MCP. One PR per issue, root-cause analysis required.
NEVER treat log level changes as fixes. Changing logger.error to logger.warn or logger.info silences Sentry but doesn't fix the user's experience.
For every failing code path, ask "Why does this fail?" — not "How do I make Sentry quiet?"
These are specific failure modes from real experience. Do NOT do these:
Batch-classifying issues as "expected" without investigating each one. Reading an error message and seeing a fallback path does NOT mean you understand the failure. You must trace the full input path to understand what's being sent and why it fails.
Treating "has a fallback" as "not a problem." A fallback means the user gets degraded results. Ask: why does the primary path fail? Can we prevent the failure upstream? Is the input wrong? Is the timeout too tight? Is there a missing filter?
Combining multiple issues into one "noise reduction" PR. Each issue has its own root cause. Investigate and fix them individually. The only exception is issues that share an identical root cause discovered through investigation.
Throwing away error details. Never change `catch (error) { logger.error(..., error) }` to `catch { logger.info(...) }`. The structured error data (status codes, messages, stack traces) is exactly what you need to understand the failure.
Deciding the fix during triage. The triage table should classify issues as "Investigate" or "Ignore" — never pre-decide that the fix is a log level change. You don't know the fix until you've completed investigation.
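A minimal TypeScript sketch of the pattern these rules point toward, assuming a hypothetical `Logger` interface with a `warn` method (not the API of any particular library). The fallback still runs, but the primary failure is reported with its full error details instead of being silenced or downgraded.

```typescript
// Hypothetical logger shape; swap in your project's logger.
interface Logger {
  warn(message: string, details: Record<string, unknown>): void;
}

// Pull structured fields out of an unknown thrown value so nothing is lost.
function describeError(error: unknown): Record<string, unknown> {
  if (error instanceof Error) {
    // `status` is an assumption: many HTTP client errors attach one.
    const status = (error as Error & { status?: unknown }).status;
    return { name: error.name, message: error.message, status, stack: error.stack };
  }
  return { value: String(error) };
}

// Run the primary path; on failure, record why it failed, then degrade.
async function withFallback<T>(
  label: string,
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  logger: Logger,
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    // Keep the catch variable: status codes and messages are the evidence.
    logger.warn(`${label}: primary path failed, using fallback`, describeError(error));
    return fallback();
  }
}
```

Logging at `warn` keeps the handled failure visible in Sentry, which matches the log-level table later in this skill: it stays there until the root cause is understood.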
A downgrade to logger.info is valid ONLY for genuinely expected operational states — NOT for failures with fallbacks. Examples:
Use Sentry MCP to find the org, project, and all unresolved issues. Use ToolSearch first to load the Sentry MCP tools.
```
mcp__sentry__find_organizations()
mcp__sentry__find_projects(organizationSlug, regionUrl)
mcp__sentry__search_issues(
  organizationSlug, projectSlugOrId, regionUrl,
  naturalLanguageQuery: "all unresolved issues sorted by events",
  limit: 25
)
```
Build a triage table. The Action column should be Investigate or Ignore — never a pre-decided fix:
| ID | Title | Events | Action | Reason |
|----|-------|--------|--------|--------|
| PROJ-A | Error in save | 14 | Investigate | User-facing save failure |
| PROJ-B | GM_register... | 3 | Ignore | Greasemonkey extension |
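If you want to generate the table rather than hand-write it, a small TypeScript sketch like the following could work. It assumes you have already copied each issue's ID, title, and event count out of the `search_issues` results into a hypothetical `TriageRow` shape; it makes no assumption about the MCP tool's actual output format. Rows are sorted by event count so the noisiest issues come first, and the type only permits `Investigate` or `Ignore` as the action.

```typescript
// Only two actions are allowed at triage time; no pre-decided fixes.
type TriageAction = "Investigate" | "Ignore";

// Hypothetical shape for one triaged issue.
interface TriageRow {
  id: string;
  title: string;
  events: number;
  action: TriageAction;
  reason: string;
}

// Escape pipes so a title like "a | b" doesn't break the Markdown table.
function cell(text: string): string {
  return text.replace(/\|/g, "\\|");
}

// Render rows as a Markdown table, most events first.
function renderTriageTable(rows: TriageRow[]): string {
  const sorted = [...rows].sort((a, b) => b.events - a.events);
  const lines = [
    "| ID | Title | Events | Action | Reason |",
    "|----|-------|--------|--------|--------|",
    ...sorted.map(
      (r) => `| ${cell(r.id)} | ${cell(r.title)} | ${r.events} | ${r.action} | ${cell(r.reason)} |`,
    ),
  ];
  return lines.join("\n");
}
```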
Classify every issue before writing any code. Only two categories at this stage: **Investigate** or **Ignore**.

Examples of non-actionable noise: `GM_registerMenuCommand`, `CONFIG`, `currentInset`, MetaMask JSON-RPC errors, and `ChunkLoadError` (self-resolving).

Apply triage decisions:
```
mcp__sentry__update_issue(issueId, organizationSlug, regionUrl, status: "ignored")  // noise
mcp__sentry__update_issue(issueId, organizationSlug, regionUrl, status: "resolved") // already fixed
```
For each "Investigate" issue, work through these steps in order. Do NOT skip steps or batch multiple issues together.
Issue summaries hide the details you need. Always pull actual events AND the full issue details:
```
mcp__sentry__get_issue_details(issueId, organizationSlug, regionUrl)
mcp__sentry__search_issue_events(
  issueId, organizationSlug, regionUrl,
  naturalLanguageQuery: "all events with extra data",
  limit: 15
)
```
Extract from the events: actual URLs, request parameters, stack traces, timestamps, user context, extra data fields (status codes, content lengths, etc.). These are the real inputs that triggered the failure.
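When the extracted inputs include URLs, replaying them is often the fastest way to see what the external service actually receives. Here is a minimal TypeScript sketch, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`). `probeUrl` and the 10-second default are illustrative and do not come from this skill. The fetch function is injectable so the probe can be exercised without network access.

```typescript
// Minimal response surface the probe reads; the global fetch Response satisfies it.
interface ResponseLike {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

// Injectable fetch signature so the probe can run against a fake in tests.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<ResponseLike>;

// Outcome of replaying one failing input taken from a Sentry event.
interface ProbeResult {
  url: string;
  ok: boolean;
  status?: number;
  contentType?: string | null;
  elapsedMs: number;
  error?: string;
}

async function probeUrl(
  url: string,
  timeoutMs = 10_000, // Illustrative; match the timeout the failing code path uses.
  fetchImpl: FetchLike = fetch,
): Promise<ProbeResult> {
  const started = Date.now();
  try {
    // AbortSignal.timeout aborts the request with a "TimeoutError" after timeoutMs.
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return {
      url,
      ok: res.ok,
      status: res.status,
      contentType: res.headers.get("content-type"),
      elapsedMs: Date.now() - started,
    };
  } catch (error) {
    // Keep name and message: a timeout and a DNS/TLS failure point to different root causes.
    const detail = error instanceof Error ? `${error.name}: ${error.message}` : String(error);
    return { url, ok: false, elapsedMs: Date.now() - started, error: detail };
  }
}
```

The content type is recorded because an unexpected value there (a PDF or image where HTML was expected) is often the real answer to "why does this URL fail?"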
Axiom events include traceId fields that correlate with Sentry errors. Use the Axiom CLI to pull surrounding logs for richer context:
```bash
# Get the traceId from the Sentry event's trace context
# Then query Axiom for all events with that traceId
axiom query "['shiori-events'] | where traceId == '<traceId>'" -f json

# Or search by userId around the error timestamp for broader context
axiom query "['shiori-events'] | where userId == '<userId>' | where _time > datetime('2025-01-01T00:00:00Z') and _time < datetime('2025-01-01T01:00:00Z')" -f json
```
Axiom logs include fields like authMethod, client_version, event type, and request metadata that Sentry often lacks. This helps you understand what the user was doing before and after the error.
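To avoid hand-editing timestamps, a small TypeScript helper can build the userId query from the Sentry event's timestamp. This is a sketch: the symmetric `windowMinutes` window and the function name are assumptions, while the dataset name and query shape are copied from the commands above. Rather than guess at APL's string-escaping rules, it rejects IDs containing quotes or backslashes.

```typescript
// Format a Date like the example query: ISO 8601 without milliseconds.
function aplTime(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Build an APL query for one user's events in a window around an error.
function buildUserWindowQuery(
  dataset: string,
  userId: string,
  errorTime: Date,
  windowMinutes = 30, // Illustrative: 30 minutes before and after the error.
): string {
  if (/['\\]/.test(userId)) throw new Error(`refusing to interpolate userId: ${userId}`);
  const ms = windowMinutes * 60_000;
  const from = aplTime(new Date(errorTime.getTime() - ms));
  const to = aplTime(new Date(errorTime.getTime() + ms));
  return `['${dataset}'] | where userId == '${userId}' | where _time > datetime('${from}') and _time < datetime('${to}')`;
}
```

For an error at `2025-01-01T00:30:00Z` with the default window, this reproduces the one-hour query shown above, ready to pass to `axiom query "..." -f json`.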
Follow the stack trace. Read every file in the chain. Understand what the code does before proposing changes. Use subagents for parallel file exploration if the stack is deep.
This is the step most often skipped, and the most important:
Use the actual failing inputs from Sentry events:
- `fetch()` the actual URLs that timed out: are they reachable?
- Add `console.log` statements to verify your understanding of the code flow

Ask these questions in order:
Common root causes:
| Pattern | Root Cause | Real Fix |
|---------|-----------|----------|
| External API fails on certain URLs | Wrong inputs being sent (binary files, bad formats) | Filter/validate inputs before sending |
| External API timeout | Timeout too tight, or input too large, or missing retry | Investigate what's slow, adjust timeout or input size |
| DB rejects "invalid json" | Unsanitized input (null bytes, control chars) | Sanitize before insert |
| Processing stuck in "error" | Timeout budget doesn't account for full pipeline | Adjust timeouts, save partial results on timeout |
| Same error on every cron run | Stale reference to deleted external resource | Detect staleness, auto-clean |
| Error logged but details not useful | Error object not included, or status code missing | Improve the log to include actionable details |
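As one concrete instance of the "Sanitize before insert" row: PostgreSQL's `jsonb` type, for example, rejects the `\u0000` escape. The TypeScript sketch below strips control characters from every string in a parsed value before it is serialized. The choice to keep tab, newline, and carriage return is an assumption to adjust for your data, and the function is meant for plain JSON-like data, not class instances such as `Date`.

```typescript
// C0 control characters except tab (\x09), LF (\x0A), CR (\x0D), plus DEL (\x7F).
const CONTROL_CHARS = /[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g;

// Recursively strip control characters from every string (and key) in a JSON-like value.
function sanitizeForJson(value: unknown): unknown {
  if (typeof value === "string") return value.replace(CONTROL_CHARS, "");
  if (Array.isArray(value)) return value.map(sanitizeForJson);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k.replace(CONTROL_CHARS, ""), sanitizeForJson(v)]),
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}
```

Sanitizing at the insert boundary keeps the original input available for logging, so the event that triggered the failure can still be inspected.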
Log levels control what reaches Sentry:
| Level | Sends to Sentry? | Use for |
|-------|-------------------|---------|
| logger.error | Yes (error) | Unexpected bugs, states that should never occur |
| logger.warn | Yes (warning) | Handled failures worth monitoring — keep until you understand the pattern |
| logger.info | No | Genuinely expected operational states (not "failures with fallbacks") |
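One way to make this table concrete is a routing rule inside the project's logger. The sketch below assumes a hypothetical in-house logger that forwards to Sentry through an injected `send` function (for example, a thin wrapper around the Sentry SDK). The mapping mirrors the table: `warn` becomes Sentry's `"warning"` level and `info` never leaves the local log.

```typescript
type LogLevel = "error" | "warn" | "info";
type SentryLevel = "error" | "warning";

// Mirror of the table above: info never reaches Sentry.
const SENTRY_LEVEL: Record<LogLevel, SentryLevel | null> = {
  error: "error",
  warn: "warning",
  info: null,
};

// Hypothetical logger: always writes locally, forwards error/warn to Sentry via `send`.
function createLogger(
  send: (level: SentryLevel, message: string, extra: Record<string, unknown>) => void,
) {
  const log =
    (level: LogLevel) =>
    (message: string, extra: Record<string, unknown> = {}) => {
      console.log(JSON.stringify({ level, message, ...extra })); // stand-in local sink
      const sentryLevel = SENTRY_LEVEL[level];
      if (sentryLevel !== null) send(sentryLevel, message, extra);
    };
  return { error: log("error"), warn: log("warn"), info: log("info") };
}
```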
```bash
git checkout main && git pull
git checkout -b fix/<descriptive-name>
```
One branch per issue. Keep fixes focused.
Tests must use data derived from actual Sentry events, not hypothetical inputs. The test should fail before the fix and pass after.
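A sketch of what such a test can look like, using the "External API fails on certain URLs" row of the root-cause table as the example. The filter `shouldSendToExtractor`, its extension list, and the fixture URLs are all hypothetical; in practice the fixtures would be copied verbatim from the Sentry events pulled earlier. The assertions are plain `throw`s so the snippet runs under any TypeScript runner. In a real project they would live in the project's test framework.

```typescript
// Hypothetical fix: skip URLs that point at binary files the extractor can't parse.
const BINARY_EXTENSIONS = new Set([".pdf", ".zip", ".png", ".jpg", ".jpeg", ".gif", ".mp4"]);

function shouldSendToExtractor(rawUrl: string): boolean {
  let pathname: string;
  try {
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    return false; // Reject malformed URLs up front instead of failing downstream.
  }
  // Look only at the last path segment so dots in directories ("v1.2/") are ignored.
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  const ext = dot === -1 ? "" : lastSegment.slice(dot);
  return !BINARY_EXTENSIONS.has(ext);
}

// Stand-ins for URLs copied from the failing Sentry events.
const failingUrlsFromSentry = [
  "https://example.com/files/Quarterly-Report.PDF",
  "https://cdn.example.com/archive.zip?download=1",
];
const healthyUrls = ["https://example.com/blog/post", "https://example.com/docs/v1.2/intro"];

// Without the filter these URLs would be sent (the failure seen in Sentry); with it they are skipped.
for (const url of failingUrlsFromSentry) {
  if (shouldSendToExtractor(url)) throw new Error(`should have been filtered: ${url}`);
}
for (const url of healthyUrls) {
  if (!shouldSendToExtractor(url)) throw new Error(`should still be sent: ${url}`);
}
```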
Fix the root cause, not the symptom.
Self-check before committing: If the fix is primarily a log level change, STOP. Ask yourself:
- Run the tests (`bun run test`)
- Remove any debugging `console.log` statements

```bash
git push -u origin fix/<descriptive-name>
gh pr create --title "<short title>" --body "$(cat <<'EOF'
## Summary
- **Root cause**: [What was actually wrong — the upstream reason, not just "it throws an error"]
- **Fix**: [What changed and why this prevents the failure, not just silences it]
## Test plan
- [x] Tests written using data from Sentry events
- [x] All tests pass
- [x] Lint passes
EOF
)"
```
After PR is merged:
```bash
git checkout main && git pull
```

```
mcp__sentry__update_issue(issueId, organizationSlug, regionUrl, status: "resolved")
```
Work through issues by priority (most events first). After each PR:
- [ ] Pulled event-level data (not just issue summary)
- [ ] Cross-referenced with Axiom logs using traceId for surrounding context
- [ ] Read the failing code path end-to-end
- [ ] Traced the input path upstream and understood what data triggers the failure
- [ ] Identified root cause (not just "it has a fallback")
- [ ] Fix prevents the failure, not just suppresses the log
- [ ] Tests use real-world data from Sentry events
- [ ] Tests pass, lint passes
- [ ] No error details thrown away (catch variables, status codes, etc.)
- [ ] PR created with upstream root cause explanation
- [ ] Sentry issue resolved after merge