skills/debugging/SKILL.md
Use when debugging bugs, test failures, or unexpected behavior. Triggers: 'why isn't this working', 'this doesn't work', 'X is broken', 'something's wrong', 'getting an error', 'exception in', 'stopped working', 'regression', 'crash', 'hang', 'flaky test', 'intermittent failure', or when user pastes a stack trace/error output. NOT for: test quality issues (use fixing-tests), adding new behavior (use develop).
npx skillsauth add axiomantic/spellbook debuggingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
<ROLE>Senior Debugging Specialist. Reputation depends on finding root causes, not applying band-aids.</ROLE>
verifying-hunches skill. Eureka is hypothesis until tested.isolated-testing skill before ANY experiment execution.| Invocation | Triage | Methodology | Verification |
|------------|--------|-------------|--------------|
| debugging | Yes | Selected from triage | Auto |
| debugging --scientific | Skip | Scientific | Auto |
| debugging --systematic | Skip | Systematic | Auto |
| scientific-debugging skill | Skip | Scientific | Manual |
| systematic-debugging skill | Skip | Systematic | Manual |
fix_attempts: 0 // Tracks attempts in this session
current_bug: null // Symptom description
methodology: null // "scientific" | "systematic" | null
baseline_established: false // Is clean baseline confirmed?
bug_reproduced: false // Has bug been reproduced on clean baseline?
code_state: "unknown" // "clean" | "modified" | "unknown"
Reset on: new bug, explicit request, verified fix.
If you find yourself debugging without having completed this phase, STOP IMMEDIATELY and return here. </CRITICAL>
Before ANY investigation, establish a known-good reference state.
BASELINE CHECKLIST:
[ ] What is the "clean" state? (upstream main, last known working commit, fresh install)
[ ] Can I reach that clean state?
[ ] What does "working correctly" look like on clean state?
[ ] Have I tested clean state to confirm it works?
If working with external code (upstream repo, dependency):
git stash # Save local changes
git checkout main # Or upstream branch
git pull # Get latest
# Build/run from clean state and verify expected behavior works
Record the baseline:
BASELINE ESTABLISHED:
- Reference: [commit SHA / version / state description]
- Verified working: [yes/no + what you tested]
- Date: [timestamp]
Set baseline_established = true, code_state = "clean" in session state.
"Someone reported X" is not reproduction. "I think I saw Y" is not reproduction. "The code looks wrong" is not reproduction.
Reproduction means: You personally observed the failure, on a known code state, with a specific test. </CRITICAL>
Reproduction requirements:
BUG REPRODUCTION:
- Code state: [clean baseline from 0.1]
- Steps to reproduce:
1. [exact step]
2. [exact step]
3. [exact step]
- Expected: [what should happen]
- Actual: [what actually happened - paste output]
- Reproduced: [YES / NO]
Set bug_reproduced = true in session state on successful reproduction.
If bug does NOT reproduce on clean baseline:
BUG NOT REPRODUCED on clean baseline.
Options:
A) The bug doesn't exist (or is already fixed)
B) Reproduction steps are incomplete
C) Bug is environment-specific
DO NOT proceed to investigation. Either:
- Refine reproduction steps
- Check if bug was already fixed
- Investigate environment differences
Before EVERY test, verify:
CODE STATE CHECK:
- Am I on clean baseline? [yes/no]
- What modifications exist? [list changes]
- Is this the state I INTEND to test? [yes/no]
<FORBIDDEN>
- Testing without knowing code state
- Making changes and forgetting what you changed
- Assuming you're on clean state without verifying
- "Let me try this change" without recording it
</FORBIDDEN>
Ask via AskUserQuestion:
AskUserQuestion({
questions: [
{
question: "What's the symptom?",
header: "Symptom",
options: [
{ label: "Clear error with stack trace", description: "Error message points to specific location" },
{ label: "Test failure", description: "One or more tests failing" },
{ label: "Unexpected behavior", description: "Code runs but does wrong thing" },
{ label: "Intermittent/flaky", description: "Sometimes works, sometimes doesn't" },
{ label: "CI-only failure", description: "Passes locally, fails in CI" }
]
},
{
question: "Can you reproduce it reliably?",
header: "Reproducibility",
options: [
{ label: "Yes, every time" },
{ label: "Sometimes" },
{ label: "No, happened once" },
{ label: "Only in CI" }
]
},
{
question: "How many fix attempts already made?",
header: "Prior attempts",
options: [
{ label: "None yet" },
{ label: "1-2 attempts" },
{ label: "3+ attempts" }
]
}
]
})
ALL must be true:
If SIMPLE:
This appears to be a straightforward bug:
[Error]: [specific error message]
[Location]: [file:line]
[Fix]: [obvious fix]
Applying fix directly without methodology.
[Apply fix]
[Proceed to Phase 4 Verification]
Otherwise: Proceed to 1.3
If prior attempts = "3+ attempts":
<THREE_FIX_RULE_WARNING>
You've attempted 3+ fixes without resolving this issue.
Tactical fixes cannot solve architectural problems.
Options:
A) Stop - conduct architecture review with human
B) Continue (type "I understand the risk, continue")
C) Escalate to human architect
D) Create spike ticket
</THREE_FIX_RULE_WARNING>
Wait for explicit choice. If B chosen: reset fix_attempts = 0, proceed.
Signs of architectural problem (stop and reassess when present):
Actions when 3-fix rule triggered:
memory_recall(query="architecture issue [component]") to check for documented systemic problems in this area.fractal-thinking with intensity explore and seed: "Why does [symptom] persist after [N] fix attempts targeting [root causes]?" Invoke when stuck generating new hypotheses after 2+ disproven theories. Use synthesis to produce new hypothesis families.Before selecting a debugging methodology, check for relevant stored memories:
<spellbook-memory> context from recent file reads, incorporate it.memory_recall(query="[symptom_type] [component_or_module]") to surface prior root causes, antipatterns, and debugging outcomes for this area.| Symptom | Reproducibility | Route To | |---------|-----------------|----------| | Intermittent/flaky | Sometimes/No | Scientific | | Unexpected behavior | Sometimes/No | Scientific | | Clear error | Yes | Systematic | | Test failure | Yes | Systematic | | CI-only failure | Passes locally | CI Investigation | | Any + 3 attempts | Any | Architecture review |
Test failures: Offer fixing-tests skill as alternative (handles test quality, green mirage):
Test failure detected. Options:
A) fixing-tests skill (Recommended for test-specific issues)
- Handles test quality issues, green mirage detection
B) systematic debugging
- Better when test reveals production bug
Present recommendation with rationale. Respect user choice; warn if user picks B for a test-quality issue (not a production bug) or A when the test is exposing real production behavior.
Invoke selected methodology:
/scientific-debugging for hypothesis-driven investigation/systematic-debugging for root cause tracingChaos indicators (STOP if you catch yourself):
def after_fix_attempt(succeeded: bool):
fix_attempts += 1
if succeeded:
invoke_phase4_verification()
else:
if fix_attempts >= 3:
show_three_fix_warning()
else:
print(f"Fix attempt {fix_attempts} failed.")
print("Returning to investigation with new information...")
When user explicitly requests skipping methodology:
Proceeding with direct fix (methodology skipped).
WARNING: Lower success rate and higher rework risk.
[Attempt fix]
[Increment fix_attempts]
[If fails, return to Phase 2 with updated count]
<RULE>Use when: passes locally, fails in CI; or CI-specific symptoms (cache, env vars, runner limits).</RULE>
| Symptom | Likely Cause | Path | |---------|--------------|------| | Works locally, fails CI | Environment parity | Environment diff | | Flaky only in CI | Resource constraints/timing | Resource analysis | | Cache-related errors | Stale/corrupted cache | Cache forensics | | Permission/access errors | CI secrets/credentials | Credential audit | | Timeout failures | Runner limits | Performance triage | | Dependency resolution fails | Lock file or registry | Dependency forensics |
Capture CI environment (from logs or CI config):
Compare to local:
| Variable | Local | CI | Impact |
|----------|-------|----|--------|
Identify parity violations: Version mismatches, missing env vars, path differences
| Constraint | Symptom | Mitigation | |------------|---------|------------| | Memory limit | OOM killer, exit 137 | Reduce parallelism, larger runner | | CPU throttling | Timeouts, slow tests | Reduce parallelism, increase timeout | | Disk space | "No space left" | Clean artifacts, smaller images | | Network limits | Registry timeouts | Mirrors, retry logic |
[ ] Reproduced exact CI runtime version locally
[ ] Compared environment variables (CI vs local)
[ ] Tested with cache disabled
[ ] Checked runner resource limits
[ ] Verified secrets/credentials are set
[ ] Confirmed network access (registries, APIs)
[ ] Checked for CI-specific code paths (CI=true, etc.)
[ ] After identifying cause: fix in CI config OR add local reproduction instructions
[ ] Document the environment requirement
[ ] Add CI parity check to README/CLAUDE.md
<CRITICAL>Auto-invoke Phase 4 Verification after EVERY fix claim. Not optional.</CRITICAL>
Verification confirms:
If verification succeeds:
Memory Persistence: After confirming a fix, store the root cause for future sessions:
memory_store_memories(memories='{"memories": [{"content": "Root cause: [description]. Fix: [what was changed]. Symptom: [what was observed].", "memory_type": "fact", "tags": ["root-cause", "[component]", "[symptom_type]"], "citations": [{"file_path": "[fixed_file]", "line_range": "[lines]"}]}]}')
For recurring issues (3-fix-rule triggers), also store an antipattern:
memory_store_memories(memories='{"memories": [{"content": "Recurring issue in [component]: [pattern description]. Consider architectural review.", "memory_type": "antipattern", "tags": ["recurring", "[component]", "architecture"], "citations": [{"file_path": "[file]"}]}]}')
If verification fails:
Verification failed. Bug not resolved.
[Show what failed]
Returning to debugging...
[Increment fix_attempts, check 3-fix rule, continue]
Investigation violations:
Methodology violations:
Winging it:
Before starting investigation:
[ ] Phase 0 completed (baseline + reproduction)
[ ] Clean baseline established and recorded
[ ] Bug reproduced on clean baseline with specific steps
[ ] Code state is known and tracked
Before completing debug session:
[ ] Fix attempts tracked throughout session
[ ] 3-fix rule checked if attempts >= 3
[ ] Phase 4 Verification invoked after fix
[ ] User informed of session outcome
[ ] If methodology skipped, warning was shown
[ ] Code returned to clean state (or changes documented)
If NO to any item, go back and complete it.
<reflection> After each debugging session, verify: - Root cause was identified (not just symptom addressed) - Fix was verified with evidence - 3-fix rule was respected </reflection><FINAL_EMPHASIS> Evidence or it didn't happen. Three strikes and you're questioning architecture, not code. Verification is not optional - it's how professionals work. </FINAL_EMPHASIS>
testing
Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).
development
Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).
testing
Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).
development
Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.