skills/debug-ops/SKILL.md
Systematic debugging methodology, language-specific debuggers, and common scenario playbooks. Use for: debug, debugging, bug, crash, hang, memory leak, race condition, deadlock, bisect, reproduce, root cause, breakpoint, profiling, performance issue, segfault, stack trace, core dump.
Systematic debugging methodology with language-specific tooling and common scenario playbooks.
Bug Report / Symptom
│
├─ Crash
│  ├─ Segfault / Access Violation
│  │  └─ Check: null pointer, buffer overflow, use-after-free, stack overflow
│  ├─ Panic / Fatal Error
│  │  └─ Check: assertion failure, unrecoverable state, out-of-memory
│  └─ Unhandled Exception
│     └─ Check: missing error handler, unexpected input type, network failure
│
├─ Hang
│  ├─ Deadlock
│  │  └─ Check: lock ordering, mutex contention, channel blocking
│  ├─ Infinite Loop
│  │  └─ Check: loop termination condition, counter overflow, recursive call
│  └─ Blocked I/O
│     └─ Check: network timeout, DNS resolution, disk full, file lock
│
├─ Wrong Output
│  ├─ Logic Error
│  │  └─ Check: operator precedence, boundary conditions, boolean logic
│  ├─ Data Corruption
│  │  └─ Check: concurrent mutation, encoding mismatch, truncation
│  └─ Off-by-One
│     └─ Check: loop bounds, array indexing, fence-post errors
│
├─ Performance
│  ├─ Slow Queries
│  │  └─ Check: missing index, N+1 queries, full table scan, lock wait
│  ├─ Memory Bloat
│  │  └─ Check: cache without eviction, leaked references, large allocations
│  └─ CPU Spikes
│     └─ Check: hot loops, regex backtracking, excessive GC, busy-wait
│
└─ Intermittent
   ├─ Race Condition
   │  └─ Check: shared mutable state, read-modify-write, check-then-act
   ├─ Timing-Dependent
   │  └─ Check: timeout values, clock skew, event ordering assumptions
   └─ Environment-Specific
      └─ Check: OS differences, locale, timezone, file system case sensitivity
Six-step process from symptom to prevention:
1. Reproduce: Confirm the bug exists and create a reliable reproduction. A bug you cannot reproduce is a bug you cannot confidently fix. Capture exact inputs, environment, and sequence of operations.
2. Isolate: Narrow the fault to the smallest possible scope. Use binary search (git bisect, commenting out code halves), stubs, feature flags, and environment isolation to eliminate innocent code.
3. Identify: Find the root cause, not just the proximate trigger. Use the 5 Whys technique, trace execution, and inspect state at key points. Distinguish between the symptom and the underlying defect.
4. Fix: Apply the minimal correct change that addresses the root cause. Avoid shotgun debugging (changing multiple things at once). Understand why the fix works, not just that it works.
5. Verify: Confirm the fix resolves the original issue without introducing regressions. Re-run the original reproduction case. Run the full test suite. Test edge cases related to the fix.
6. Prevent: Add a regression test. Update documentation or runbooks if applicable. Consider whether the same class of bug could exist elsewhere. Share findings with the team.
- [ ] Minimal reproduction steps documented (numbered, unambiguous)
- [ ] Environment captured (OS, runtime version, dependencies, config)
- [ ] Exact inputs recorded (request payload, CLI args, file contents)
- [ ] Timing sensitivity assessed (does it fail only under load? after delay?)
- [ ] Single-threaded reproduction attempted (eliminates concurrency noise)
- [ ] Reproduction automated as script or test case
- [ ] Confirmed reproduction is deterministic (fails N/N attempts)
- [ ] Identified whether reproduction requires specific data/state
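
The "fails N/N attempts" item in the checklist above can be checked mechanically. The following is a minimal sketch, assuming the reproduction is a command that exits non-zero (or hangs) when the bug occurs; `failure_rate` and the stand-in command are hypothetical names used only for illustration.

```python
import subprocess
import sys


def failure_rate(cmd, attempts=10, timeout=60):
    """Run cmd repeatedly and return (failures, attempts).

    A run counts as a failure if it exits non-zero or exceeds the timeout.
    """
    failures = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            failures += 1  # a hang counts as a failed attempt
    return failures, attempts


# Stand-in for a real reproduction script: a command that always exits 1.
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
failures, attempts = failure_rate(always_fails, attempts=3)
print(f"failed {failures}/{attempts} attempts")  # failed 3/3 attempts
```

A result below N/N suggests the bug is timing- or state-dependent, which is worth recording before moving on to isolation.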
| Technique | Method | Best For |
|-----------|--------|----------|
| Binary search (git) | git bisect start BAD GOOD then git bisect run ./test.sh | Finding which commit introduced the bug |
| Binary search (code) | Comment out half the code, test, repeat | Narrowing fault location in unfamiliar code |
| Stubs/Mocks | Replace dependencies with known-good fakes | Isolating from external services |
| Feature flags | Toggle features off one by one | Finding which feature causes the issue |
| Environment isolation | Docker container, fresh VM, clean install | Eliminating environment contamination |
| Network interception | mitmproxy, Charles Proxy, mock server | Isolating client vs server issues |
| Input reduction | Remove input fields/data until bug disappears | Finding minimal trigger |
| Dependency pinning | Lock all deps, update one at a time | Finding breaking dependency update |
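
The "Input reduction" row above can be automated. The sketch below is a simplified, greedy variant of delta debugging, not a full ddmin implementation: it removes large chunks first, then smaller ones, and keeps any removal after which the bug still reproduces. `reduce_input`, `still_fails`, and the sample field names are assumptions made for illustration.

```python
def reduce_input(items, still_fails):
    """Greedily drop chunks of `items` while `still_fails` keeps returning True."""
    assert still_fails(items), "the full input must reproduce the bug"
    current = list(items)
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        start = 0
        while start < len(current):
            candidate = current[:start] + current[start + chunk:]
            if candidate and still_fails(candidate):
                current = candidate   # chunk was irrelevant; keep it removed
            else:
                start += chunk        # chunk is needed; move past it
        chunk //= 2
    return current


# Hypothetical bug: fails only when both "emoji_bio" and "locale" are present.
fields = ["name", "email", "emoji_bio", "age", "locale"]
minimal = reduce_input(fields, lambda xs: "emoji_bio" in xs and "locale" in xs)
print(minimal)  # ['emoji_bio', 'locale']
```

In practice `still_fails` would run the automated reproduction with the candidate input, so each call may be slow; starting with large chunks keeps the number of runs down.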
Problem: API returns 500 error on user login
1. Why? → The database query throws a timeout exception
2. Why? → The users table scan takes >30 seconds
3. Why? → There is no index on the email column
4. Why? → The migration that adds the index was never run in production
5. Why? → The deployment script skips migrations when the --fast flag is used
Root cause: Deployment script's --fast flag bypasses migrations
Fix: Remove --fast flag behavior that skips migrations, add migration check to health endpoint
Prevention: CI check that verifies all migrations are applied after deployment
              [System Failure]
               /            \
        [Hardware]        [Software]
         /     \           /      \
    [Disk]  [Memory]  [Config]  [Code Bug]
                          |          |
                     [Missing    [Race in
                      env var]    worker pool]
Work from the top (observed failure) down to leaves (root causes). Each branch is an AND/OR gate -- AND means all children must be true, OR means any one child suffices.
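
The AND/OR gate semantics described above can be sketched as a small recursive evaluation. The diagram does not say which gate each branch uses, so the sketch assumes every branch is an OR gate; `Node` and `occurs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gate: str = "LEAF"   # "AND", "OR", or "LEAF"
    children: list = field(default_factory=list)


def occurs(node, observed):
    """Return True if the event at `node` occurs given the observed leaf events."""
    if node.gate == "LEAF":
        return node.name in observed
    results = [occurs(child, observed) for child in node.children]
    return all(results) if node.gate == "AND" else any(results)


# The tree from the diagram, with every gate assumed to be OR.
tree = Node("System Failure", "OR", [
    Node("Hardware", "OR", [Node("Disk"), Node("Memory")]),
    Node("Software", "OR", [
        Node("Config", "OR", [Node("Missing env var")]),
        Node("Code Bug", "OR", [Node("Race in worker pool")]),
    ]),
])

print(occurs(tree, {"Missing env var"}))  # True
print(occurs(tree, set()))                # False
```

Changing a gate to "AND" models failures that need several conditions at once, for example a race that only matters when memory is also exhausted.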
| Language | Tool | Launch Command | Key Commands |
|----------|------|----------------|--------------|
| Node.js | Chrome DevTools | node --inspect-brk app.js | Open chrome://inspect, set breakpoints in Sources |
| Node.js | ndb | npx ndb app.js | Enhanced DevTools with blackboxing |
| Python | pdb | python -m pdb script.py | n next, s step, c continue, p expr print, bt backtrace |
| Python | debugpy | python -m debugpy --listen 5678 --wait-for-client script.py | VS Code "Attach" launch config |
| Python | breakpoint() | Insert breakpoint() in code | Drops into pdb at that line (Python 3.7+) |
| Go | Delve | dlv debug ./cmd/server | b main.go:42 break, c continue, n next, p var print |
| Go | Delve (test) | dlv test ./pkg/... | Debug test functions directly |
| Go | Delve (attach) | dlv attach PID | Debug running process |
| Rust | rust-gdb | rust-gdb target/debug/myapp | b main, r, n, p variable, bt |
| Rust | rust-lldb | rust-lldb target/debug/myapp | b main, r, n, p variable, bt |
| Rust | CodeLLDB | VS Code extension | GUI breakpoints, variable inspection |
| Browser | DevTools | F12 or Ctrl+Shift+I | Elements, Console, Network, Sources, Performance, Memory |
// Node.js: drop into debugger at this point
debugger;
// Node.js: conditional breakpoint
if (user.id === 'problem-user') debugger;
# Python: drop into debugger at this point
breakpoint()
# Python: conditional breakpoint
if user_id == 'problem-user':
breakpoint()
// Go: dump all goroutine stacks and exit (send SIGQUIT or SIGABRT)
// kill -QUIT <pid>
// Or in code, print the stack of the current goroutine only:
import "runtime/debug"
debug.PrintStack()
// Rust: enable backtraces on panic (1 = trimmed, full = every frame)
// RUST_BACKTRACE=1 cargo run
// RUST_BACKTRACE=full cargo run
Place logs at decision points, not just error paths:
[ENTRY] function_name(args_summary) -- entering the function
[STATE] key_variable=value -- state at critical decision point
[BRANCH] taking path X because Y -- which branch and why
[EXIT] function_name -> result_summary -- leaving the function
[ERROR] operation failed: detail -- error with context
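
As an illustration of the tag scheme above, here is a minimal sketch using Python's standard `logging` module. The `apply_discount` function, the logger name, and the 50-unit minimum are hypothetical; only the tags come from the list above.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")
log = logging.getLogger("checkout")  # hypothetical logger name


def apply_discount(total, coupon):
    log.debug("[ENTRY] apply_discount(total=%s, coupon=%r)", total, coupon)
    if coupon is None:
        log.debug("[BRANCH] skipping discount because no coupon was supplied")
        result = total
    elif total < 50:
        log.debug("[STATE] total=%s minimum=50", total)
        log.debug("[BRANCH] rejecting coupon because total is below the minimum")
        result = total
    else:
        log.debug("[BRANCH] applying discount because total meets the minimum")
        result = round(total * 0.9, 2)
    log.debug("[EXIT] apply_discount -> %s", result)
    return result


apply_discount(100, "SAVE10")
```

Each decision point logs both the branch taken and the reason, so a wrong result can be traced to a specific condition without attaching a debugger.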
Trace a single request across services:
# Generate at entry point, propagate through all calls
X-Request-ID: 550e8400-e29b-41d4-a716-446655440000
# Search across all service logs
rg "550e8400-e29b-41d4-a716-446655440000" /var/log/services/
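
One way to get the request ID into every log line is sketched below, assuming a Python service that uses the standard `logging` module. The handler, the logger name `service-a`, and `handle_request` are hypothetical; a real service would hook this into its web framework's middleware.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(asctime)s %(request_id)s %(message)s"))
log = logging.getLogger("service-a")  # hypothetical service name
log.addHandler(handler)
log.setLevel(logging.INFO)


def handle_request(headers):
    """Reuse the caller's X-Request-ID if present, otherwise generate one."""
    rid = headers.get("X-Request-ID") or str(uuid.uuid4())
    request_id.set(rid)
    log.info("handling request")
    # Send the same ID on every outgoing call so downstream logs match.
    return {"X-Request-ID": rid}


outgoing = handle_request({"X-Request-ID": "550e8400-e29b-41d4-a716-446655440000"})
```

Using a `ContextVar` rather than a global keeps IDs separate across threads and asyncio tasks handling concurrent requests.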
# Merge and sort logs from multiple sources by timestamp
sort -t' ' -k1,2 service-a.log service-b.log service-c.log > timeline.log
# Count log lines per timestamp, busiest first (to spot gaps, look for timestamps missing from the timeline)
awk '{print $1, $2}' timeline.log | uniq -c | sort -rn | head -20
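
The awk pipeline above counts lines per timestamp; measuring the time between consecutive lines is another way to surface a hang. The sketch below assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that the timeline is already sorted; `find_gaps`, the 5-second threshold, and the sample lines are made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(seconds=5)):
    """Return (gap, before, after) tuples for long pauses, largest gap first."""
    gaps = []
    previous = None
    for line in lines:
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        if previous is not None and stamp - previous > threshold:
            gaps.append((stamp - previous, previous, stamp))
        previous = stamp
    return sorted(gaps, reverse=True)


# Made-up sample timeline; in practice, read the lines from timeline.log.
sample = [
    "2024-01-01 12:00:00 service-a request received",
    "2024-01-01 12:00:01 service-b query started",
    "2024-01-01 12:00:31 service-b query finished",
]
for gap, before, after in find_gaps(sample):
    print(f"{gap} of silence between {before} and {after}")
```

The last line logged before a large gap is usually the best place to start looking for the blocked operation.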
# jq queries on JSON logs
# Find all errors for a specific user
jq 'select(.level == "error" and .user_id == "u123")' app.log
# Get timing distribution for slow requests
jq 'select(.duration_ms > 1000) | .duration_ms' app.log | sort -n
# Count errors by type
jq -r 'select(.level == "error") | .error_type' app.log | sort | uniq -c | sort -rn
| Gotcha | Why It Hurts | Fix |
|--------|-------------|-----|
| Fixing symptoms, not root cause | Bug resurfaces in a different form | Use 5 Whys to dig deeper |
| Debugging in production without safety net | Risk of data loss or extended outage | Use read-only queries, feature flags, canary deploys |
| Heisenbug (disappears under observation) | Adding logging/breakpoints changes timing | Prefer low-overhead observation: sampling profiler, rr record/replay, strace (which still adds syscall overhead) |
| Assumption bias ("it can't be X") | Skipping the actual cause because you trust it | Test every assumption explicitly, even "obvious" ones |
| Missing reproduction case | Cannot verify fix, cannot prevent regression | Invest time upfront in reliable reproduction |
| Over-relying on print/log debugging | Slow iteration, pollutes code, misses concurrency bugs | Use proper debugger, profiler, or tracing tool |
| Not checking recent changes | The answer is often in the last few commits | git log --oneline -20, git diff HEAD~5 |
| Ignoring warning messages | Warnings often predict the error that follows | Treat warnings as errors during debugging |
| Debugging wrong version/branch | Wasting time on already-fixed or different code | Verify git branch, git log -1, runtime version |
| Not reading the full stack trace | Root cause is often in the middle, not the top | Read bottom-up: find your code in the trace first |
| Changing multiple things at once | Cannot tell which change fixed (or broke) it | One change per test cycle |
| Not capturing the "before" state | Cannot diff against working baseline | Snapshot config, deps, data before debugging |
| File | Contents | Lines |
|------|----------|-------|
| references/systematic-methods.md | Scientific method, binary search, delta debugging, differential debugging, time-travel debugging, team debugging | ~600 |
| references/tool-specific.md | Browser DevTools, Node.js, Python, Go, Rust, database, network, Docker debugging tools | ~650 |
| references/common-scenarios.md | Memory leaks, deadlocks, race conditions, performance regressions, API debugging, deployment issues | ~550 |