assets/skills/fix/root-cause/SKILL.md
Deep bug investigation workflow for hard-to-trace errors. Systematic root cause analysis — no guessing, no blind fixes. Use when user says "deep debug", "deep-debug", "trace bug", "find root cause", "hard bug", "investigate bug".
Install this skill globally with one command: `npx skillsauth add phuthuycoding/moicle fix-root-cause`. Works with Claude Code, Cursor, and Windsurf.
For bugs that have been "fixed" multiple times and keep coming back, intermittent bugs, or errors deep inside vendor code. Do not guess — trace step by step.
If the bug is already understood and just needs a fix, use /fix-hotfix. If production is down, run /fix-incident first, then this skill.

COLLECT → VERIFY → TRACE → ROOT CAUSE → CHECK HIDDEN STATE → FIX → VERIFY
Time budget: if you've traced for >2 hours with no convergence → stop, write down what you know, ask a teammate. Rabbit holes are real.
Record verbatim, do not interpret:
Do NOT assume prod = local.
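One way to test that assumption is to compare a snapshot of each environment key by key. The sketch below is illustrative only: `diff_environments` and both snapshot dictionaries are hypothetical, and in practice the values would come from wherever your deploy records runtime versions, dependency versions, and config.

```python
from __future__ import annotations


def diff_environments(
    prod: dict[str, str], local: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return every key whose value differs, as (prod_value, local_value)."""
    keys = prod.keys() | local.keys()
    return {
        key: (prod.get(key), local.get(key))
        for key in sorted(keys)
        if prod.get(key) != local.get(key)
    }


# Hypothetical snapshots; None in the result means "missing on that side".
prod_snapshot = {"runtime": "3.11.4", "orm": "2.0.1", "TZ": "UTC"}
local_snapshot = {"runtime": "3.12.1", "orm": "2.0.1", "FEATURE_X": "on"}

drift = diff_environments(prod_snapshot, local_snapshot)
for key, (prod_value, local_value) in drift.items():
    print(f"{key}: prod={prod_value!r} local={local_value!r}")
```

Any key that shows up in `drift` is a candidate explanation for "works on my machine" and is worth ruling out before tracing code.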
git diff deployed vs local

The single most important step. Skip nothing.
Use the right tool, not just console.log:
| Tool | When |
|------|------|
| Debugger breakpoint | Step through suspect code, inspect locals |
| Conditional breakpoint | Stop only when x == nil / id == 42 |
| Logpoint (debugger) | Inject log without modifying source — useful in prod-like envs |
| Memory / heap profiler | Suspected memory leak or unbounded growth |
| CPU profiler | Slow path / hot loop |
| Network trace (Wireshark, browser devtools) | Wire-level issue with external API |
| strace / dtruss | Syscall-level (file, network) on Unix |
| Time-travel debugger (rr, Replay) | Hard non-determinism, race conditions |
| Distributed tracing (OpenTelemetry, Jaeger) | Cross-service latency or error origin |
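As a rough illustration of the conditional-breakpoint row, here is a minimal Python sketch. The function `total_amount`, the order fields, and the `on_suspect` parameter are all hypothetical; the only real API used is the built-in `breakpoint()`, which is the default hook.

```python
from typing import Callable, Iterable


def total_amount(
    orders: Iterable[dict],
    on_suspect: Callable[[], None] = breakpoint,
) -> int:
    """Sum order amounts, pausing only on the records under suspicion."""
    total = 0
    for order in orders:
        # Conditional breakpoint: stop only for id == 42 or a missing customer,
        # so the debugger opens in this frame with `order` and `total` in scope.
        if order["id"] == 42 or order.get("customer") is None:
            on_suspect()
        total += order["amount"]
    return total
```

Making the hook a parameter lets a test swap in a counter instead of a debugger. With the default hook, setting the environment variable `PYTHONBREAKPOINT=0` turns every `breakpoint()` call into a no-op.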
If you cannot answer all 3 → return to Step 3.
"Sometimes works, sometimes doesn't" bugs live here.
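A stale cache is one common shape of this kind of bug: the first read is correct, and later reads are wrong only if a write happened in between. The sketch below is a hypothetical in-memory `ProfileStore`, not code from any real project, and shows the buggy and fixed write paths side by side.

```python
class ProfileStore:
    """Read-through cache in front of a dict standing in for a database."""

    def __init__(self) -> None:
        self._db: dict[int, str] = {}
        self._cache: dict[int, str] = {}

    def read(self, user_id: int) -> str:
        # Populate the cache on first read, then serve from it.
        if user_id not in self._cache:
            self._cache[user_id] = self._db[user_id]
        return self._cache[user_id]

    def write_buggy(self, user_id: int, email: str) -> None:
        # Hidden state: the cache still holds the previous value.
        self._db[user_id] = email

    def write_fixed(self, user_id: int, email: str) -> None:
        # Invalidate on write so the next read goes back to the database.
        self._db[user_id] = email
        self._cache.pop(user_id, None)


store = ProfileStore()
store.write_fixed(1, "old@example.com")
first_read = store.read(1)
store.write_buggy(1, "new@example.com")
stale_read = store.read(1)  # still the old address
store.write_fixed(1, "new@example.com")
fresh_read = store.read(1)
```

Whether the bug appears depends entirely on call order, which is why it looks intermittent from the outside.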
`WHERE tenant_id = ?`

Only after Step 4 is answered. The fix MUST:
~/.claude/architecture/ddd-architecture.md)

For data-shape bugs from external sources: validate / coerce at the adapter, not inside business logic.
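A minimal sketch of validating at the adapter, assuming a hypothetical payment payload from an external API. `Payment`, `InvalidPayload`, and `parse_payment` are illustrative names, not part of this skill or any library.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Payment:
    order_id: int
    amount: Decimal


class InvalidPayload(ValueError):
    """Raised at the boundary so bad data never reaches business logic."""


def parse_payment(raw: dict) -> Payment:
    """Coerce an external payload into a typed value, or reject it."""
    try:
        order_id = int(raw["order_id"])          # accepts 17 and "17"
        amount = Decimal(str(raw["amount"]))     # accepts 3, 12.5 and "12.50"
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        raise InvalidPayload(f"bad payment payload: {raw!r}") from exc
    # NaN and Infinity parse as Decimals, so reject them explicitly.
    if not amount.is_finite() or amount < 0:
        raise InvalidPayload(f"amount out of range: {raw!r}")
    return Payment(order_id=order_id, amount=amount)
```

Business logic downstream can then rely on `Payment` always holding an `int` and a finite, non-negative `Decimal`, instead of re-checking types at every use.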
## Bug: {short description}
### Root Cause
{Specific technical cause from Step 4 Q1}
### Why it didn't fail before
{What changed: deploy / dependency / data shape / config / traffic / runtime}
### Reproduction
{Exact steps + data + environment}
### Hidden state source
{Cache / DB / Concurrency / Runtime / Env / Multi-tenant — or "none"}
### Fix
- File: `path/to/file:line`
- Layer: {handler / usecase / infra adapter}
- Approach: {boundary defense / type coercion / locking / cache invalidation / etc.}
- Why this is root-cause, not symptom: {explanation}
### Regression test
- File: `path/to/test:line`
- Reproduces the original failure when fix removed
### Verification
- [x] Original failure no longer reproducible
- [x] Normal path works
- [x] Cached path works (if applicable)
- [x] Load tested (if concurrency-related)
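The "reproduces the original failure when fix removed" line of the template can be checked mechanically by keeping a copy of the pre-fix behaviour next to the test. The following is only a sketch using the standard `unittest` module; `normalize_sku_unfixed` and `normalize_sku_fixed` are hypothetical stand-ins for the code before and after a fix.

```python
import unittest


def normalize_sku_unfixed(raw):
    # Pre-fix behaviour: crashes when the upstream field is missing.
    return raw.upper()


def normalize_sku_fixed(raw):
    # Post-fix behaviour: a missing SKU becomes an empty string.
    if raw is None:
        return ""
    return str(raw).strip().upper()


class NormalizeSkuRegression(unittest.TestCase):
    def test_fixed_handles_missing_sku(self):
        self.assertEqual(normalize_sku_fixed(None), "")

    def test_fixed_keeps_normal_path(self):
        self.assertEqual(normalize_sku_fixed(" ab-1 "), "AB-1")

    def test_unfixed_reproduces_original_failure(self):
        # Proves the test input really triggers the bug without the fix.
        with self.assertRaises(AttributeError):
            normalize_sku_unfixed(None)
```

In a real repository the unfixed copy would usually not be kept; temporarily reverting the fix and watching the first test fail gives the same evidence.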
| When | Use |
|------|-----|
| Bug is understood, just needs a fix | /fix-hotfix |
| Production is down | /fix-incident (then this skill after mitigation) |
| Write regression test after fix | /review-tdd |
| Research how others solved similar bugs | /research-web |
| Phase | Agent | Purpose |
|-------|-------|---------|
| Trace | @code-reviewer | Independent reading of call chain |
| Hidden state — DB | @db-designer | Schema, indexes, collation, replication |
| Hidden state — concurrency | @perf-optimizer | Race, deadlock, async leak |
| Hidden state — security/tenancy | @security-audit | Tenant isolation, auth context |
| Fix | Stack-specific dev agent | Boundary-defensive fix |
| Verify | @test-writer | Regression test |