skills/diagnose/SKILL.md
Investigate, audit, triage, and fix. Systematic debugging, incident lifecycle, domain auditing, and issue logging. Feedback-loop-first protocol: reproduce or replay before root cause, pattern analysis, hypothesis test, and fix. Use for: any bug, test failure, production incident, error spikes, audit, triage, postmortem, "diagnose", "why is this broken", "debug this", "production down", "is production ok", "audit stripe", "log issues". Trigger: /diagnose.
npx skillsauth add phrazzld/spellbook diagnose
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Find root cause. Fix it. Prove it works.
You are the executive orchestrator.
harnesses/shared/references/lenses.md) returning finding · evidence file:line · impact; name the lens, no static agent file.
Delegate on judgment per the shared Roster contract: native subagents by default; add cross-model critics, roster providers, or sprite lanes (/sprites) only when they answer a distinct question. See harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use independent evidence or hypothesis lanes for competing root causes, reproduction, config/service checks, and proposed fix validation.
| Intent | Sub-capability |
|--------|---------------|
| Debug a bug, test failure, unexpected behavior | This file (below) |
| Need a reproduction strategy | references/feedback-loops.md |
| Flaky test investigation | references/flaky-test-investigation.md |
| Incident lifecycle: triage, investigate, postmortem | references/triage.md |
| Domain audit: "audit stripe", "audit quality" | references/audit.md |
| Audit then fix highest priority issue | references/fix.md |
| Create GitHub issues from audit findings | references/log-issues.md |
If first argument matches a domain name (stripe, quality, etc.), route to references/audit.md.
If "triage", "incident", "postmortem", "production down" → references/triage.md.
If "flaky", "flake", "intermittent", "nondeterministic test" → references/flaky-test-investigation.md.
If "fix" → references/fix.md. If "log issues" → references/log-issues.md.
Otherwise, this is a debugging session — continue below.
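The routing rules above can be sketched as a small shell function. This is an illustration only: the `route` name, the `debug` fallback label, the two-entry domain list (the source says "stripe, quality, etc."), and the match precedence are assumptions, not part of the skill.

```shell
# Hypothetical router for the rules above. Prints the reference file to load,
# or "debug" when the request should continue as a debugging session.
route() {
  # Lowercase the whole argument string so matching is case-insensitive.
  args=$(printf '%s' "$*" | tr '[:upper:]' '[:lower:]')
  # First whitespace-delimited word (the whole string if there is no space).
  first=${args%% *}

  # Rules keyed on the first argument. The domain list is assumed;
  # the source names only "stripe, quality, etc.".
  case "$first" in
    stripe|quality) echo "references/audit.md"; return 0 ;;
    fix)            echo "references/fix.md"; return 0 ;;
  esac

  # Rules keyed on phrases anywhere in the request.
  case "$args" in
    "log issues"*)
      echo "references/log-issues.md" ;;
    *triage*|*incident*|*postmortem*|*"production down"*)
      echo "references/triage.md" ;;
    *flaky*|*flake*|*intermittent*|*"nondeterministic test"*)
      echo "references/flaky-test-investigation.md" ;;
    *)
      echo "debug" ;;
  esac
}
```

Matching "fix" only as the first word is a deliberate choice in this sketch, so that symptoms such as "the last fix broke login" still fall through to a debugging session.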
The user's symptoms: $ARGUMENTS
NO FIXES WITHOUT A FEEDBACK LOOP AND ROOT CAUSE FIRST
If you cannot run, replay, or inspect a signal that proves the symptom is present, you cannot verify a fix. Build the smallest believable pass/fail loop before hypothesizing deeply. If no loop is possible with current access, stop and ask for a captured artifact, environment access, or temporary instrumentation.
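A minimal sketch of such a pass/fail loop, assuming the symptom can be triggered by a single command. The `loop_check` helper is hypothetical; it reruns the command a fixed number of times and reports how often it passed.

```shell
# loop_check RUNS CMD [ARGS...]
# Hypothetical helper: run CMD RUNS times, print pass/fail counts,
# and exit non-zero if any run failed.
loop_check() {
  runs=$1; shift
  pass=0; fail=0; i=0
  while [ "$i" -lt "$runs" ]; do
    # The command's own output is discarded; only its exit status counts.
    if "$@" >/dev/null 2>&1; then
      pass=$((pass + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
  done
  echo "pass=$pass fail=$fail"
  [ "$fail" -eq 0 ]
}
```

Before the fix, the loop should fail in a way that matches the reported symptom; after the fix, the same invocation should pass. For an intermittent failure, a larger run count makes a clean result more believable.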
External service issues are usually config, not code. Check in order:
1. npx convex env list --prod | grep <SERVICE> or vercel env ls
2. curl -I -X POST <webhook_url>

Spawn at least two roster-backed Explore lanes to gather evidence. Tell them to investigate the symptoms, reproduce the issue, trace data flow, and report back with root cause + evidence + proposed fix. They should NOT implement the fix, only report. You review, decide if root cause is proven, then dispatch a builder for the fix or dig deeper.
When >2 plausible root causes and a single investigation would anchor on one: spawn parallel Explore subagents, one per hypothesis. Each gets one hypothesis to prove or disprove by tracing a specific subsystem. They report back with confirmed/disproved + evidence. You synthesize into a consensus root cause, then dispatch a builder (general-purpose) for the fix.
Use when: ambiguous stack trace, multiple services, flaky failures. Don't use when: obvious single cause, config issue, simple regression.
| You (lead) | Sub-agents (investigators) |
|------------|---------------------------|
| Ranking hypotheses | Tracing one subsystem |
| Declaring root cause proven | Comparing working vs broken |
| Choosing the fix | Gathering logs and reproductions |
| Deciding when evidence is sufficient | Running targeted test cases |
When you can't reproduce the bug yourself (auth-gated, mobile, timing-dependent, hardware-specific, user-flow-dependent):
INSTRUMENT → USER REPRODUCES → READ LOGS → REFINE → REPEAT
LOG_FILE="${HOME}/Desktop/debug-$(date +%s).log"
Log at decision points: function entry/exit, branch taken, values at boundaries.
Tag each log line with the hypothesis it tests: [H1] auth token expired: ${token.exp}
Use when: flaky tests, user-reported bugs you can't trigger, environment-specific issues. Don't use when: bug reproduces in your environment (use the main debugging phases directly).
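The hypothesis-tagging convention can be sketched in shell as follows. The `dbg` and `lines_for` helpers are hypothetical names, and the timestamp prefix is an assumption; only the LOG_FILE default and the [H1] tag style come from the text above.

```shell
# Default log path follows the LOG_FILE line above; an existing LOG_FILE wins.
LOG_FILE="${LOG_FILE:-${HOME}/Desktop/debug-$(date +%s).log}"

# dbg TAG MESSAGE...
# Hypothetical helper: append one timestamped line tagged with the
# hypothesis it tests, e.g. "12:03:41 [H1] auth token expired: ...".
dbg() {
  tag=$1; shift
  printf '%s [%s] %s\n' "$(date +%H:%M:%S)" "$tag" "$*" >> "$LOG_FILE"
}

# lines_for TAG
# Hypothetical helper: show only the lines that bear on one hypothesis.
lines_for() {
  grep -F "[$1]" "$LOG_FILE"
}
```

After the user reproduces the issue, `lines_for H1` narrows the log to the evidence for a single hypothesis, which keeps the REFINE step focused on one prediction at a time.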
The feedback loop is the first deliverable. Choose the narrowest loop that reproduces the user's symptom, not a nearby failure.
references/feedback-loops.md.
Only after a loop exists:
git diff, git log --oneline -10, new deps, config
Scientific method. Rank 3-5 hypotheses when the cause is not obvious. Test one prediction at a time. No stacking.
Never skip justification. "Just try X" is a red flag: if you can't explain what you'll learn from an experiment, you don't understand the problem yet.
/critique --lens ousterhout pass) rather than forcing a bad seam.
references/systematic-debugging.md.
For each hypothesis, categorize:
Post-fix question: "If we revert in 6 months, does the problem return?"
Before declaring "fixed", show:
Mark as UNVERIFIED until observables confirm.
| Type | Signals | Approach |
|------|---------|----------|
| Test failure | Assertion error | Read test, trace expectation |
| Runtime error | Exception, crash | Stack trace -> source -> state |
| Type error | TS complaint | Read error, check types |
| Build failure | Bundler error | Check deps, config |
| Behavior mismatch | "Does Y, should do X" | Trace code path |
| Performance | Slow, timeout | Add timing instrumentation |
| Production incident | Incident tracker, alerts | Create INCIDENT.md, timeline |
For non-trivial production issues, create INCIDENT-{timestamp}.md:
Semantic waiver: diagnosis quality is tied to the concrete failure and cannot be proven by a single static fixture. Each run must cite reproduction evidence, the hypothesis test, the fix path, and the exact command or runtime surface that proves the symptom is gone.