skills/debug-investigator/SKILL.md
Hypothesis-driven debugging with ranked hypotheses, git bisect strategy, instrumentation planning, and minimal reproduction design. Triggers on: "debug this systematically", "root cause analysis", "bisect this bug", "rank hypotheses", "isolate this issue", "minimal reproduction". NOT for general reasoning.
`npx skillsauth add mathews-tom/armory debug-investigator`
Structured debugging methodology that replaces ad-hoc exploration with hypothesis-driven investigation. Captures symptoms, builds a deterministic feedback loop, analyzes evidence (stacktraces, logs, state), generates ranked hypotheses, designs bisection strategies, identifies instrumentation points, and produces minimal reproductions — documenting every step so dead ends are never revisited.
When to use this skill vs native debugging: The base model handles straightforward debugging (clear stacktraces, obvious errors) natively. Use this skill for non-obvious bugs requiring systematic investigation: intermittent failures, bugs with no clear stacktrace, performance regressions, or issues requiring git bisection and hypothesis ranking.
| File | Contents | Load When |
| -------------------------------------- | ----------------------------------------------------------------------------- | ------------------------------- |
| references/stacktrace-patterns.md | Exception taxonomy, traceback reading, common Python/JS error signatures | Stacktrace or exception present |
| references/hypothesis-templates.md | Bug category catalog, probability ranking, confirmation/refutation tests | Always |
| references/bisection-guide.md | git bisect workflow, binary search debugging, narrowing techniques | Bug appeared after a change |
| references/log-analysis.md | Log pattern extraction, anomaly detection, timeline correlation | Log output available |
| references/instrumentation-points.md | Strategic logging placement, breakpoint strategy, state inspection techniques | Investigation plan needed |
Before deep investigation, check for repo-local agent context:
- `docs/agents/domain.md` for `CONTEXT.md`, `CONTEXT-MAP.md`, and ADR lookup rules
- `CONTEXT.md` or the relevant context-local glossary for domain vocabulary
- `docs/adr/` and context-local ADRs for decisions near the failing area

Use the project glossary in hypotheses, repro names, and prevention recommendations. If the repo lacks these files, continue normally; do not block debugging on context setup.
Before touching code, document the observable problem:
- "`KeyError('user_id')` on line 42 of `auth.py` when calling `get_current_user()` with a valid session token" is actionable.
- `git log --oneline -20` to list recent commits.

If the bug appeared after a specific commit, bisection is the fastest path.

Create a fast, deterministic pass/fail signal for the reported bug before ranking hypotheses or changing production code. The loop must reproduce the user's symptom, not a nearby failure.
Try these seams in order:
- A `git bisect run` harness when the bug appeared between known good and bad revisions.

Improve the loop before moving on:
If no credible loop can be built, stop and state what was tried. Request the missing artifact: environment access, captured payloads, logs, screen recording with timestamps, or permission for temporary instrumentation. Do not proceed to speculative fixes.
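A feedback loop can double as a `git bisect run` script. The sketch below shows one way to do that in Python, assuming the repro is a command whose output contains a known symptom string; `SYMPTOM`, `DEFAULT_CMD`, `repro.py`, `classify`, and `run_repro` are hypothetical names. It relies on the exit-code convention of `git bisect run`: 0 marks a commit good, 125 skips it, and any other code from 1 to 127 marks it bad.

```python
import subprocess
import sys
from collections.abc import Sequence

# Exit codes understood by `git bisect run`: 0 = good, 125 = skip,
# any other code from 1 to 127 = bad.
GOOD, BAD, SKIP = 0, 1, 125

# Hypothetical symptom signature: use the exact text from the user's report.
SYMPTOM = "KeyError: 'user_id'"

# Hypothetical default repro command: a repro.py script at the repo root.
DEFAULT_CMD = (sys.executable, "repro.py")


def classify(returncode: int, output: str, symptom: str = SYMPTOM) -> int:
    """Turn one repro run into a bisect verdict.

    Only the reported symptom counts as BAD. Any other failure (for example
    an import error on an old commit) is SKIP, so bisection does not chase
    a nearby but different bug.
    """
    if symptom in output:
        return BAD
    if returncode != 0:
        return SKIP
    return GOOD


def run_repro(cmd: Sequence[str] = DEFAULT_CMD, timeout: float = 60.0) -> int:
    """Run the repro command once and classify its combined stdout/stderr."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Treat a hang as untestable; if a hang IS the reported symptom, return BAD.
        return SKIP
    return classify(proc.returncode, proc.stdout + proc.stderr)
```

A wrapper script ending in `raise SystemExit(run_repro())` can then be driven by `git bisect start <bad> <good>` followed by `git bisect run python <wrapper>.py`. Keeping the wrapper outside the tracked tree means checkouts during bisection do not change it.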
Examine all available evidence before forming hypotheses:
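Stacktrace interpretation — If a traceback exists, start from the frame where the error manifested and work outward. In a Python traceback that frame is the last one (read bottom-up); JavaScript stack traces list it first. The cause is often several frames away from it. Identify:
- Known error signatures (see `references/stacktrace-patterns.md`)

Log pattern extraction — Search logs for: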
State inspection — If the system is running, inspect:
Code diff analysis — If the bug is recent:
- `git diff HEAD~5` — what changed?

Generate ranked hypotheses — never start fixing without a hypothesis:
List 3-5 hypotheses ranked by likelihood. Each hypothesis must include:
Rank by likelihood using:
Common bug categories (see references/hypothesis-templates.md):
Design specific steps to test each hypothesis:
- `git bisect start <bad> <good>`
- See `references/bisection-guide.md` for the bisection workflow
- See `references/instrumentation-points.md` for instrumentation placement

Execute the investigation plan, updating hypotheses as evidence arrives:
After finding the root cause:
## Debug Investigation: {Brief Description}
### Symptom
**Observed:** {What is happening — precise description}
**Expected:** {What should happen}
**Reproducibility:** {Always | Intermittent (~N% of attempts) | Once}
**First noticed:** {Date/time or triggering event}
**Environment:** {Relevant versions and configuration}
### Evidence Analysis
#### Stacktrace
- **Exception:** {type}: {message}
- **Origin:** {file}:{line} in {function}
- **Call chain:** {caller} → {caller} → {failure point}
- **Key insight:** {What the traceback reveals about the cause}
#### Logs
- **Anomaly:** {What is unusual}
- **Timeline:** {When the anomaly started}
- **Correlation:** {Related events}
#### Code Changes
- **Recent commits:** {relevant commits since last known-good state}
- **Files in error path:** {which changed files appear in the traceback}
### Hypotheses
| # | Hypothesis | Likelihood | Confirming Test | Refuting Test |
|---|------------|------------|-----------------|---------------|
| H1 | {Specific claim} | High | {What to check} | {What would disprove} |
| H2 | {Specific claim} | Medium | {What to check} | {What would disprove} |
| H3 | {Specific claim} | Low | {What to check} | {What would disprove} |
### Investigation Plan
#### Step 1: Test H1 — {action}
- **Command/action:** {specific step}
- **If confirmed:** {next action — fix}
- **If refuted:** proceed to Step 2
#### Step 2: Bisection
- **Good commit:** {hash}
- **Bad commit:** {hash}
- **Test:** {command to verify each commit}
- **Command:** `git bisect start {bad} {good}`
#### Step 3: Isolation
- **Remove:** {variable to eliminate}
- **Expected change:** {what should happen}
### Instrumentation Points
1. {file}:{line} — log {variable/state} to observe {what}
2. {file}:{line} — breakpoint to inspect {what}
### Minimal Reproduction
```{language}
# Minimal code that triggers the bug
{code}
```

**Root cause:** {What was wrong}
**Fix:** {What was changed — file:line, diff summary}
**Prevention:** {Test added, lint rule, type annotation, etc.}
**Lessons:** {What generalizes beyond this bug}
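When the failing input is large (a captured payload, a long event sequence), it can be shrunk toward the Minimal Reproduction mechanically. The sketch below is a simplified greedy chunk-removal reducer in the spirit of delta debugging, not a full ddmin implementation. `shrink` is a hypothetical helper, and `still_fails` is an assumed predicate that must return True only when the original symptom reproduces.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(items: Sequence[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily drop chunks of `items` while the failure still reproduces.

    Tries removing halves, then quarters, down to single elements. The result
    still fails, and removing any one remaining element stops it reproducing
    (1-minimal); it is not guaranteed to be the smallest possible input.
    """
    current = list(items)
    if not still_fails(current):
        raise ValueError("the full input must reproduce the bug first")
    chunk = max(len(current) // 2, 1)
    while True:
        i, removed = 0, False
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                current, removed = candidate, True  # keep the smaller failing input
            else:
                i += chunk  # this chunk is needed; move past it
        if chunk > 1:
            chunk //= 2
        elif not removed:
            return current  # a full single-element pass removed nothing
```

For example, `shrink(list(range(10)), lambda xs: 3 in xs and 7 in xs)` narrows ten elements down to `[3, 7]`. In practice `still_fails` would feed the candidate into the repro and check for the symptom, so each call costs one run of the feedback loop.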
## Configuring Scope
| Mode | Scope | Depth | When to Use |
|------|-------|-------|-------------|
| `quick` | Single error | H1 test + fix | Clear stacktrace, obvious cause |
| `standard` | Full investigation | 3 hypotheses + bisection plan | Default for non-obvious bugs |
| `deep` | Systemic analysis | 5+ hypotheses + instrumentation + reproduction | Intermittent bugs, no stacktrace, production issues |
## Calibration Rules
1. **Hypotheses before code changes.** Never start modifying code without at least one
explicit hypothesis. "Let me try this" is not debugging — it's guessing.
2. **One variable at a time.** Each investigation step should change exactly one thing.
If you change two things and the bug disappears, you don't know which fixed it.
3. **Document dead ends.** Failed hypotheses are valuable — they narrow the search space.
Record what was tested and what was learned.
4. **Simplest explanation first.** Test typos, wrong variable names, and missing imports
before considering race conditions, compiler bugs, or cosmic rays.
5. **Feedback loop before hypotheses.** If you cannot reproduce the bug with a controlled
pass/fail signal, any fix is speculative. Invest in the loop first.
6. **Root cause, not symptoms.** A fix that addresses the symptom (adding a null check)
without understanding the root cause (why was it null?) leaves the real bug alive.
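Rules 1 and 3 are easier to keep when hypotheses live in a structured log rather than in memory. The sketch below is one hypothetical way to do that in Python: `Hypothesis`, `HypothesisLog`, and `Status` are illustrative names, not part of this skill, and the fields mirror the columns of the Hypotheses table in the output template. The numeric likelihood is an assumption standing in for High/Medium/Low; only its ordering matters.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    OPEN = "open"
    CONFIRMED = "confirmed"
    REFUTED = "refuted"


@dataclass
class Hypothesis:
    claim: str
    likelihood: float  # rough prior in [0, 1]; only the ordering matters
    confirming_test: str
    refuting_test: str
    status: Status = Status.OPEN
    notes: List[str] = field(default_factory=list)


@dataclass
class HypothesisLog:
    items: List[Hypothesis] = field(default_factory=list)

    def add(self, h: Hypothesis) -> None:
        self.items.append(h)

    def next_to_test(self) -> Optional[Hypothesis]:
        """Most likely hypothesis that is still open, or None if all are closed."""
        open_items = [h for h in self.items if h.status is Status.OPEN]
        return max(open_items, key=lambda h: h.likelihood, default=None)

    def record(self, h: Hypothesis, status: Status, evidence: str) -> None:
        """Close a hypothesis; refuted ones stay in the log as dead ends (rule 3)."""
        h.status = status
        h.notes.append(evidence)

    def dead_ends(self) -> List[Hypothesis]:
        return [h for h in self.items if h.status is Status.REFUTED]
```

Because refuted entries are kept with their evidence, `dead_ends()` gives the record that rule 3 asks for, and `next_to_test()` enforces the ranked order from the Hypotheses table.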
## Error Handling
| Problem | Resolution |
|---------|------------|
| No stacktrace available | Focus on log analysis and state inspection. Use instrumentation to generate diagnostic output. |
| Bug is intermittent | Add persistent logging at key decision points. Run under stress (high load, concurrent requests) to increase reproduction rate. |
| Cannot reproduce locally | Compare environments systematically: versions, config, data, timing. Use Docker or a VM to mirror production. |
| Multiple hypotheses equally likely | Design a single test that distinguishes between them. Binary decision: "If X, then H1; if Y, then H2." |
| Fix attempted but bug persists | The hypothesis was wrong. Revert the fix, update hypothesis rankings, and proceed to the next hypothesis. Do not stack fixes. |
| Bug is in a dependency | Confirm with a minimal reproduction that uses only the dependency. Check issue trackers. Pin to last known-good version while awaiting upstream fix. |
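For an intermittent bug, both the Reproducibility estimate (~N% of attempts) and any later "fixed" claim need numbers. A minimal sketch, assuming `attempt` is a hypothetical zero-argument callable that runs the repro once (possibly under stress) and returns True when the symptom appears. `runs_needed` applies the standard argument that if the bug still occurred independently with probability p per run, n clean runs in a row would happen with probability (1 - p)^n.

```python
import math
from typing import Callable


def reproduction_rate(attempt: Callable[[], bool], runs: int) -> float:
    """Fraction of `runs` calls to `attempt()` that reproduced the symptom."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    hits = sum(1 for _ in range(runs) if attempt())
    return hits / runs


def runs_needed(rate: float, alpha: float = 0.05) -> int:
    """Consecutive clean runs needed before 'fixed' is more than luck.

    Returns the smallest n with (1 - rate) ** n <= alpha, i.e. the point at
    which an unfixed bug would have shown up with probability >= 1 - alpha.
    """
    if not 0 < rate < 1 or not 0 < alpha < 1:
        raise ValueError("rate and alpha must be in (0, 1)")
    return math.ceil(math.log(alpha) / math.log(1 - rate))
```

With the defaults, a bug seen in about 5% of runs needs 59 consecutive clean runs. The independence assumption is the weak point: timing-dependent failures under stress can cluster, so treat the number as a floor rather than a proof.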
## When NOT to Investigate
Push back if:
- The error message already contains the fix ("missing module X" → install X)
- The issue is a known environment setup problem (wrong Python version, missing env var)
- The "bug" is actually a feature request or design disagreement — redirect to ADR or discussion
- The code is not under the user's control (third-party SaaS, managed service) — file a support ticket instead
- The user wants to debug generated/minified code — debug the source, not the output