skills/troubleshoot/SKILL.md
---
name: troubleshoot
category: implementation
model: claude-opus-4-7
userInvocable: true
description: Systematically troubleshoot a failing feature or error. Discovers code, investigates root cause, applies fix, verifies with tests, and commits. Use when something isn't working as expected. Runs in the current working tree by default — set `worktree.enabled: true` in `.claude/configuration.yml` to isolate work in a git worktree.
argument-hint: <error-or-description>
allowed-tools: Read, Write,
Arguments: $ARGUMENTS
Systematically troubleshoot issues through multi-agent orchestration: discover → investigate → fix → verify → commit.
```bash
# Source resolve-config: marketplace installs get ${CLAUDE_PLUGIN_ROOT} substituted
# inline before bash runs; ./install.sh users fall back to ~/.claude. If neither
# path resolves, fail loudly rather than letting resolve_artifact be undefined.
if [ -f "${CLAUDE_PLUGIN_ROOT}/shared/resolve-config.sh" ]; then
  source "${CLAUDE_PLUGIN_ROOT}/shared/resolve-config.sh"
elif [ -f "$HOME/.claude/shared/resolve-config.sh" ]; then
  source "$HOME/.claude/shared/resolve-config.sh"
else
  echo "ERROR: resolve-config.sh not found. Install via marketplace or run ./install.sh" >&2
  exit 1
fi
TROUBLESHOOT_EXEC_MODE=$(resolve_exec_mode troubleshoot team)

# Optional --spec PATH: opt-in source for per-AC verification (Phase 6.3).
# Troubleshoot is ticket-agnostic by default; this flag is the ONLY way a spec
# enters the run — no inference from CWD, ticket, or work dir.
# Quote-aware parse: handles `--spec PATH`, `--spec=PATH`, and quoted paths
# with spaces; a bare `--spec` (no value) resolves to empty and is ignored.
SPEC=""
case " $ARGUMENTS " in
  *" --spec "*|*" --spec="*)
    rest=${ARGUMENTS#*--spec}; rest=${rest#[ =]}   # text after the flag, minus one space/=
    case $rest in
      \"*) SPEC=${rest#\"}; SPEC=${SPEC%%\"*} ;;   # double-quoted path (allows spaces)
      \'*) SPEC=${rest#\'}; SPEC=${SPEC%%\'*} ;;   # single-quoted path
      *)   SPEC=${rest%% *} ;;                     # bare token, up to next space
    esac
    ;;
esac
if [ -n "$SPEC" ] && [ ! -f "$SPEC" ]; then
  echo "WARNING: --spec '$SPEC' not found; per-AC verification skipped." >&2
  SPEC=""
fi
```
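To sanity-check the quote-aware parse, you can wrap the same logic in a function and feed it each supported form. This is a minimal sketch. `parse_spec` is a hypothetical helper name, and it mirrors only the parse above; the file-existence check is left out.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the --spec parse above: prints the resolved
# path (or nothing) for a given argument string.
parse_spec() {
  local args=$1 rest spec=""
  case " $args " in
    *" --spec "*|*" --spec="*)
      rest=${args#*--spec}; rest=${rest#[ =]}   # drop the flag and one space/=
      case $rest in
        \"*) spec=${rest#\"}; spec=${spec%%\"*} ;;   # double-quoted path
        \'*) spec=${rest#\'}; spec=${spec%%\'*} ;;   # single-quoted path
        *)   spec=${rest%% *} ;;                     # bare token
      esac
      ;;
  esac
  printf '%s' "$spec"
}

parse_spec 'Login fails --spec docs/spec.md'; echo          # prints: docs/spec.md
parse_spec 'Login fails --spec=docs/spec.md'; echo          # prints: docs/spec.md
parse_spec 'Login fails --spec "my docs/spec.md"'; echo     # prints: my docs/spec.md
parse_spec 'Login fails --spec'; echo                       # prints an empty line
```

A bare trailing `--spec` yields an empty string, which is why the real block can simply test `[ -n "$SPEC" ]`.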
Use $TROUBLESHOOT_EXEC_MODE to determine team vs sub-agent behavior in Phase 6 (verify fix).
When verification agents run in parallel (Phase 6), they MUST NOT write to the same file. See `~/.claude/shared/write-safety.md` for the full conventions.
```
/troubleshoot "Endpoint /api/users returns 202 instead of 200"
/troubleshoot "Login fails when password contains special characters"
/troubleshoot "Database query times out on large datasets"
/troubleshoot "Endpoint returns 500 after deploy" --spec .claude/work/PROJ-1-login/spec.md
```
`--spec PATH` (optional) opts into per-AC verification. When it is supplied, Phase 6.3 verifies the fix against the spec's acceptance criteria and appends a per-AC PASS/FAIL section. Omit it for an ordinary ad-hoc run: troubleshoot stays ticket-agnostic and never infers a spec on its own.
```
┌──────────────────────────────────────────────────────────────────┐
│ Phase 1: Parse Issue            → Extract what's wrong           │
│ Phase 2: Discover Code          → Find relevant code             │
│ Phase 3: Investigate            → Root cause analysis            │
│ Phase 4: Determine Fix Strategy → Code fix, test fix, or clarify │
│ Phase 5: Apply Fix              → Apply fix (code or tests)      │
│ Phase 6: Verify                 → Run tests, ensure fix works    │
│ Phase 7: Commit                 → Save the fix                   │
└──────────────────────────────────────────────────────────────────┘
```
## Worktree Setup (optional)

Skip this step if `resolve_worktree_enabled` returns `"false"`.
Single mode (`WORKSPACE_MODE == "single"`):
- `EnterWorktree(name: "troubleshoot-{short_slug}")`, where `{short_slug}` is derived from the issue description (e.g., `troubleshoot-login-500`)
- `$WORK_DIR` still resolves to the original workspace root

Multi mode (`WORKSPACE_MODE == "multi"`):
```bash
WT_ROOT=$(resolve_worktree_root)
TROUBLESHOOT_WORKSPACE="${WT_ROOT}/troubleshoot-{short_slug}"
mkdir -p "$TROUBLESHOOT_WORKSPACE"
for svc in $(resolve_services); do
  svc_path=$(resolve_service_path "$svc")
  wt_path="${TROUBLESHOOT_WORKSPACE}/${svc}"
  [[ -d "$wt_path" ]] && continue
  # Create the troubleshoot branch from the service's current HEAD. If that
  # branch already exists (e.g., on a rerun), check it out instead: git refuses
  # to add a worktree for a branch already checked out in the main worktree.
  git -C "$svc_path" worktree add -b "troubleshoot/{short_slug}" "$wt_path" 2>/dev/null \
    || git -C "$svc_path" worktree add "$wt_path" "troubleshoot/{short_slug}"
  echo "Created worktree: ${svc}/ → ${wt_path}"
done
```
- Subsequent phases work in the `$TROUBLESHOOT_WORKSPACE/{service}/` paths

After Phase 7 (Commit): in single mode, call `ExitWorktree(action: "remove")`. In multi mode, remove the worktrees:
```bash
for svc in $(resolve_services); do
  svc_path=$(resolve_service_path "$svc")
  wt_path="${TROUBLESHOOT_WORKSPACE}/${svc}"
  [[ -d "$wt_path" ]] && git -C "$svc_path" worktree remove "$wt_path" --force 2>/dev/null
done
rmdir "$TROUBLESHOOT_WORKSPACE" 2>/dev/null
```
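The skill says only that `{short_slug}` is derived from the issue description. The sketch below shows one way to derive it, assuming a lowercase, hyphen-separated slug built from the first few words. The `short_slug` function and its 4-word default are hypothetical choices, not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical slug helper: lowercase the text, split on anything that is not
# [a-z0-9], and join the first $2 words (default 4) with hyphens.
short_slug() {
  local max_words=${2:-4}
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '\n' \
    | sed '/^$/d' \
    | head -n "$max_words" \
    | paste -sd '-' -
}

short_slug "Login fails when password contains special characters"
# prints: login-fails-when-password
```

The word cap keeps worktree and branch names short. Punctuation such as `/` in `/api/users` becomes a word boundary, not part of the slug.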
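## Phase 1: Parse Issue

Goal: Understand what's wrong and what's expected.

Extract the following from the user's description: the affected component, the expected behavior, the actual behavior, and the issue type.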
Example:

```
Input: "Endpoint /api/users returns 202 instead of 200"

Parsed:
  Component: /api/users endpoint
  Expected:  HTTP 200 status
  Actual:    HTTP 202 status
  Type:      Response status mismatch
```
Confirm your understanding with the user if anything is unclear.
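Descriptions phrased as "returns X instead of Y" can be split into Expected and Actual mechanically. Below is a minimal sketch using bash's `[[ =~ ]]` and `BASH_REMATCH`. The `parse_status_issue` function and its output format are hypothetical. Free-form descriptions still need the judgment, and the user confirmation, described above.

```bash
#!/usr/bin/env bash
# Hypothetical parser for "<component> returns <actual> instead of <expected>".
# Returns 1 when the description does not follow that phrasing.
parse_status_issue() {
  local re='^(.*[^ ]) +returns +([0-9]{3}) +instead +of +([0-9]{3})'
  if [[ $1 =~ $re ]]; then
    printf 'Component: %s\nExpected: HTTP %s\nActual: HTTP %s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}"
  else
    return 1   # not this phrasing; fall back to manual parsing
  fi
}

parse_status_issue "Endpoint /api/users returns 202 instead of 200"
```

Note the order of the capture groups: the first status code in the sentence is the *actual* one, and the code after "instead of" is the expected one.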
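## Phase 2: Discover Code

Goal: Find the relevant code and understand the flow.

Use the Explore agent to locate the endpoint definition, the code flow through controllers and services, and the related tests.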
Agent delegation: Pass purpose, not just a query — state the symptom and that the result feeds root-cause investigation, so the agent scopes its trace accordingly. If it returns no concrete anchors (file:line, symbols), re-dispatch with a refined query (≤3 cycles). See ~/.claude/shared/subagent-context-discipline.md.
Task(Explore, "Troubleshooting: /api/users returns 202 instead of 200. Find the endpoint definition and trace the code flow through controllers and services so we can locate where the status is set. Return file:line anchors.")
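The Explore agent returns file:line anchors for the endpoint, its call chain, and the related tests.

If the code is complex or legacy, dispatch the archaeologist agent with the same discipline: carry the symptom and the investigation goal, not just the endpoint name.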
Task(archaeologist, "Troubleshooting why /api/users returns 202: deep-dive the endpoint and its call chain to find what sets the status. Return file:line anchors and any historical clues (TODOs, workarounds).")
Output to user:
```
## Code Discovery

Found endpoint: src/Controller/UserController.php:45
Route: GET /api/users → UserController::index()
Service: UserService::getUsers()
Tests: tests/Feature/UserApiTest.php

Call chain:
Route → Controller → Service → Repository → Database
```
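A plain `grep -rn` can cross-check the Explore agent's result, because it yields the same kind of `file:line` anchors the discovery step asks for. The sketch builds a throwaway tree so it runs anywhere. The directory layout, file contents and the `anchors` helper are invented for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Throwaway project with a hypothetical layout so the search is reproducible.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
mkdir -p "$repo/src/Controller" "$repo/tests/Feature"
printf '%s\n' '<?php' '// GET /api/users' 'return new JsonResponse($data, 202);' \
  > "$repo/src/Controller/UserController.php"
printf '%s\n' '<?php' '$this->get("/api/users")->assertStatus(200);' \
  > "$repo/tests/Feature/UserApiTest.php"

# file:line anchors for every match, relative to the repo root, sorted.
anchors() { (cd "$repo" && grep -rn -- "$1" src tests | cut -d: -f1,2 | sort); }

anchors "api/users"
# prints:
# src/Controller/UserController.php:2
# tests/Feature/UserApiTest.php:2
```

Searching production code and tests together shows at a glance whether a test pins the behavior being investigated.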
## Phase 3: Investigate

Goal: Understand WHY the issue occurs.

Investigation steps:

```bash
# When did this start?
git log -p --all -S "202" -- path/to/controller

# Recent changes to this file
git log --oneline -10 -- path/to/controller

# What do tests expect?
grep -r "api/users" tests/ -A 5 -B 5
```
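`git log -S <string>` (the "pickaxe") lists commits that change the number of occurrences of the string, which is how it pinpoints when `202` first appeared. Below is a self-contained demonstration in a scratch repository. The file name, commit messages and committer identity are invented for the demo.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Scratch repo with a known history: 200 first, then a switch to 202.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com "$@"; }

g init -q
echo 'return new JsonResponse($data, 200);' > "$repo/UserController.php"
g add UserController.php && g commit -qm "Add users endpoint"
echo 'return new JsonResponse($data, 202);' > "$repo/UserController.php"
g commit -qam "Switch users endpoint to async job"

# Only the commit that introduced "202" changes its occurrence count.
g log --format=%s -S "202" -- UserController.php
# prints: Switch users endpoint to async job
```

Searching for `200` instead would list both commits, since the first adds it and the second removes it.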
From this history and the test expectations, determine the root cause.
Output to user:
```
## Root Cause Analysis

Issue: UserController returns HTTP 202 (Accepted) for async processing
Location: UserController.php:45
Introduced: commit abc123f (3 days ago)

Root Cause:
Code was changed to use async job processing, which returns 202
to indicate request was accepted but not yet processed.

Decision needed:
☐ Keep 202 (correct for async) and update tests
☐ Revert to 200 (synchronous processing)
```
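## Phase 4: Determine Fix Strategy

Goal: Decide what needs to be fixed.

Three scenarios:

1. The code is wrong and the tests reflect the intended behavior. Action: fix the code.
2. The code behaves as intended but the tests are outdated. Action: update the tests.
3. The intended behavior is unclear. Action: ask the user for clarification.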
Use AskUserQuestion if a decision is needed:

```
AskUserQuestion:
  Question: "Should /api/users be synchronous (200) or asynchronous (202)?"
  Options:
    1. Synchronous (200) - Users wait for result
    2. Asynchronous (202) - Background processing
```
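In effect, the three scenarios form a decision table keyed on one question: is the current code behavior the intended one? The sketch below spells that out, assuming the investigation found a mismatch between code and tests. The `fix_strategy` function and its `yes | no | unsure` inputs are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the investigation's verdict into a Phase 4 action.
# $1 answers "is the current code behavior the intended one?": yes | no | unsure
fix_strategy() {
  case $1 in
    no)  echo "Fix the code" ;;                 # code is wrong; tests encode intent
    yes) echo "Update tests" ;;                 # behavior is intended; tests are stale
    *)   echo "Ask user for clarification" ;;   # intent unknown: use AskUserQuestion
  esac
}

fix_strategy unsure   # prints: Ask user for clarification
```

Any answer other than an explicit `yes` or `no` falls through to clarification, which matches the skill's preference for asking over guessing.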
## Phase 5: Apply Fix

Goal: Implement the solution.

If the code needs fixing:

Example:

```php
// Before
return new JsonResponse($data, 202); // Async processing

// After
return new JsonResponse($data, 200); // Synchronous response
```
If the tests need updating:

Example:

```php
// Before
$response->assertStatus(200);

// After
$response->assertStatus(202); // Updated for async processing
```
If tests are missing, delegate to test-writer:

Task(test-writer, "Write test for /api/users endpoint expecting 200 status code for successful response")
## Phase 6: Verify

Goal: Ensure the fix works and doesn't break anything.

Execution mode: Determined by $TROUBLESHOOT_EXEC_MODE.

```bash
# Run specific test file
./vendor/bin/phpunit tests/Feature/UserApiTest.php

# Or run all tests
./vendor/bin/phpunit
```
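Running the targeted test file first gives fast feedback, and the full suite then catches regressions elsewhere. The sketch below captures that ordering in a reusable function. `verify_fix` is a hypothetical name, and the runner commands are passed in as strings so the same logic works for PHPUnit or any other tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the targeted test command, then the full suite,
# stopping at the first failure. Exit status 1 = targeted, 2 = full suite.
verify_fix() {
  local targeted=$1 full=$2
  if ! bash -c "$targeted"; then
    echo "targeted tests failed" >&2
    return 1
  fi
  if ! bash -c "$full"; then
    echo "full suite failed (possible regression)" >&2
    return 2
  fi
  echo "fix verified"
}

# Usage in a PHP project:
# verify_fix "./vendor/bin/phpunit tests/Feature/UserApiTest.php" "./vendor/bin/phpunit"
```

The distinct exit codes let the caller tell "the fix does not work" apart from "the fix broke something else". That distinction decides whether to revisit the fix or hand off to test-fixer.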
If tests fail, delegate to test-fixer:
Task(test-fixer, "Fix failing test after changing /api/users to return 200 instead of 202")
If $TROUBLESHOOT_EXEC_MODE = "subagent":
Run verification agents in parallel:
```
[PARALLEL EXECUTION - Single message with multiple Task calls]

Task 1: subagent_type: "security-auditor"
  Prompt: Quick security audit of {endpoint/component} after {change description}.
  Check for: injection risks, auth bypass, data exposure from the fix.

Task 2: subagent_type: "quality-guard"
  Prompt: Verify the troubleshoot fix (Level 2 — Implementation Validation).
  Fix diff: {git_diff}
  Root cause: {root_cause_analysis}
  Verify:
  1. Does the fix actually address the root cause, or just the symptom?
  2. Are there other code paths with the same bug pattern?
  3. Do the tests cover the specific condition that triggered the bug?
```
When `--spec` was supplied (`$SPEC` resolved to a file): also read `$SPEC`, verify the fix against each acceptance criterion, and prefix every gate that maps to an AC with its AC ID(s) — e.g., `GATE 2: AC-3.1 — ...` — citing grader-typed evidence per `~/.claude/shared/eval-concepts.md`.
Produce a Quality Review Gates report.
If the skeptic (quality-guard) raises BLOCKING gates, address them before committing.
Deadlock protocol: If the skeptic rejects the fix 3 times, STOP iterating. Escalate to the user with: (a) the fix diff, (b) the skeptic's objections across all rounds, (c) your attempts to address them. The user decides: override, provide guidance, or abort.
If $TROUBLESHOOT_EXEC_MODE = "team" (default):
```
TeamCreate(team_name="troubleshoot-verify")
TaskCreate: "Security audit of fix" (T1)
TaskCreate: "Challenge the fix" (T2) — depends on T1

[PARALLEL]
Task tool: name: "troubleshoot-security", subagent_type: "security-auditor", team_name: "troubleshoot-verify"
Task tool: name: "troubleshoot-skeptic", subagent_type: "quality-guard", team_name: "troubleshoot-verify"
```
The skeptic waits for the security auditor to finish, then challenges the fix. The agents resolve disagreements via SendMessage. Collect the results, then call TeamDelete.
Deadlock protocol: Max 3 rejection cycles. After 3 rejections from the skeptic, stop iterating and escalate to the user with all objections and attempted fixes. The user decides: override, provide guidance, or abort.
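The three-strike deadlock rule is simple to state but easy to get off by one. Below is a minimal sketch of the loop. `review_fix` and `revise_fix` are hypothetical stand-ins for a quality-guard round and the follow-up edit, not real commands.

```bash
#!/usr/bin/env bash
# Hypothetical loop: review_fix returns 0 on APPROVED, non-zero on rejection;
# revise_fix addresses the latest objections.
MAX_REJECTIONS=3

run_review_cycle() {
  local rejections=0
  until review_fix; do
    rejections=$((rejections + 1))
    if (( rejections >= MAX_REJECTIONS )); then
      echo "ESCALATE after $rejections rejections"   # hand off to the user
      return 1
    fi
    revise_fix
  done
  echo "APPROVED after $rejections rejections"
}

# Demo: a stand-in reviewer that approves on the third attempt.
attempts=0
review_fix() { attempts=$((attempts + 1)); (( attempts >= 3 )); }
revise_fix() { :; }
run_review_cycle   # prints: APPROVED after 2 rejections
```

The counter increments before the limit check, so the third rejection triggers escalation rather than a fourth revision.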
Output to user:
```
## Verification

✓ Tests passing: 15/15
✓ Security audit: {No issues | Issues found}
✓ Skeptic validation: {APPROVED | CONDITIONAL}
✓ Manual verification: Endpoint returns 200

Per-AC Verification (--spec only — one row per AC):
| AC ID  | Verdict | Grader | Evidence                      |
|--------|---------|--------|-------------------------------|
| AC-3.1 | PASS    | code   | UserApiTest::testStatus → 200 |

Fix verified successfully.
```
Per-AC section (only when `--spec PATH` was supplied and `$SPEC` resolved to a file): build one row per AC in the spec's AC list from the AC-tagged quality-guard gate output, following the same rules as `/implement` Phase 4.5. The Evidence column follows each AC's grader type. Re-verification reliability matters here: a fix that passes once is pass@1, not pass^k, so flag a fix that turns out flaky on re-verification as such (see `~/.claude/shared/eval-concepts.md`). When no `--spec` is supplied (the default ad-hoc run), omit this section entirely: no error, no placeholder.
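Assembling the table starts with two steps: pull the AC IDs out of the AC-tagged gate lines, then check them against the spec's AC list. Below is a rough sketch with `grep -oE` and `comm`. The `AC-<n>.<n>` ID shape comes from the example above. The file contents and the `ac_ids` and `missing_acs` helpers are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical inputs: a quality-guard gate report and the spec's AC list.
gates=$(mktemp); spec=$(mktemp)
trap 'rm -f "$gates" "$spec"' EXIT
cat > "$gates" <<'EOF'
GATE 1: General: fix addresses root cause
GATE 2: AC-3.1: status code is 200
GATE 3: AC-3.2, AC-3.3: error body unchanged
EOF
cat > "$spec" <<'EOF'
- AC-3.1: GET /api/users returns 200
- AC-3.2: Error responses keep their JSON shape
- AC-3.3: Response time unchanged
- AC-3.4: Audit log entry written
EOF

# Unique AC IDs mentioned in a file, sorted (comm needs sorted input).
ac_ids() { grep -oE 'AC-[0-9]+(\.[0-9]+)*' "$1" | sort -u; }

# ACs listed in the spec that no gate mentions: gaps in the per-AC table.
missing_acs() { comm -23 <(ac_ids "$spec") <(ac_ids "$gates"); }

missing_acs   # prints: AC-3.4
```

An AC with no matching gate still needs a row, so the gaps surface before the report goes out rather than as silently missing rows.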
## Phase 7: Commit

Goal: Save the fix with proper documentation.

Run inline — the hook enforces credential scan and branch protection automatically:

```bash
git add <files>
git commit -m "$(cat <<'EOF'
[TICKET-123] fix(scope): description
EOF
)"

# If pushing: record security-auditor confirmation first (after a clean scan)
bash "${CLAUDE_PLUGIN_ROOT}/hooks/record-audit.sh"
git push
```
Commit message format:

```
[TICKET-123] fix(api): change /api/users to return 200 instead of 202

- Changed UserController to use synchronous processing
- Updated tests to expect 200 status code
- Root cause: Async processing was an unintended change in commit abc123f
```
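You can check the subject line against the `[TICKET-123] fix(scope): description` shape before committing. Below is a minimal sketch with a bash regex. The exact pattern (uppercase project key, lowercase scope) is an assumption read off the example, not a rule the commit hook is known to enforce. `valid_fix_subject` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check for "[KEY-123] fix(scope): description" subject lines.
# POSIX classes avoid locale surprises with ranges like [A-Z].
valid_fix_subject() {
  local re='^\[[[:upper:]][[:upper:][:digit:]]*-[[:digit:]]+] fix\([[:lower:][:digit:]_-]+\): .+'
  [[ $1 =~ $re ]]
}

valid_fix_subject "[TICKET-123] fix(api): change /api/users to return 200 instead of 202" && echo ok
# prints: ok
```

Only the first line of the message is checked. The bullet-point body is free-form.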
Output to user:
```
## Debug Complete ✓

Issue: /api/users returns 202 instead of 200
Root Cause: Unintended async processing change
Fix Applied: Reverted to synchronous response
Tests: All passing
Commit: {fix_commit_sha}

Status: RESOLVED
```
If the code cannot be located:

```
❌ Could not locate /api/users endpoint

Suggestions:
  • Check if route exists: grep -r "api/users" routes/
  • Check if endpoint was removed
  • Try broader search: grep -r "users" src/Controller/
```
If the root cause is unclear:

```
⚠️ Root cause not definitively identified

Next steps:
  1. Add debug logging around suspected code
  2. Check production logs for error patterns
  3. Reproduce issue locally with debugging enabled
  4. Consider pairing with developer familiar with this code
```
If the fix breaks other tests:

```
⚠️ Fix broke 3 other tests
Rolling back change...
Delegating to test-fixer for comprehensive test fix...
```

Task(test-fixer, "Fix all failing tests after changing /api/users status code")
| Phase | Agent(s) Used | Purpose |
|-------|---------------|---------|
| Discovery | Explore, archaeologist | Find and understand code |
| Investigation | Direct analysis | Root cause analysis |
| Fix | Direct (Edit tool) | Apply code/test changes |
| Verification | test-writer, test-fixer | Ensure fix works |
| Review | security-auditor, quality-guard | Validate fix quality and security |
| Commit | git-operator | Save and document fix |
Provide clear issue descriptions:

- ✅ "Login endpoint returns 500 when password is empty"
- ✅ "User creation fails with unique constraint error on email"
- ✅ "Dashboard loads slowly (>5s) with 1000+ items"
- ❌ "It's broken"
- ❌ "Fix the login"
- ❌ "Make it faster"
Include context when available:

Example:

```
/troubleshoot "Login endpoint returns 500 when password is empty
Error: Call to a member function hash() on null
Stack trace shows error in AuthService::validatePassword()
Expected: 400 Bad Request with validation error
Actual: 500 Internal Server Error"
```
Before completing a troubleshoot session:
User:
/troubleshoot "Endpoint /api/users returns 202 instead of 200"
Skill Workflow:
Result: Issue resolved, tests passing, fix committed.