skills/forge-diagnose/SKILL.md
Diagnose why a pipeline run is stuck, broken, or behaving unexpectedly. Read-only health check of state.json integrity, recovery budget, convergence status, and stalled stages. Use when the pipeline seems stuck, a run failed with unclear errors, or before resuming. Trigger: /forge-diagnose, pipeline stuck, why did the run fail, what went wrong
npx skillsauth add quantumbitcz/dev-pipeline forge-diagnose
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are a read-only diagnostic tool. Your job is to inspect the current pipeline state and report problems without changing anything. You never modify files, dispatch agents, or trigger recovery.
Before any action, verify:
- Run git rev-parse --show-toplevel 2>/dev/null. If it fails: report "Not a git repository. Navigate to a project directory." and STOP.
- .forge/state.json exists. If not: report "No pipeline state found. Nothing to diagnose. Run /forge-run to start a pipeline." and STOP.

1. Read .forge/state.json and parse the full JSON.
2. Read .claude/forge.local.md (if it exists) for config reference.
3. Read .claude/forge-config.md (if it exists) for runtime parameter reference.

Run these checks against state.json and report each as PASS or PROBLEM:
Schema version:
- version matches "1.5.0" (current schema version from shared/state-schema.md).

Required fields:
- story_id, requirement, story_state, mode, complete.
- PROBLEM otherwise: "{field} is missing or null."

Story state validity:
- story_state is one of: PREFLIGHT, EXPLORING, PLANNING, VALIDATING, IMPLEMENTING, VERIFYING, REVIEWING, DOCUMENTING, SHIPPING, LEARNING, COMPLETE, ABORTED, ESCALATED, DECOMPOSED, MIGRATING, MIGRATION_PAUSED, MIGRATION_CLEANUP, MIGRATION_VERIFY.
- PROBLEM otherwise: "{value}. Not a recognized pipeline state."

Mode validity:
- mode is one of: standard, bugfix, migration, bootstrap, testing, refactor, performance.
- PROBLEM otherwise: "{value}. Not a recognized pipeline mode."

Sequence counter:
- _seq is a positive integer.
- PROBLEM otherwise: "_seq counter is invalid ({value}). State writes may not be functioning."

Completion consistency:
- If complete: true but story_state is not COMPLETE and abort_reason is empty: "State marked complete but story_state is {story_state} with no abort reason."
- If complete: false and story_state is COMPLETE: "Story state is COMPLETE but complete flag is false."

Check iteration counters against configured maximums (from forge-config.md or defaults):
Total retries:
- Read total_retries and total_retries_max (default 10).
- If total_retries > total_retries_max: "Total retries ({total_retries}) exceeds maximum ({total_retries_max}). Pipeline should have escalated."
- If total_retries >= total_retries_max * 0.8: "Total retries at {pct}% of budget."

Recovery budget:
- Read recovery_budget.total_weight and recovery_budget.max_weight (default 5.5).
- If total_weight > max_weight: "Recovery budget ({total_weight}) exceeds ceiling ({max_weight})."
- If total_weight >= max_weight * 0.8: "Recovery budget at {pct}% of ceiling."

Convergence counters:
- Read convergence.total_iterations and compare against max_iterations from forge-config.md (default 8).
- If total_iterations > max_iterations: "Total iterations ({total_iterations}) exceeds max ({max_iterations}). Pipeline should have stopped."
- Read convergence.safety_gate_failures. PROBLEM if >= 2 and convergence.safety_gate_passed is false: "Safety gate failed {n} times without passing. Cross-phase oscillation likely."
- Read convergence.plateau_count and plateau_patience (default 2). WARNING if plateau_count >= plateau_patience: "Plateau patience exhausted ({plateau_count} >= {plateau_patience})."

Detect whether the pipeline appears stuck:
- Read stage_timestamps from state.json.
- If complete is false:
- If .forge/.lock exists:
  - Check whether the recorded process is still alive: kill -0 $pid 2>/dev/null

If score_history has 2+ entries:
- oscillation_tolerance (default 5), WARNING: "Overall score regression from {first} to {last}."
- plateau_threshold (default 2), INFO: "Score plateaued around {avg}."

Report which integrations are available vs configured:
- Read integrations from state.json.
- If integrations.linear.available is true but linear.epic_id is empty: WARNING: "Linear available but no epic linked."
- If integrations.neo4j.available is true but integrations.neo4j.node_count is 0: WARNING: "Neo4j available but graph is empty."

Check .forge/evidence.json if it exists:
- Read evidence.timestamp and evidence.verdict.
- If verdict is "SHIP": report "Evidence: SHIP (generated {timestamp})"
- If verdict is "BLOCK": report "Evidence: BLOCKED — reasons: {block_reasons}"
- Compute the age as now - evidence.timestamp. If age > 30 minutes (default evidence_max_age_minutes): WARNING: "Evidence is stale ({age} minutes old). Pipeline will re-verify before shipping."
- If evidence.json does not exist and story_state is SHIPPING: WARNING: "In SHIPPING state but no evidence.json found."

Present results in this format:
## Pipeline Diagnostic Report
**State file:** .forge/state.json
**Story:** {story_id}
**Stage:** {story_state}
**Mode:** {mode}
**Complete:** {complete}
### Integrity Checks
- {check_name}: PASS / PROBLEM: {detail}
- ...
### Counter Health
- Total retries: {n}/{max} {status}
- Recovery budget: {weight}/{max_weight} {status}
- Convergence iterations: {n}/{max} {status}
- Plateau count: {n}/{patience}
- Safety gate failures: {n}
### Stalled Stage Detection
- Last activity: {timestamp} ({minutes_ago} minutes ago)
- Lock file: {status}
### Score Trend
- History: {scores}
- Trend: {assessment}
### Integrations
| Integration | Status |
|-------------|--------|
| Linear | {status} |
| Playwright | {status} |
| Slack | {status} |
| Figma | {status} |
| Context7 | {status} |
| Neo4j | {status} |
### Summary
- {total_problems} problems, {total_warnings} warnings, {total_info} informational
- Recommendation: {recommendation}
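
The {status} placeholders in the Counter Health section follow from the thresholds given earlier: a counter above its maximum, or at 80% or more of it. A minimal Python sketch of that classification is shown here. The helper names counter_status and counter_line, the labels "OK", "WARNING", and "PROBLEM", and the choice to treat the 80% level as a warning are assumptions made for illustration, not part of the skill definition.

```python
def counter_status(value: float, maximum: float, warn_ratio: float = 0.8) -> str:
    """Classify a counter against its configured maximum.

    Hypothetical helper: the labels and the WARNING severity for the
    80% level are assumptions, not taken from the skill itself.
    """
    if value > maximum:
        return "PROBLEM"
    if value >= maximum * warn_ratio:
        return "WARNING"
    return "OK"


def counter_line(label: str, value: float, maximum: float) -> str:
    """Render one '- {label}: {n}/{max} {status}' report line."""
    return f"- {label}: {value}/{maximum} {counter_status(value, maximum)}"


# Example values; a real run would read these from state.json.
state = {"total_retries": 8, "recovery_budget": {"total_weight": 6.0}}
print(counter_line("Total retries", state["total_retries"], 10))
# prints "- Total retries: 8/10 WARNING"
print(counter_line("Recovery budget", state["recovery_budget"]["total_weight"], 5.5))
# prints "- Recovery budget: 6.0/5.5 PROBLEM"
```

The sketch only reads values and formats text, which keeps it in line with the read-only contract of the diagnostic.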
Recommendations:
- /forge-reset to clear stale state, or /forge-repair-state to fix specific issues.
- /forge-repair-state to attempt automatic correction.
- /forge-reset to start fresh (preserves learnings).
- /forge-reset.

| Condition | Action |
|-----------|--------|
| state.json missing | Report "No pipeline state found. Nothing to diagnose. Run /forge-run to start a pipeline." and STOP |
| state.json unparseable JSON | Report "state.json is not valid JSON. Run /forge-reset to start fresh." and STOP |
| forge-config.md missing | Use default values for all parameter checks. Note in report that config file is absent |
| Lock file PID check fails | Report lock file status as "unknown" and note the check could not be performed |
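
The lock-file row above can be sketched in Python. On POSIX systems, os.kill(pid, 0) is the standard-library counterpart of kill -0 $pid: signal 0 delivers nothing and only tests whether the process exists. The function name lock_status, the status strings other than "unknown", and the assumption that .forge/.lock contains a bare PID are all hypothetical, since the skill does not specify the lock file's format.

```python
import os
from pathlib import Path


def lock_status(lock_path: str = ".forge/.lock") -> str:
    """Return 'absent', 'alive', 'stale', or 'unknown' for the lock file.

    Assumes (hypothetically) that the lock file holds a bare PID.
    Read-only: the lock file is never modified or removed.
    """
    path = Path(lock_path)
    if not path.exists():
        return "absent"
    try:
        pid = int(path.read_text().strip())
    except (OSError, ValueError):
        return "unknown"  # unreadable or not a PID: the check could not be performed
    if pid <= 0:
        return "unknown"  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return "stale"  # no process with that PID
    except PermissionError:
        return "alive"  # process exists but belongs to another user
    except OSError:
        return "unknown"
    return "alive"
```

Returning "unknown" instead of raising mirrors the table: a failed check is noted in the report and the diagnostic carries on.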
- /forge-repair-state -- Fix specific state.json issues found by this diagnostic
- /forge-resume -- Resume an aborted pipeline after confirming health with diagnose
- /forge-reset -- Clear all state when diagnose shows corruption beyond repair
- /forge-status -- Quick overview of pipeline state (less detailed than diagnose)
- /forge-profile -- Performance analysis of pipeline execution (complementary to health diagnostic)