skills/forge-insights/SKILL.md
[read-only] Analyze trends across pipeline runs -- quality trajectory, agent effectiveness, cost analysis, convergence patterns, memory health. Use when you want to understand how pipeline quality has evolved, identify cost optimization opportunities, or review agent and memory effectiveness across runs.
Install this skill globally with one command: `npx skillsauth add quantumbitcz/dev-pipeline forge-insights`. Works with Claude Code, Cursor, and Windsurf.
Analyze trends across pipeline runs to surface actionable insights about quality, cost, agent behavior, and pipeline health.
See shared/skill-contract.md for the standard exit-code table.
Before any action, verify:
- `git rev-parse --show-toplevel 2>/dev/null` succeeds. If it fails: report "Not a git repository. Navigate to a project directory." and STOP.
- `.claude/forge.local.md` exists. If not: report "Forge not initialized. Run /forge-init first." and STOP.
- At least one of the following exists:
  - `.forge/reports/` with report files
  - `.forge/state.json` with telemetry data
  - `.claude/forge-log.md` with run entries

  If none exist: report "No pipeline run data found. Run /forge-run to generate data, then try again." and STOP.

Read all available data sources:

- `.forge/reports/*.json` or `.forge/reports/*.md`: per-run summaries including scores, findings, timings, agent dispatches.
- `.forge/state.json` → `telemetry`: current/last run metrics, including token usage, wall time, stage durations, agent dispatch counts.
- `.claude/forge-log.md`: human-readable run history with dates, requirements, scores, verdicts, and retrospective notes.
- `shared/learnings/` and `.forge/learnings/`: accumulated patterns, PREEMPT items, agent effectiveness records.
- `.forge/state.json` → `score_history`: per-iteration score progression within the current/last run.

If a source is unavailable, skip it and note which categories will have incomplete data.
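The "skip what is missing, note what is incomplete" behaviour can be sketched roughly as follows. This is an illustration only: the function name `load_sources`, the shape of the returned dictionary, the wording of the warnings, and the choice to ignore earlier `insights-*` reports are all assumptions, and the learnings directories are left out for brevity.

```python
import json
from pathlib import Path


def load_sources(root: Path) -> tuple[dict, list[str]]:
    """Collect whatever run data exists under root; return (data, warnings)."""
    data: dict = {}
    warnings: list[str] = []

    # Per-run reports: JSON is parsed, Markdown is kept as raw text.
    reports_dir = root / ".forge" / "reports"
    reports = []
    if reports_dir.is_dir():
        for path in sorted(reports_dir.glob("*.json")):
            try:
                reports.append(json.loads(path.read_text(encoding="utf-8")))
            except (OSError, ValueError):
                # Malformed source: skip it, log a WARNING, keep going.
                warnings.append(f"WARNING: skipped unparseable report {path.name}")
        for path in sorted(reports_dir.glob("*.md")):
            # Assumption: earlier insights reports are not run data.
            if not path.name.startswith("insights-"):
                reports.append({"markdown": path.read_text(encoding="utf-8")})
    if reports:
        data["reports"] = reports
    else:
        warnings.append("reports unavailable: per-run categories will be incomplete")

    # state.json carries both telemetry and score_history.
    state_path = root / ".forge" / "state.json"
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
        if not isinstance(state, dict):
            raise ValueError("state.json is not a JSON object")
        data["telemetry"] = state.get("telemetry", {})
        data["score_history"] = state.get("score_history", [])
    except (OSError, ValueError):
        warnings.append("state.json unavailable: telemetry and score history skipped")

    log_path = root / ".claude" / "forge-log.md"
    if log_path.is_file():
        data["log"] = log_path.read_text(encoding="utf-8")
    else:
        warnings.append("forge-log.md unavailable: run history skipped")

    return data, warnings
```

The warnings list is what a caller would use to annotate categories with incomplete data in the final report.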
Analyze score trends across runs:
### Quality Trajectory
| Run | Date | Score | Verdict | CRITICALs | WARNINGs |
|-----|------|-------|---------|-----------|----------|
| {n} | {date} | {score} | {verdict} | {count} | {count} |
**Trend:** {improving/declining/stable} ({delta} over {n} runs)
**Recurring Findings (3+ runs):**
| Category | Occurrences | Last Seen | Suggestion |
|----------|-------------|-----------|------------|
| {cat} | {n} | {date} | {codify as convention / investigate root cause} |
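One possible way to derive the **Trend** label and the recurring-findings table is sketched below. The 2-point tolerance for "stable", the first-to-last delta, and the function names are assumptions made for illustration. The 3-run thresholds come from the text of this skill.

```python
from collections import Counter


def score_trend(scores: list[float], tolerance: float = 2.0) -> tuple[str, float]:
    """Classify the change between the first and last run score."""
    if len(scores) < 3:
        # Trend analysis needs more data points than this.
        return "insufficient data", 0.0
    delta = scores[-1] - scores[0]
    if delta > tolerance:
        return "improving", delta
    if delta < -tolerance:
        return "declining", delta
    return "stable", delta


def recurring_findings(runs: list[list[str]], min_runs: int = 3) -> dict[str, int]:
    """Map each finding category to the number of runs it appeared in,
    keeping only categories seen in at least min_runs runs."""
    # set() ensures a category is counted once per run, not once per finding.
    counts = Counter(cat for categories in runs for cat in set(categories))
    return {cat: n for cat, n in counts.items() if n >= min_runs}
```

For example, `score_trend([70, 78, 85])` classifies the runs as improving with a delta of 15.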
Analyze which agents contribute most to quality improvement:
### Agent Effectiveness
| Agent | Dispatches | Avg Findings | Score Impact | FP Rate |
|-------|-----------|-------------|-------------|---------|
| {agent} | {n} | {avg} | {delta} | {pct}% |
**Most impactful:** {agent} — avg {delta} point improvement per dispatch
**Least triggered:** {agent} — {n} findings across {m} runs
**Mutation kill rate:** {pct}% (trend: {direction})
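The per-agent columns above could be aggregated along these lines. The record fields (`agent`, `findings`, `false_positives`) are hypothetical; the actual dispatch records in the reports may be shaped differently, and score impact is omitted because its derivation is not specified here.

```python
from dataclasses import dataclass


@dataclass
class AgentStats:
    dispatches: int
    avg_findings: float
    fp_rate_pct: float


def summarize_agents(dispatches: list[dict]) -> dict[str, AgentStats]:
    """Aggregate dispatch records into per-agent table rows."""
    # totals[agent] = [dispatch count, total findings, total false positives]
    totals: dict[str, list[int]] = {}
    for d in dispatches:
        row = totals.setdefault(d["agent"], [0, 0, 0])
        row[0] += 1
        row[1] += d["findings"]
        row[2] += d["false_positives"]

    result = {}
    for agent, (n, findings, fps) in totals.items():
        # An agent with no findings has no false positives to report.
        fp_rate = 100.0 * fps / findings if findings else 0.0
        result[agent] = AgentStats(n, findings / n, fp_rate)
    return result
```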
Analyze resource consumption and cost efficiency:
`.forge/trust.json` `model_efficiency`. Stages without score impact are reported separately as overhead.

Sources: `state.json.cost`, `state.json.tokens`, `.forge/trust.json` `model_efficiency`, `state.json.cost_alerting`.
Recommendation generation:
### Cost Analysis
#### Per-Run Cost Trend
| Run | Date | Tokens | Est. Cost | Score | Cost/Point | Budget Used |
|-----|------|--------|-----------|-------|------------|-------------|
#### Per-Stage Cost Breakdown
| Stage | Avg Tokens | Avg Cost | % of Total | Trend |
|-------|-----------|----------|-----------|-------|
#### Cost-Per-Quality-Point (Efficiency)
| Stage | Tier | Tokens/Point | Runs | Suggestion |
|-------|------|-------------|------|------------|
#### Model Tier Distribution
| Tier | Dispatches | Tokens | % of Total | Avg Cost |
|------|-----------|--------|-----------|----------|
#### Budget Utilization
| Run | Ceiling | Used | % | Alerts Triggered |
|-----|---------|------|---|-----------------|
#### Top-3 Cost Recommendations
| # | Recommendation | Expected Savings | Confidence |
|---|---------------|-----------------|------------|
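The derived columns in these tables are simple ratios. The sketch below shows one reading of them, assuming Cost/Point is the estimated cost divided by the run score, Budget Used is tokens used over the ceiling, and % of Total is a stage's share of all tokens. These definitions and the function names are assumptions, not taken from the pipeline itself.

```python
from typing import Optional


def cost_per_point(est_cost: float, score: float) -> Optional[float]:
    """Estimated cost divided by the run score; None when the score is 0."""
    return est_cost / score if score > 0 else None


def budget_used_pct(used_tokens: int, ceiling_tokens: int) -> Optional[float]:
    """Share of the token ceiling consumed; None when no ceiling is set."""
    return 100.0 * used_tokens / ceiling_tokens if ceiling_tokens > 0 else None


def stage_share(stage_tokens: dict[str, int]) -> dict[str, float]:
    """Each stage's percentage of the total token count."""
    total = sum(stage_tokens.values())
    if total == 0:
        return {stage: 0.0 for stage in stage_tokens}
    return {stage: 100.0 * tokens / total for stage, tokens in stage_tokens.items()}
```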
Analyze how efficiently the pipeline converges to shipping quality:
### Convergence Patterns
| Metric | Value |
|--------|-------|
| Avg iterations to ship | {n} |
| First-pass success rate | {pct}% |
| Safety gate failure rate | {pct}% |
| Most common plateau cause | {cause} |
**Iteration Distribution:**
| Iterations | Runs | % |
|-----------|------|---|
| 1-2 | {n} | {pct}% |
| 3-5 | {n} | {pct}% |
| 6+ | {n} | {pct}% |
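The iteration buckets above can be filled with a short helper such as the following. It assumes each run is represented by its iteration count and that "first pass" means shipping after exactly one iteration; both are assumptions for the sake of the example.

```python
def iteration_distribution(iterations: list[int]) -> dict[str, tuple[int, float]]:
    """Return {bucket: (run count, percentage of runs)} for the three buckets."""
    buckets = {"1-2": 0, "3-5": 0, "6+": 0}
    for n in iterations:
        if n <= 2:
            buckets["1-2"] += 1
        elif n <= 5:
            buckets["3-5"] += 1
        else:
            buckets["6+"] += 1
    total = len(iterations)
    return {
        label: (count, 100.0 * count / total if total else 0.0)
        for label, count in buckets.items()
    }


def first_pass_rate(iterations: list[int]) -> float:
    """Percentage of runs that shipped after a single iteration."""
    if not iterations:
        return 0.0
    return 100.0 * sum(1 for n in iterations if n == 1) / len(iterations)
```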
Analyze the accumulated knowledge base:
### Memory Health
**PREEMPT Items:**
| Priority | Active | Applied (last 5 runs) | Decay Candidates |
|----------|--------|-----------------------|------------------|
| HIGH | {n} | {n} | {n} |
| MEDIUM | {n} | {n} | {n} |
| LOW | {n} | {n} | {n} |
| ARCHIVED | {n} | — | — |
**Pattern Discovery:**
- Total auto-discovered patterns: {n}
- Applied in subsequent runs: {n} ({pct}%)
- Never applied: {n} (review for removal)
**Learnings Growth:**
- Total learnings files: {n}
- New entries (last 5 runs): {n}
- Most active category: {category}
Analyze token savings and compression compliance:
- Compare `state.json.tokens.output_tokens_per_agent` against the expected range for their stage's compression level. Estimate tokens saved relative to the verbose baseline using the stage-level token ranges from `shared/output-compression.md`: verbose 800-2000, standard 800-2000, terse 400-1200, minimal 100-600.
- `state.json.tokens.compression_level_distribution`. Highlight if the distribution is skewed (e.g., 90% verbose suggests misconfiguration or `output_compression.enabled: false`).
- For example, an agent at `terse` (expected 400-1200 tokens) producing 1800 tokens is drifting.
- If `/forge-compress` has been run (detect via `*.original.md` backup files in `agents/`), compute before/after line counts and estimated token savings using `wc -l`.
- Caveman mode (`.forge/caveman-mode`): which level, and how many sessions used it (from `.forge/events.jsonl` if available).

### Compression Effectiveness
**Output Compression:**
| Metric | Value |
|--------|-------|
| Dispatches at verbose | {n} |
| Dispatches at standard | {n} |
| Dispatches at terse | {n} |
| Dispatches at minimal | {n} |
| Estimated output tokens saved | {n} ({pct}% vs all-verbose baseline) |
**Drift Alerts:**
| Agent | Stage Level | Expected Range | Actual Tokens | Status |
|-------|------------|----------------|---------------|--------|
| {agent} | terse | 400-1200 | {n} | DRIFT / OK |
**Input Compression:**
| Scope | Files | Before (lines) | After (lines) | Reduction |
|-------|-------|-----------------|---------------|-----------|
| agents/ | {n} | {n} | {n} | {pct}% |
**Caveman Mode:** {off/lite/full/ultra}
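The drift check and the reduction percentage in these tables might be computed as sketched below. The token ranges are the ones quoted from `shared/output-compression.md` above. Treating only output above the upper bound as drift is an assumption based on the 1800-token example, and the function names are hypothetical.

```python
# Expected output-token ranges per compression level, as listed in this skill.
EXPECTED_RANGES = {
    "verbose": (800, 2000),
    "standard": (800, 2000),
    "terse": (400, 1200),
    "minimal": (100, 600),
}


def drift_status(level: str, actual_tokens: int) -> str:
    """Return "DRIFT" when output exceeds the level's upper bound, else "OK"."""
    _, high = EXPECTED_RANGES[level]
    return "DRIFT" if actual_tokens > high else "OK"


def reduction_pct(before_lines: int, after_lines: int) -> float:
    """Percentage reduction in line count after input compression."""
    if before_lines <= 0:
        return 0.0
    return 100.0 * (before_lines - after_lines) / before_lines
```

With these definitions, a `terse` agent producing 1800 tokens is reported as `DRIFT`, matching the example in the text.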
Synthesize the six categories into actionable recommendations:
## Pipeline Insights Report
**Project:** {project name}
**Runs analyzed:** {count}
**Date range:** {earliest} to {latest}
{Category 1-6 sections as above}
### Recommendations
| Priority | Action | Category | Expected Impact |
|----------|--------|----------|-----------------|
| {1-N} | {specific action} | {category} | {what improves} |
Prioritize recommendations by expected impact:
Write the full report to .forge/reports/insights-{date}.md where {date} is today in YYYY-MM-DD format. If the reports directory does not exist, create it.
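A minimal sketch of the write step, assuming the report body has already been rendered to a string. The function name and the `today` parameter (included so the date can be fixed in tests) are illustrative.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def write_insights_report(root: Path, body: str, today: Optional[date] = None) -> Path:
    """Write the report to .forge/reports/insights-YYYY-MM-DD.md and return its path."""
    reports_dir = root / ".forge" / "reports"
    # Create the reports directory if it does not exist yet.
    reports_dir.mkdir(parents=True, exist_ok=True)
    # date.isoformat() yields YYYY-MM-DD.
    stamp = (today or date.today()).isoformat()
    path = reports_dir / f"insights-{stamp}.md"
    path.write_text(body, encoding="utf-8")
    return path
```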
.forge/reports/.

| Condition | Action |
|-----------|--------|
| Prerequisites fail | Report specific error message and STOP |
| No run data available | Report "No pipeline run data found. Run /forge-run to generate data, then try again." and STOP |
| Only one data source available | Generate partial report and note which categories have insufficient data |
| Fewer than 3 runs | Note that trend analysis requires more data points. Focus on single-run metrics |
| Report directory does not exist | Create .forge/reports/ before writing the report |
| Data source unparseable | Skip the malformed source, log WARNING, continue with remaining sources |
| State corruption | This skill reads state.json for telemetry but does not depend on valid pipeline state |
- `/forge-history` -- View run history with scores and verdicts (simpler than insights)
- `/forge-profile` -- Detailed performance profiling of a single pipeline run
- `/forge-status` -- Check current pipeline run state
- `/forge-recover diagnose` -- Diagnose pipeline health issues for the current run