plugins/utils/skills/metrics-tokens/SKILL.md
Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities
npx skillsauth add jmagly/aiwg metrics-tokens
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You perform deep analysis of token usage efficiency. You compare AIWG workflow token consumption against the MetaGPT 124 tokens/line benchmark (REF-013), identify high-cost operations, and surface optimization opportunities.
Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):
| Pattern | Example | Action |
|---------|---------|--------|
| Efficiency report | "token efficiency" | aiwg metrics-tokens |
| Session analysis | "analyze tokens for this session" | aiwg metrics-tokens --session current |
| Threshold check | "are we at green" | aiwg metrics-tokens --threshold |
| Per-step breakdown | "which step used the most tokens" | aiwg metrics-tokens --by-step |
| Optimization hints | "suggest token optimizations" | aiwg metrics-tokens --optimize |
When triggered:
1. Determine scope:
   - --session <name>: named session
   - --all: aggregate across all sessions
2. Load token data:
   - .aiwg/ralph/sessions/*/metrics.json for raw token counts
   - src/metrics/token-counter.ts)
3. Compute efficiency metrics:
   - vsBenchmark: percentage vs MetaGPT 124 tokens/line (negative = better)
   - vsBaseline: percentage vs typical LLM 200 tokens/line (negative = better)
4. Run the command:
# Default efficiency report
aiwg metrics-tokens
# Current session
aiwg metrics-tokens --session current
# Per-step breakdown
aiwg metrics-tokens --by-step
# With optimization suggestions
aiwg metrics-tokens --optimize
# JSON output
aiwg metrics-tokens --json
The MetaGPT 124 tokens/line benchmark comes from REF-013 (research corpus). It represents a validated efficiency target for AI-assisted software workflows. AIWG tracks against this benchmark to make token costs legible and comparable across sessions.
| Threshold | Tokens/Line | Status | Action |
|-----------|-------------|--------|--------|
| At or below benchmark | ≤ 124 | green | No action needed |
| Above benchmark | 125–150 | yellow | Flag for review |
| Well above benchmark | > 150 | red | Generate optimization recommendations |
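The threshold bands can be read as a small classification function. The sketch below is one possible TypeScript reading, not the contents of src/metrics/token-counter.ts; the names `Status` and `classifyTokensPerLine` are hypothetical. Because the bands are stated in whole numbers (124, then 125), the sketch assumes the ratio is rounded to a whole number before it is classified.

```typescript
// Hypothetical helper: maps a tokens/line ratio to the green/yellow/red bands.
type Status = "green" | "yellow" | "red";

const METAGPT_BENCHMARK = 124; // tokens/line, REF-013
const YELLOW_CEILING = 150; // upper bound of the yellow band

function classifyTokensPerLine(tokensPerLine: number): Status {
  // Assumption: round first, so 124.4 reads as 124 (green) and 124.6 as 125 (yellow).
  const rounded = Math.round(tokensPerLine);
  if (rounded <= METAGPT_BENCHMARK) return "green";
  if (rounded <= YELLOW_CEILING) return "yellow";
  return "red";
}

// Example: a ratio of 112 tokens/line falls in the green band.
const exampleStatus: Status = classifyTokensPerLine(112);
```

If the real implementation classifies the unrounded ratio instead, only values strictly between 124 and 125 would be treated differently.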
Comparison points:
| Baseline | Tokens/Line |
|----------|-------------|
| MetaGPT benchmark (REF-013) | 124 |
| Typical LLM baseline | ~200 |
| AIWG target | ≤ 124 |
Token Efficiency — Session: sdlc-review-20260401-143022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Token Counts
Input: 42,310 tokens
Output: 18,940 tokens
Total: 61,250 tokens
Content Metrics
Characters: 245,000
Non-blank lines: 548
Total lines: 621
Efficiency
Tokens/line: 112
vs MetaGPT: -9.7% (better than 124 tokens/line benchmark)
vs LLM baseline: -44% (well below 200 tokens/line typical)
Status: green
Threshold: green — at or below MetaGPT benchmark
Per-step breakdown (--by-step)
Token Efficiency by Step
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Step Tokens Lines Tokens/Line Status
────────────────────── ──────── ───── ─────────── ──────
architecture-designer 18,200 168 108 green
security-architect 14,600 132 111 green
test-architect 13,100 119 110 green
technical-writer 15,350 129 119 green ← highest tokens/line
──────────────────────────────────
Total 61,250 548 112 green
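The per-step figures above can be reproduced from the Tokens and Lines columns. A minimal TypeScript sketch, assuming each step is available as a record with a name, a token count, and a non-blank line count; the `StepMetrics` shape and the helper names are hypothetical and do not describe the actual metrics.json schema:

```typescript
// Hypothetical record shape; the real metrics.json schema is not shown in this document.
interface StepMetrics {
  step: string;
  tokens: number;
  lines: number; // non-blank lines
}

// Values copied from the sample table.
const steps: StepMetrics[] = [
  { step: "architecture-designer", tokens: 18200, lines: 168 },
  { step: "security-architect", tokens: 14600, lines: 132 },
  { step: "test-architect", tokens: 13100, lines: 119 },
  { step: "technical-writer", tokens: 15350, lines: 129 },
];

// Tokens per non-blank line, rounded to a whole number as in the table.
function tokensPerLine(m: StepMetrics): number {
  return Math.round(m.tokens / m.lines);
}

// Returns the item with the highest score (expects a non-empty array).
function maxBy(items: StepMetrics[], score: (m: StepMetrics) => number): StepMetrics {
  return items.reduce((best, m) => (score(m) > score(best) ? m : best));
}

const totalTokens = steps.reduce((sum, m) => sum + m.tokens, 0);
const totalLines = steps.reduce((sum, m) => sum + m.lines, 0);
const overallTokensPerLine = Math.round(totalTokens / totalLines);

const mostTokens = maxBy(steps, (m) => m.tokens);
const highestRatio = maxBy(steps, tokensPerLine);
```

With the sample data this yields 61,250 tokens over 548 lines (112 tokens/line). architecture-designer has the most tokens, while technical-writer has the highest tokens/line ratio.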
Optimization hints (--optimize)
Optimization Suggestions
━━━━━━━━━━━━━━━━━━━━━━━━
Status: green — no critical optimizations needed.
Opportunities (optional):
1. technical-writer (119 tok/line) — near benchmark ceiling.
Consider: scope the synthesis prompt to final merge only,
avoid re-reading full drafts.
2. architecture-designer (18,200 tokens) — highest absolute cost.
Consider: pass only the relevant SAD section, not the full doc.
Token efficiency uses the estimation and comparison logic from src/metrics/token-counter.ts:
tokens = ceil(characters / 4)
tokensPerLine = tokens / nonBlankLines
vsBenchmark = (tokensPerLine - 124) / 124 * 100 (negative = better)
vsBaseline = (tokensPerLine - 200) / 200 * 100 (negative = better)
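A compact TypeScript sketch of the four formulas above. It is an illustration only, not the contents of src/metrics/token-counter.ts; the function, interface, and constant names are hypothetical. It rounds tokens/line to a whole number before computing the percentages, an assumption chosen because it reproduces the figures in the sample report (112, -9.7%, -44%).

```typescript
const METAGPT_TOKENS_PER_LINE = 124; // REF-013 benchmark
const BASELINE_TOKENS_PER_LINE = 200; // typical LLM baseline

interface Efficiency {
  tokens: number;
  tokensPerLine: number;
  vsBenchmark: number; // percent, negative = better
  vsBaseline: number; // percent, negative = better
}

function estimateEfficiency(characters: number, nonBlankLines: number): Efficiency {
  if (nonBlankLines <= 0) {
    throw new RangeError("nonBlankLines must be positive");
  }
  // tokens = ceil(characters / 4)
  const tokens = Math.ceil(characters / 4);
  // Assumption: round the ratio before comparing, as the sample report appears to do.
  const tokensPerLine = Math.round(tokens / nonBlankLines);
  // Percentage difference from a reference value, kept to one decimal place.
  const pct = (reference: number): number =>
    Math.round(((tokensPerLine - reference) / reference) * 1000) / 10;
  return {
    tokens,
    tokensPerLine,
    vsBenchmark: pct(METAGPT_TOKENS_PER_LINE),
    vsBaseline: pct(BASELINE_TOKENS_PER_LINE),
  };
}

// Inputs taken from the sample report: 245,000 characters, 548 non-blank lines.
const sample = estimateEfficiency(245000, 548);
```

For the sample inputs this gives 61,250 tokens, 112 tokens/line, -9.7% against the MetaGPT benchmark, and -44% against the typical LLM baseline. Computing the percentages from the unrounded ratio (about 111.8) would give slightly different values, so the rounding order matters when comparing against reported numbers.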
User: "Token efficiency for this session"
Action:
aiwg metrics-tokens
Response: Efficiency report with tokens/line ratio, benchmark comparison, and green/yellow/red status.
User: "Which step used the most tokens?"
Action:
aiwg metrics-tokens --by-step
Response: Per-step table showing token counts, line counts, tokens/line ratio, and threshold status for each workflow step.
User: "Suggest ways to reduce token usage"
Action:
aiwg metrics-tokens --optimize
Response: Optimization suggestions targeted at steps above the green threshold, with specific prompt-scoping recommendations.
User: "Are we at green on token efficiency?"
Extraction: Threshold check
Action:
aiwg metrics-tokens --threshold
Response: "Threshold status: green — 112 tokens/line, 9.7% below the MetaGPT 124 tokens/line benchmark (REF-013)."
If the session scope is unclear: