
jmagly/metrics-tokens

Name: metrics-tokens
Author: jmagly

plugins/utils/skills/metrics-tokens/SKILL.md

npx skillsauth add jmagly/aiwg metrics-tokens

metrics-tokens

You perform deep analysis of token usage efficiency. You compare AIWG workflow token consumption against the MetaGPT 124 tokens/line benchmark (REF-013), identify high-cost operations, and surface optimization opportunities.

Triggers

Alternate expressions and non-obvious activations (primary phrases are matched automatically from the skill description):

"how efficient are my tokens" → efficiency ratio vs MetaGPT baseline
"am I above the baseline" → threshold status check
"where are tokens being wasted" → per-step breakdown with recommendations
"token ratio" → tokens/line ratio calculation

Trigger Patterns Reference

| Pattern | Example | Action |
|---------|---------|--------|
| Efficiency report | "token efficiency" | aiwg metrics-tokens |
| Session analysis | "analyze tokens for this session" | aiwg metrics-tokens --session current |
| Threshold check | "are we at green" | aiwg metrics-tokens --threshold |
| Per-step breakdown | "which step used the most tokens" | aiwg metrics-tokens --by-step |
| Optimization hints | "suggest token optimizations" | aiwg metrics-tokens --optimize |

Behavior

When triggered:

1. Determine scope:
   - Default: current or most recent session
   - --session <name>: named session
   - --all: aggregate across all sessions
2. Load token data:
   - Read .aiwg/ralph/sessions/*/metrics.json for raw token counts
   - Apply estimation heuristic: 4 chars per token (aligned with src/metrics/token-counter.ts)
3. Compute efficiency metrics:
   - Tokens/line ratio for session output
   - vsBenchmark: percentage vs MetaGPT 124 tokens/line (negative = better)
   - vsBaseline: percentage vs typical LLM 200 tokens/line (negative = better)
   - Threshold status: green (≤124), yellow (125–150), red (>150)
4. Run the command:

# Default efficiency report
aiwg metrics-tokens

# Current session
aiwg metrics-tokens --session current

# Per-step breakdown
aiwg metrics-tokens --by-step

# With optimization suggestions
aiwg metrics-tokens --optimize

# JSON output
aiwg metrics-tokens --json
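The scope rules in step 1 can be sketched as a small argument parser. This is a hypothetical illustration, not the actual handler in src/cli/handlers/subcommands.ts: the `Scope` type, the `resolveScope` name, and the choice to let `--all` take precedence over `--session` are all assumptions.

```typescript
// Hypothetical sketch of scope resolution; not the real AIWG handler.
type Scope =
  | { kind: "default" }                 // current or most recent session
  | { kind: "session"; name: string }   // --session <name>
  | { kind: "all" };                    // --all

function resolveScope(args: string[]): Scope {
  // Assumption: --all wins if both flags are present.
  if (args.indexOf("--all") !== -1) {
    return { kind: "all" };
  }
  const i = args.indexOf("--session");
  if (i !== -1) {
    const name = args[i + 1];
    // A missing value or another flag in the value position is rejected.
    if (name === undefined || name.indexOf("--") === 0) {
      throw new Error("--session requires a session name");
    }
    return { kind: "session", name };
  }
  return { kind: "default" };
}

console.log(resolveScope(["--session", "current"]));
```

Modeling the scope as a tagged union keeps the three cases explicit, so the code that loads metrics.json files can switch on `kind` without re-inspecting raw flags.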

Benchmark Reference

The MetaGPT 124 tokens/line benchmark comes from REF-013 (research corpus). It represents a validated efficiency target for AI-assisted software workflows. AIWG tracks against this benchmark to make token costs legible and comparable across sessions.

| Threshold | Tokens/Line | Status | Action |
|-----------|-------------|--------|--------|
| At or below benchmark | ≤ 124 | green | No action needed |
| Above benchmark | 125–150 | yellow | Flag for review |
| Well above benchmark | > 150 | red | Generate optimization recommendations |
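The threshold table maps directly onto a classifier. The sketch below is illustrative only: `classifyThreshold` and `actionFor` are hypothetical names, and because the table lists integer bounds, it assumes that fractional ratios between 124 and 125 count as yellow.

```typescript
// Hypothetical helper names; bounds come from the threshold table.
type ThresholdStatus = "green" | "yellow" | "red";

function classifyThreshold(tokensPerLine: number): ThresholdStatus {
  if (tokensPerLine <= 124) return "green";   // at or below the MetaGPT benchmark
  if (tokensPerLine <= 150) return "yellow";  // assumption: 124 < x <= 150 is yellow
  return "red";
}

// Action text copied from the table's Action column.
function actionFor(status: ThresholdStatus): string {
  switch (status) {
    case "green":
      return "No action needed";
    case "yellow":
      return "Flag for review";
    case "red":
      return "Generate optimization recommendations";
  }
}

console.log(classifyThreshold(112), actionFor(classifyThreshold(112)));
```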

Comparison points:

| Baseline | Tokens/Line |
|----------|-------------|
| MetaGPT benchmark (REF-013) | 124 |
| Typical LLM baseline | ~200 |
| AIWG target | ≤ 124 |

Report Format

Standard Efficiency Report

Token Efficiency — Session: sdlc-review-20260401-143022
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Token Counts
  Input:    42,310 tokens
  Output:   18,940 tokens
  Total:    61,250 tokens

Content Metrics
  Characters:     245,000
  Non-blank lines:    548
  Total lines:        621

Efficiency
  Tokens/line:    112
  vs MetaGPT:     -9.7%  (better than 124 tokens/line benchmark)
  vs LLM baseline: -44%  (well below 200 tokens/line typical)
  Status:         green

Threshold: green — at or below MetaGPT benchmark

Per-Step Breakdown (`--by-step`)

Token Efficiency by Step
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Step                    Tokens    Lines  Tokens/Line  Status
──────────────────────  ────────  ─────  ───────────  ──────
architecture-designer   18,200    168    108          green
security-architect      14,600    132    111          green
test-architect          13,100    119    110          green
technical-writer        15,350    129    119          green  ← highest tokens/line
                        ──────────────────────────────────
Total                   61,250    548    112          green
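As a rough check on the table above, the per-step ratios and the total row can be recomputed from the token and line counts. The `StepUsage` shape and variable names are assumptions for illustration; only the numbers come from the sample output.

```typescript
// Hypothetical record shape; the figures are copied from the sample table.
interface StepUsage {
  step: string;
  tokens: number;
  lines: number;
}

const stepUsage: StepUsage[] = [
  { step: "architecture-designer", tokens: 18200, lines: 168 },
  { step: "security-architect", tokens: 14600, lines: 132 },
  { step: "test-architect", tokens: 13100, lines: 119 },
  { step: "technical-writer", tokens: 15350, lines: 129 },
];

// Per-step tokens/line, rounded to match the table.
const stepRows = stepUsage.map((s) => ({
  ...s,
  tokensPerLine: Math.round(s.tokens / s.lines),
}));

// The total ratio is derived from summed tokens and lines,
// not from an average of the per-step ratios.
const stepTotal = stepUsage.reduce(
  (acc, s) => ({ tokens: acc.tokens + s.tokens, lines: acc.lines + s.lines }),
  { tokens: 0, lines: 0 }
);
const totalTokensPerLine = Math.round(stepTotal.tokens / stepTotal.lines);

// Two different "most expensive" notions: by ratio and by absolute tokens.
const highestRatio = stepRows.reduce((a, b) => (b.tokensPerLine > a.tokensPerLine ? b : a));
const highestTokens = stepRows.reduce((a, b) => (b.tokens > a.tokens ? b : a));

console.log(totalTokensPerLine, highestRatio.step, highestTokens.step);
```

In this sample, technical-writer has the highest tokens/line ratio while architecture-designer has the highest absolute token count, so the two rankings identify different steps.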

Optimization Report (`--optimize`)

Optimization Suggestions
━━━━━━━━━━━━━━━━━━━━━━━━

Status: green — no critical optimizations needed.

Opportunities (optional):
  1. technical-writer (119 tok/line) — near benchmark ceiling.
     Consider: scope the synthesis prompt to final merge only,
     avoid re-reading full drafts.

  2. architecture-designer (18,200 tokens) — highest absolute cost.
     Consider: pass only the relevant SAD section, not the full doc.
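One way such suggestions could be selected is sketched below. This is a guess at a plausible heuristic inferred from the sample report, not the documented behavior of `--optimize`: steps above the 124 tokens/line benchmark are flagged first, and when every step is green the sketch falls back to the step nearest the ceiling and the step with the highest absolute cost. All names are hypothetical.

```typescript
// Hypothetical selection heuristic; not the actual --optimize logic.
interface StepMetrics {
  step: string;
  tokens: number;
  lines: number;
}

interface Suggestion {
  step: string;
  reason: string;
}

const BENCHMARK_TOKENS_PER_LINE = 124; // MetaGPT benchmark (REF-013)

function suggestOptimizations(steps: StepMetrics[]): Suggestion[] {
  if (steps.length === 0) return [];

  // Steps above the benchmark are the primary targets.
  const over = steps
    .filter((s) => s.tokens / s.lines > BENCHMARK_TOKENS_PER_LINE)
    .map((s) => ({ step: s.step, reason: "above benchmark" }));
  if (over.length > 0) return over;

  // Everything is green: surface optional opportunities only.
  const nearest = steps.reduce((a, b) => (b.tokens / b.lines > a.tokens / a.lines ? b : a));
  const costliest = steps.reduce((a, b) => (b.tokens > a.tokens ? b : a));
  const optional: Suggestion[] = [{ step: nearest.step, reason: "near benchmark ceiling" }];
  if (costliest.step !== nearest.step) {
    optional.push({ step: costliest.step, reason: "highest absolute cost" });
  }
  return optional;
}

const sampleSuggestions = suggestOptimizations([
  { step: "architecture-designer", tokens: 18200, lines: 168 },
  { step: "security-architect", tokens: 14600, lines: 132 },
  { step: "test-architect", tokens: 13100, lines: 119 },
  { step: "technical-writer", tokens: 15350, lines: 129 },
]);
console.log(sampleSuggestions);
```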

Efficiency Calculation

Token efficiency uses the estimation and comparison logic from src/metrics/token-counter.ts:

tokens          = ceil(characters / 4)
tokensPerLine   = tokens / nonBlankLines
vsBenchmark     = (tokensPerLine - 124) / 124 * 100   (negative = better)
vsBaseline      = (tokensPerLine - 200) / 200 * 100   (negative = better)
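The four formulas above translate into a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the contents of src/metrics/token-counter.ts: the constant names, the `EfficiencyResult` shape, and the guard against zero lines are illustrative additions.

```typescript
// Illustrative names; the constants mirror the formulas above.
const CHARS_PER_TOKEN = 4;
const METAGPT_TOKENS_PER_LINE = 124;     // REF-013 benchmark
const TYPICAL_LLM_TOKENS_PER_LINE = 200; // typical LLM baseline

interface EfficiencyResult {
  tokens: number;
  tokensPerLine: number;
  vsBenchmark: number; // percent, negative = better
  vsBaseline: number;  // percent, negative = better
}

function computeEfficiency(characters: number, nonBlankLines: number): EfficiencyResult {
  // Assumption: a session with no non-blank lines is treated as an error.
  if (nonBlankLines <= 0) {
    throw new RangeError("nonBlankLines must be positive");
  }
  const tokens = Math.ceil(characters / CHARS_PER_TOKEN);
  const tokensPerLine = tokens / nonBlankLines;
  return {
    tokens,
    tokensPerLine,
    vsBenchmark: ((tokensPerLine - METAGPT_TOKENS_PER_LINE) / METAGPT_TOKENS_PER_LINE) * 100,
    vsBaseline: ((tokensPerLine - TYPICAL_LLM_TOKENS_PER_LINE) / TYPICAL_LLM_TOKENS_PER_LINE) * 100,
  };
}

// Figures from the sample report: 245,000 characters over 548 non-blank lines.
const sampleEfficiency = computeEfficiency(245000, 548);
console.log(sampleEfficiency.tokens, Math.round(sampleEfficiency.tokensPerLine));
```

With the sample report's inputs this yields 61,250 tokens and a ratio that rounds to 112 tokens/line. The report's -9.7% figure corresponds to applying the vsBenchmark formula to the rounded ratio of 112; the unrounded ratio gives roughly -9.9%.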

Examples

Example 1: Quick efficiency check

User: "Token efficiency for this session"

Action:

aiwg metrics-tokens

Response: Efficiency report with tokens/line ratio, benchmark comparison, and green/yellow/red status.

Example 2: Identify expensive steps

User: "Which step used the most tokens?"

Action:

aiwg metrics-tokens --by-step

Response: Per-step table showing token counts, line counts, tokens/line ratio, and threshold status for each workflow step.

Example 3: Optimization pass

User: "Suggest ways to reduce token usage"

Action:

aiwg metrics-tokens --optimize

Response: Optimization suggestions targeted at steps above the green threshold, with specific prompt-scoping recommendations.

Example 4: Are we at green?

User: "Are we at green on token efficiency?"

Extraction: Threshold check

Action:

aiwg metrics-tokens --threshold

Response: "Threshold status: green — 112 tokens/line, 9.7% below the MetaGPT 124 tokens/line benchmark (REF-013)."

Clarification Prompts

If the session scope is unclear:

"Should I analyze the current running session or the most recent completed session?"

References

@$AIWG_ROOT/src/cli/handlers/subcommands.ts — Metrics tokens handler
@$AIWG_ROOT/src/metrics/token-counter.ts — Token counting, MetaGPT baseline constants (REF-013)
@$AIWG_ROOT/agentic/code/frameworks/sdlc-complete/schemas/flows/token-efficiency.yaml — Token efficiency schema
@$AIWG_ROOT/docs/cli-reference.md — CLI reference

Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities

130 stars

testing

Updated May 14, 2026

npx skillsauth add jmagly/aiwg metrics-tokens

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy (container and dependency vulnerability scanner): Clean, 95%
- Semgrep (static code analysis for vulnerabilities): Clean, 95%
- mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
- Snyk (dep) (open source security scanning): Skipped, 50%
- Socket.dev (supply chain security analysis): Skipped, 50%
- VirusTotal (multi-engine malware detection): Skipped, 50%
- CrowdStrike (advanced threat intelligence): Skipped, 50%
- OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
- OWASP Dep-Check: Skipped, 50%

Last scanned: May 14, 2026, 6:34 AM (130.9s, 1 file scanned)

SKILL.md

namespace: aiwg
name: metrics-tokens
platforms: [all]
description: Analyze token usage efficiency against the MetaGPT baseline and surface per-step optimization opportunities

Related Skills

jmagly/radar-status

data-ai

Verified · Trusted · Community

Report which research-corpus radar sidecars are overdue for refresh. Computes staleness (days since last refresh vs the cadence window) for every radar, sorted most-overdue-first. Runs via `aiwg corpus radar-status`.

140 · SKILL.md · Updated May 28, 2026

jmagly/radar-report

data-ai

Verified · Trusted · Community

Aggregate research-corpus radar sidecars into a corpus or per-cluster freshness report: totals, overdue count, per-cluster / per-GRADE / per-trajectory breakdowns, an overdue table, and per-radar rationale snippets. Runs via `aiwg corpus radar-report`.

140 · SKILL.md · Updated May 28, 2026

jmagly/radar-init

testing

Verified · Trusted · Community

Scaffold radar/freshness sidecars for research-corpus REFs. Pulls title/authors from the citation sidecar and GRADE from the analysis doc, defaults the refresh cadence from GRADE and the cluster from a corpus-local map, and stamps documentation/radar/REF-XXX-radar.md. Runs via `aiwg corpus radar-init`.

140 · SKILL.md · Updated May 28, 2026

jmagly/profile-temporal

data-ai

Verified · Trusted · Community

Compute an entity's publication trajectory: per-year paper counts, topic drift, hot-streak detection (≥3 consecutive A-grade years), and career phase. Runs via `aiwg corpus profile-temporal`.

140 · SKILL.md · Updated May 28, 2026

Install manually

# Clone the repo
git clone https://github.com/jmagly/aiwg.git

# Copy into Claude Code skills folder (global)
cp -r aiwg/plugins/utils/skills/metrics-tokens ~/.claude/skills/

Repository: jmagly/aiwg (130 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT
