Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

jaisonerick/assess-effectiveness

Name: assess-effectiveness
Author: jaisonerick

plugins/spec-plugin/skills/assess-effectiveness/SKILL.md

npx skillsauth add jaisonerick/spec-plugin assess-effectiveness

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

VER="$ARGUMENTS[0]"; case "$VER" in v[0-9]*) VA="--version $VER" ;; *) VA="" ;; esac
python3 "${CLAUDE_SKILL_DIR}/bin/run-metrics.py" $VA 2>/dev/null || echo "(metrics did not run — invoke manually: python3 \"\${CLAUDE_SKILL_DIR}/bin/run-metrics.py\")"

You assess how the latest spec-plugin version performs across all the sessions that invoked it — not one run — and propose evidence-backed improvements for the next version: primarily to the plugin (skills/agents/hooks), secondarily to spec/process patterns. The cross-session metrics for the target version are preloaded above.

You propose, never self-modify — your output is a review a human acts on. (This is the safe replacement for the removed self-modifying retrospective.)

What's preloaded

The preload ran bin/run-metrics.py for the latest installed spec-plugin version: every transcript across ~/.claude/projects that loaded a spec-plugin skill at that version, with per-session turns · output-k · thinking% · compactions · explore-Bash:Skill · preload-fired · which-skill, plus an aggregate and the v11 baseline. Re-run with a wider window / pinned version when useful:

python3 "${CLAUDE_SKILL_DIR}/bin/run-metrics.py" [--version vNN] [--since "YYYY-MM-DD HH:MM"] [--all] [--dir <projects-subdir>]

Three lenses — aggregate, systemic-not-anecdotal

A single bad session is noise; a pattern across the version's sessions is signal.

1. Run efficiency

From the metrics, vs the baseline the script prints:

median thinking% still high → effort miscalibrated for a role tier.
compactions > ~0 on any session → stories too big, or context bloating somewhere.
explore-Bash : Skill still high → the no-grep nudge + forked primitives aren't biting; identify which role/skill is the offender (the which-skill column).
preload-fired < 100% of execute-task/validate-execution sessions → the skill sometimes isn't invoked, or the preload is failing — investigate the misses (a real gap we've hit before).
giant single sessions / many stories in one → fresh-per-story not honored by the lead.

2. Process adherence — across the runs

How often the design actually held: skills invoked (appearing in the set means it did), preloads fired, exploration delegated, fresh-per-story, peer-to-peer QA. A low rate is a plugin or driving problem — separate a spec/plugin bug from a driving artifact (e.g. a resume run driven in the old persistent style doesn't fault the plugin).

3. Recurring spec-quality issues — sampled, secondary

This view spans versions, so there's no single DoD to grade. Instead sample the heaviest/most-anomalous sessions' lessons.md / qa/dod-audit.md (find them via the session's project/cwd) for issues that recur across runs — architecture↔as-built drift, DoD items that needed a REV, oversized stories. A recurring spec issue is a candidate for a build-stories / architect-version guidance change.

How you work — delegate, stay lean

Do not read every transcript in your own context. Delegate to sub-agents (run them in parallel) and synthesize their compact findings:

An analysis agent on the 2–3 heaviest / most-anomalous sessions the metrics flag — where the turns/tokens went, biggest single turns, repeated re-reads, grep-spelunking the nudge didn't stop. Point it at the session files; it returns findings, never raw transcripts.
A pattern agent to scan the set's lessons.md / qa for recurring spec-quality + adherence issues (lenses 2–3).

Output — the version effectiveness review

Write spec-plugin-effectiveness-<version>.md in the CWD, and report a tight summary to the caller:

Headline — how vN performs vs the v11 baseline (and the prior version, if known) across the five metrics.
Systemic findings — the cross-session patterns, each with the metric / offending sessions as evidence.
Process adherence — what held vs leaked across runs; plugin-bug vs driving-artifact.
Recurring spec issues — if any.
Proposed improvements for the NEXT version — prioritized:
- A · Plugin (skills/agents/hooks): concrete edit — file + rationale + the metric it should move.
- B · Spec/process guidance: sizing, DoD, self-containment patterns worth encoding. Each proposal: evidence (metric / session / quoted line) → change → expected effect. Do not apply any of them.

Constraints

Propose, never self-modify — write the review; don't edit the plugin or any spec.
Systemic over anecdotal — one bad session is noise; a cross-session pattern is signal.
Lean — delegate the mining to sub-agents; never dump raw transcripts into your own context.

jaisonerick/assess-effectiveness

plugins/spec-plugin/skills/assess-effectiveness/SKILL.md

Assess how the LATEST spec-plugin version is performing across every previous session that invoked it — aggregate run efficiency (thinking%, compactions, exploration-vs-skills, preload firing, fresh-per-story), process adherence, and recurring spec-quality issues — then propose concrete, evidence-backed improvements for the NEXT version (plugin skills/agents/hooks, and spec/process patterns). Read-only: proposes, never self-modifies. Not tied to a single run.

tools

Updated Jun 9, 2026

$ install --global

skillsauth

npx skillsauth add jaisonerick/spec-plugin assess-effectiveness

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 9, 2026, 8:24 AM296.3s2 files scanned

SKILL.md

name:: assess-effectiveness
description:: Assess how the LATEST spec-plugin version is performing across every previous session that invoked it — aggregate run efficiency (thinking%, compactions, exploration-vs-skills, preload firing, fresh-per-story), process adherence, and recurring spec-quality issues — then propose concrete, evidence-backed improvements for the NEXT version (plugin skills/agents/hooks, and spec/process patterns). Read-only: proposes, never self-modifies. Not tied to a single run.
argument-hint:: [optional: plugin version, e.g. v12 — defaults to latest installed]
effort:: high
allowed-tools:: Read, Glob, Grep, Bash, Agent, Write

VER="$ARGUMENTS[0]"; case "$VER" in v[0-9]*) VA="--version $VER" ;; *) VA="" ;; esac
python3 "${CLAUDE_SKILL_DIR}/bin/run-metrics.py" $VA 2>/dev/null || echo "(metrics did not run — invoke manually: python3 \"\${CLAUDE_SKILL_DIR}/bin/run-metrics.py\")"

You propose, never self-modify — your output is a review a human acts on. (This is the safe replacement for the removed self-modifying retrospective.)

What's preloaded

python3 "${CLAUDE_SKILL_DIR}/bin/run-metrics.py" [--version vNN] [--since "YYYY-MM-DD HH:MM"] [--all] [--dir <projects-subdir>]

Three lenses — aggregate, systemic-not-anecdotal

A single bad session is noise; a pattern across the version's sessions is signal.

1. Run efficiency

From the metrics, vs the baseline the script prints:

median thinking% still high → effort miscalibrated for a role tier.
compactions > ~0 on any session → stories too big, or context bloating somewhere.
explore-Bash : Skill still high → the no-grep nudge + forked primitives aren't biting; identify which role/skill is the offender (the which-skill column).
preload-fired < 100% of execute-task/validate-execution sessions → the skill sometimes isn't invoked, or the preload is failing — investigate the misses (a real gap we've hit before).
giant single sessions / many stories in one → fresh-per-story not honored by the lead.

2. Process adherence — across the runs

3. Recurring spec-quality issues — sampled, secondary

How you work — delegate, stay lean

Do not read every transcript in your own context. Delegate to sub-agents (run them in parallel) and synthesize their compact findings:

An analysis agent on the 2–3 heaviest / most-anomalous sessions the metrics flag — where the turns/tokens went, biggest single turns, repeated re-reads, grep-spelunking the nudge didn't stop. Point it at the session files; it returns findings, never raw transcripts.
A pattern agent to scan the set's lessons.md / qa for recurring spec-quality + adherence issues (lenses 2–3).

Output — the version effectiveness review

Write spec-plugin-effectiveness-<version>.md in the CWD, and report a tight summary to the caller:

Headline — how vN performs vs the v11 baseline (and the prior version, if known) across the five metrics.
Systemic findings — the cross-session patterns, each with the metric / offending sessions as evidence.
Process adherence — what held vs leaked across runs; plugin-bug vs driving-artifact.
Recurring spec issues — if any.
Proposed improvements for the NEXT version — prioritized:
- A · Plugin (skills/agents/hooks): concrete edit — file + rationale + the metric it should move.
- B · Spec/process guidance: sizing, DoD, self-containment patterns worth encoding. Each proposal: evidence (metric / session / quoted line) → change → expected effect. Do not apply any of them.

Constraints

Propose, never self-modify — write the review; don't edit the plugin or any spec.
Systemic over anecdotal — one bad session is noise; a cross-session pattern is signal.
Lean — delegate the mining to sub-agents; never dump raw transcripts into your own context.

Related Skills

jaisonerick/verify-symbol

development

VerifiedTrustedCommunity

Confirm whether a code symbol (method/class/field/endpoint/flag) actually exists and return its REAL signature + definition location — or the nearest match. Uses LSP/introspection, never grep-spelunking. Cheap and fast.

SKILL.mdUpdated Jun 7, 2026

jaisonerick/verify-symbol

jaisonerick/trace-flow

development

VerifiedTrustedCommunity

Walk one value or action end-to-end across every layer/hop — go-to-definition by go-to-definition, or with a debugger breakpoint — and report the real state transitions and where the contract/shape diverges. The workhorse for architecture sketches and cross-layer debugging.

SKILL.mdUpdated Jun 7, 2026

jaisonerick/trace-flow

jaisonerick/setup-env

testing

VerifiedTrustedCommunity

Bring a fresh worktree/checkout to a runnable state — verify base HEAD, copy gitignored files (.env), allocate per-agent DB/test env, install deps, run the smoke gate. Deterministic, mechanical. Reports a single ready/blocked verdict.

SKILL.mdUpdated Jun 7, 2026

jaisonerick/setup-env

jaisonerick/probe-contract

development

VerifiedTrustedCommunity

Find out how code ACTUALLY behaves by executing the real classes in a REPL — the real request/response shape, the real return value — without relying on a live staging environment. Beats writing a test for 'how does this behave?'.

SKILL.mdUpdated Jun 7, 2026

jaisonerick/probe-contract

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/jaisonerick/spec-plugin.git

# Copy into Claude Code skills folder (global)
cp -r spec-plugin/plugins/spec-plugin/skills/assess-effectiveness ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

jaisonerick/spec-plugin

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT