skills/rate-skill/SKILL.md
Grades a SKILL.md A-F with prioritized, paste-ready fixes. Use whenever the user asks to "rate this skill", "grade this skill", "audit my SKILL.md", or "score this skill". Do NOT use for code review or for new skills (see generate-skill).
npx skillsauth add antjanus/skillbox rate-skillInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Audits a single SKILL.md against current activation-driven authoring practice and returns a letter grade, weighted category scores, prioritized findings with concrete patches, named strengths, and a projected grade after fixes.
The rubric is anchored to (a) the agentskills.io / Claude Code frontmatter spec, (b) Anthropic's skill-creator guidance, and (c) Seleznov's activation study (n=650, p<0.0001) showing directive third-person descriptions activate ~20× more reliably than passive prose.
SKILL.md inside it. If nothing is passed, ask once: "Which SKILL.md should I rate?"| # | Category | Weight | |---|---|---| | 1 | Description quality | 25 | | 2 | Frontmatter validity | 20 | | 3 | Length & progressive disclosure | 15 | | 4 | Structure fit for type | 15 | | 5 | Examples | 10 | | 6 | Conciseness / token economy | 10 | | 7 | Anti-pattern avoidance | 5 |
Letter mapping: A 90–100, B 80–89, C 70–79, D 60–69, F <60.
Full marks require all five:
Length scoring:
1024 chars: cap at 50 (spec hard cap violation per agentskills.io).
description: (YAML | or > block scalar): automatic 0. Silently breaks discovery (anthropics/skills #9817).Other hard penalties:
Accepted top-level fields: name, description, license, compatibility, when_to_use, argument-hint, arguments, disable-model-invocation, user-invocable, model, effort, agent, hooks, paths, shell, allowed-tools, metadata.
Deduct 15 per occurrence:
version, author, or tags (belong inside metadata).category (not a real field).argument-hint nested under metadata (must be top-level).name with uppercase, consecutive hyphens, or reserved words (anthropic, claude).500 lines without a
references/directory: cap at 40.
reference/ instead of plural references/: −10 (canonical is plural).SKILL.md: −15 (Claude head -100s files and misses content).Expected sections by detected type:
Missing a section the detected type needs: −20 each. Penalize an ## Integration section that contains nothing concrete (rare in surveyed top skills).
## Anti-Pattern: headers: +15.<Good>/<Bad> XML tags: −10 (zero of 8 surveyed top skills use them).Penalize: paragraphs restating general programming knowledge; "why this matters" prose longer than the rule it precedes; verbose intros before the workflow; inconsistent terminology (e.g., swapping "skill" / "command" for the same thing).
Deduct 20 per occurrence:
skill-creator calls this a yellow flag).README.md, INSTALLATION.md, QUICK_REFERENCE.md, CHANGELOG.md)."if before August 2025…") without being scoped to an "old patterns" section.# Skill Rating: <skill-name>
**Detected type:** <methodology | reference | generator | auditor>
**Overall grade:** <letter> (<weighted score>/100)
## Category scores
| Category | Score | Weight | Weighted |
|---|---|---|---|
| Description quality | nn | 25 | nn.n |
| Frontmatter validity | nn | 20 | nn.n |
| Length & disclosure | nn | 15 | nn.n |
| Structure | nn | 15 | nn.n |
| Examples | nn | 10 | nn.n |
| Conciseness | nn | 10 | nn.n |
| Anti-pattern avoid. | nn | 5 | nn.n |
## Strengths
- <concrete bullet — what the skill does well>
## Findings (prioritized)
### P0 — <title>
**Why:** <one-line rationale, cite category>
**Fix:**
\`\`\`<lang>
<concrete replacement text, not a description of one>
\`\`\`
### P1 — <title>
...
### P2 — <title>
...
## Estimated grade after P0+P1: <letter> (<projected score>/100)
<one-line commit-ready summary>
Every report includes at least one strength (even on F-tier skills — users abandon purely negative reports). Every finding ships a concrete patch the user can paste, not "improve the description".
Worked examples — directive-description rewrite, frontmatter cleanup, and a desired report opener: references/EXAMPLES.md.
description: is a silent killer. A | block scalar parses fine but discovery never sees it (anthropics/skills #9817). Always flag as P0 — automatic 0 on Category 1.skillListingBudgetFraction defaults to ~1% of context — past ~15–25 installed skills, descriptions get silently dropped. Soft target ≤230; no penalty up to 500.tags does nothing functionally at top level. No discovery system consumes it. If found at top level, demote to metadata.tags rather than deleting — preserves user intent.<Good>/<Bad> XML tags are a SkillBox-only convention. Zero of 8 surveyed Anthropic/Vercel/Superpowers skills use them. Recommend ✅/❌ or prose ## Anti-Pattern: headers.reference/ vs plural references/. Anthropic's spec and skill-creator both use plural. Singular works but is off-style — −10, not a P0.skill-creator calls it a yellow flag: "if possible, reframe and explain the reasoning." Recommend "Quality Signals" + "Anti-Patterns".skill-creator (May 2026 update) ships a 60/40 train/test description optimizer. If the rated skill ships its own eval set under eval/ or references/EVAL.md, add +5 to Category 1 (cap 100). Cite: https://claude.com/blog/improving-skill-creator-test-measure-and-refine-agent-skillsdevelopment
EXPERIMENTAL. Mine recent Claude Code transcripts for friction events, cluster them by active skill, propose patches for skills with 3+ friction events, validate each patch via headless replay, scrub the report through /publish-check, and present an EVOLUTION_REPORT.md for human review on a branch (never auto-merge). Use when asked to "evolve my skills", "audit skills against recent friction", "propose skill improvements from transcripts", "run the skill evolution pipeline", or as part of a weekly skill-quality cadence.
testing
Manual QA tracking — things tests can't verify. Use when asked to "create a QA list", "set up QA for this project", "what should I QA", "track manual QA", "audit the QA list", or "start manual QA".
development
Multi-source web research with cited synthesis in chat. Use when asked to "research X", "deep research on Y", "deep dive on Z", "investigate this topic", "compare X and Y", "pros and cons of X", or "survey the landscape of Y".
development
Use this skill whenever the user wants a multi-agent review of local changes — triggers include "review my code", "review these changes", "do a code review", or "check my changes before I commit". Writes REVIEW.md. Do NOT use for an open PR by number (use /review) or a security-specific pass (use /security-review).