Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

curiositech/skill-grader

Name: skill-grader
Author: curiositech

skills/skill-grader/SKILL.md

npx skillsauth add curiositech/windags-skills skill-grader

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Skill Grader

Structured evaluation rubric for Claude Agent Skills. Produces letter grades (A+ through F) on 10 axes plus an overall grade, with specific improvement recommendations for each axis.

Designed for sub-agents and non-expert reviewers who need a mechanical, repeatable process for assessing skill quality without deep domain expertise.

When to Use

✅ Use for:

Auditing a single skill's quality
Comparing skills against each other
Prioritizing which skills to improve first
Quality control sweeps across a skill library
Generating improvement roadmaps

❌ NOT for:

Creating new skills (use skill-architect)
Grading code quality or non-skill documents
Evaluating agent performance (different from skill quality)

Grading Process

flowchart TD
  A[Read SKILL.md + all files] --> B[Score each of 10 axes]
  B --> C[Assign letter grade per axis]
  C --> D[Compute overall grade]
  D --> E[Write improvement recommendations]
  E --> F[Produce grading report]

Step-by-Step

Read the entire skill folder — SKILL.md, all references, scripts, CHANGELOG, README
Score each axis — Use the rubric below (0-100 per axis)
Convert to letter grade — See grade scale
Compute overall grade — Weighted average (Description and Scope are 2x weight)
Write 1-3 specific improvements per axis scoring below B+
Produce the grading report in the output format below

The 10 Evaluation Axes

Axis 1: Description Quality (Weight: 2x)

Does the description follow [What] [When] [Keywords]. NOT for [Exclusions]?

| Grade | Criteria | |-------|----------| | A | Specific verb+noun, domain keywords users would type, 2-5 explicit NOT exclusions, 25-50 words | | B | Has keywords and NOT clause, but slightly vague or missing synonym coverage | | C | Too generic, missing NOT clause, or >100 words of process detail | | D | Single vague sentence ("helps with X") or name/description mismatch | | F | Missing or empty description |

Axis 2: Scope Discipline (Weight: 2x)

Is the skill narrowly focused on one expertise type, or a catch-all?

| Grade | Criteria | |-------|----------| | A | One clear expertise domain, "When to Use" and "NOT for" sections both present and specific | | B | Mostly focused, minor boundary ambiguity | | C | Covers 2-3 related but distinct domains, should probably be split | | D | Catch-all skill ("helps with anything related to X") | | F | No scope boundaries defined at all |

Axis 3: Progressive Disclosure

Does the skill follow the three-layer architecture (metadata → SKILL.md → references)?

| Grade | Criteria | |-------|----------| | A | SKILL.md <300 lines, heavy content in references, reference index in SKILL.md with 1-line descriptions | | B | SKILL.md <500 lines, some references used, index present | | C | SKILL.md >500 lines, or all content inlined with no references | | D | SKILL.md >800 lines, or references exist but aren't indexed in SKILL.md | | F | Single massive file with no structure |

Axis 4: Anti-Pattern Coverage

Does the skill encode expert knowledge that prevents common mistakes?

| Grade | Criteria | |-------|----------| | A | 3+ anti-patterns with Novice/Expert/Timeline template, LLM-mistake notes | | B | 1-2 anti-patterns with clear explanation | | C | Anti-patterns mentioned but no structured template | | D | No anti-patterns, just positive instructions | | F | Contains advice that IS an anti-pattern (outdated, harmful) |

Axis 5: Self-Contained Tools

Does the skill include working tools (scripts, MCPs, subagents)?

| Grade | Criteria | |-------|----------| | A | Working scripts with CLI interface, error handling, dependency docs; OR valid "no tools needed" justification | | B | Scripts exist and work but lack error handling or docs | | C | Scripts referenced but are templates/pseudocode | | D | Phantom tools (SKILL.md references files that don't exist) | | F | References non-existent tools AND no acknowledgment |

Note: Not every skill needs tools. A pure decision-tree skill can score A if tools aren't applicable.

Axis 6: Activation Precision

Would the skill activate correctly on relevant queries and stay silent on irrelevant ones?

| Grade | Criteria | |-------|----------| | A | Description has specific keywords matching user language, clear NOT clause, no obvious false-positive vectors | | B | Good keywords, minor false-positive risk | | C | Generic keywords that overlap with other skills | | D | No specific keywords, or NOT clause contradicts intended use | | F | Description would cause constant false activation |

Axis 7: Visual Artifacts

Does the skill use Mermaid diagrams, code examples, and tables effectively?

| Grade | Criteria | |-------|----------| | A | Decision trees as Mermaid flowcharts, tables for comparisons, code examples for concrete patterns | | B | Some diagrams or tables, but key decision trees still in prose | | C | Tables used but no Mermaid diagrams for processes | | D | Prose-only, no visual structure | | F | Wall of text with no formatting aids |

Axis 8: Output Contracts

Does the skill define what it produces in a format consumable by other agents?

| Grade | Criteria | |-------|----------| | A | Explicit output format (JSON schema, markdown template, or structured sections), subagent-consumable | | B | Output format implied but not explicitly documented | | C | No output format, but content is structured enough to infer | | D | Unstructured prose output expected | | F | N/A (pure reference skill) — exempt from this axis |

Axis 9: Temporal Awareness

Does the skill track when knowledge was current and what has changed?

| Grade | Criteria | |-------|----------| | A | Timelines in anti-patterns, "as of [date]" markers, CHANGELOG with dates | | B | Some temporal context, CHANGELOG exists | | C | No dates on knowledge, but CHANGELOG exists | | D | No temporal context anywhere, knowledge could be stale | | F | Contains demonstrably outdated advice without warning |

Axis 10: Documentation Quality

README, CHANGELOG, and reference organization.

| Grade | Criteria | |-------|----------| | A | README with quick start, CHANGELOG with dated versions, references well-organized with clear filenames | | B | README and CHANGELOG exist, references present | | C | SKILL.md is the only file, but it's well-structured | | D | No README, no CHANGELOG, disorganized references | | F | SKILL.md is the only file and it's poorly structured |

Grade Scale

| Letter | Score Range | Meaning | |--------|------------|---------| | A+ | 97-100 | Exemplary — sets the standard | | A | 93-96 | Excellent — minor improvements possible | | A- | 90-92 | Very good — a few small gaps | | B+ | 87-89 | Good — notable room for improvement | | B | 83-86 | Solid — several areas need work | | B- | 80-82 | Above average — meaningful gaps | | C+ | 77-79 | Average — significant improvements needed | | C | 73-76 | Below average — major gaps | | C- | 70-72 | Barely adequate | | D+ | 67-69 | Poor — fundamental issues | | D | 63-66 | Very poor — needs major rework | | D- | 60-62 | Near-failing quality | | F | <60 | Failing — start over |

Overall Grade Computation

Axes 1 (Description) and 2 (Scope) carry 2x weight. All others carry 1x weight. If Axis 8 (Output Contracts) is marked exempt, remove it from the calculation.

Overall = (2×Axis1 + 2×Axis2 + Axis3 + Axis4 + Axis5 + Axis6 + Axis7 + Axis8 + Axis9 + Axis10) / 12

Convert the numeric average to a letter grade using the scale above.

Output Format

Produce this exact structure:

# Skill Grading Report: [skill-name]

**Graded**: [date]
**Overall Grade**: [letter] ([score]/100)

## Axis Grades

| # | Axis | Grade | Score | Key Finding |
|---|------|-------|-------|-------------|
| 1 | Description Quality | [grade] | [score] | [1-line finding] |
| 2 | Scope Discipline | [grade] | [score] | [1-line finding] |
| 3 | Progressive Disclosure | [grade] | [score] | [1-line finding] |
| 4 | Anti-Pattern Coverage | [grade] | [score] | [1-line finding] |
| 5 | Self-Contained Tools | [grade] | [score] | [1-line finding] |
| 6 | Activation Precision | [grade] | [score] | [1-line finding] |
| 7 | Visual Artifacts | [grade] | [score] | [1-line finding] |
| 8 | Output Contracts | [grade] | [score] | [1-line finding] |
| 9 | Temporal Awareness | [grade] | [score] | [1-line finding] |
| 10 | Documentation Quality | [grade] | [score] | [1-line finding] |

## Top 3 Improvements (Highest Impact)

1. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]
2. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]
3. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]

## Detailed Notes

### [Axis name] ([grade])
[2-3 sentences of specific feedback with examples from the skill]

[Repeat for each axis scoring below B+]

Quick Grading (Abbreviated)

For rapid triage across many skills, produce only:

| Skill | Overall | Desc | Scope | Disc | Anti | Tools | Activ | Visual | Output | Temp | Docs |
|-------|---------|------|-------|------|------|-------|-------|--------|--------|------|------|
| [name] | [grade] | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

Anti-Patterns in Grading

Grade Inflation

Wrong: Giving B+ because "it's pretty good" without checking criteria. Right: Match observations to the rubric table literally. If the description lacks a NOT clause, it cannot score above C on Axis 1.

Missing Context

Wrong: Grading a pure decision-tree skill poorly on Axis 5 (tools) because it has no scripts. Right: Mark Axis 5 as "A — tools not applicable for this skill type."

Ignoring Phantoms

Wrong: Scoring Axis 5 as B because scripts are "referenced." Right: Actually check if every referenced file exists. If scripts/validate.py is mentioned but doesn't exist, that's D.

Output Contract

This skill produces:

10-axis grade report with letter grades (A+ through F) and weighted total
Per-axis feedback with specific evidence and improvement suggestions
Priority improvements ranked by impact on overall quality score
Before/after comparison projecting grade changes from recommended fixes

curiositech/skill-grader

skills/skill-grader/SKILL.md

Evaluates Claude Agent Skills on 10 quality axes with letter grades (A+ through F) and specific improvement recommendations. Use when auditing a skill, comparing skills, prioritizing improvements, or performing quality control on a skill library. Activate on "grade skill", "evaluate skill", "skill quality", "skill audit", "skill review", "rate skill". NOT for creating skills (use skill-architect), grading code quality, or evaluating non-skill documents.

development

Updated Apr 4, 2026

$ install --global

skillsauth

npx skillsauth add curiositech/windags-skills skill-grader

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 4, 2026, 2:52 PM23.6s2 files scanned

SKILL.md

name:: skill-grader
license:: Apache-2.0
description:: Evaluates Claude Agent Skills on 10 quality axes with letter grades (A+ through F) and specific improvement recommendations. Use when auditing a skill, comparing skills, prioritizing improvements, or performing quality control on a skill library. Activate on "grade skill", "evaluate skill", "skill quality", "skill audit", "skill review", "rate skill". NOT for creating skills (use skill-architect), grading code quality, or evaluating non-skill documents.
allowed-tools:: Read,Grep,Glob
argument-hint:: [skill-path]
category:: Agent & Orchestration
- skill:: skill-coach
reason:: Grading provides objective scores that coaching uses to prioritize improvement areas

Skill Grader

Structured evaluation rubric for Claude Agent Skills. Produces letter grades (A+ through F) on 10 axes plus an overall grade, with specific improvement recommendations for each axis.

Designed for sub-agents and non-expert reviewers who need a mechanical, repeatable process for assessing skill quality without deep domain expertise.

When to Use

✅ Use for:

Auditing a single skill's quality
Comparing skills against each other
Prioritizing which skills to improve first
Quality control sweeps across a skill library
Generating improvement roadmaps

❌ NOT for:

Creating new skills (use skill-architect)
Grading code quality or non-skill documents
Evaluating agent performance (different from skill quality)

Grading Process

flowchart TD
  A[Read SKILL.md + all files] --> B[Score each of 10 axes]
  B --> C[Assign letter grade per axis]
  C --> D[Compute overall grade]
  D --> E[Write improvement recommendations]
  E --> F[Produce grading report]

Step-by-Step

Read the entire skill folder — SKILL.md, all references, scripts, CHANGELOG, README
Score each axis — Use the rubric below (0-100 per axis)
Convert to letter grade — See grade scale
Compute overall grade — Weighted average (Description and Scope are 2x weight)
Write 1-3 specific improvements per axis scoring below B+
Produce the grading report in the output format below

The 10 Evaluation Axes

Axis 1: Description Quality (Weight: 2x)

Does the description follow [What] [When] [Keywords]. NOT for [Exclusions]?

Axis 2: Scope Discipline (Weight: 2x)

Is the skill narrowly focused on one expertise type, or a catch-all?

Axis 3: Progressive Disclosure

Does the skill follow the three-layer architecture (metadata → SKILL.md → references)?

Axis 4: Anti-Pattern Coverage

Does the skill encode expert knowledge that prevents common mistakes?

Axis 5: Self-Contained Tools

Does the skill include working tools (scripts, MCPs, subagents)?

Note: Not every skill needs tools. A pure decision-tree skill can score A if tools aren't applicable.

Axis 6: Activation Precision

Would the skill activate correctly on relevant queries and stay silent on irrelevant ones?

Axis 7: Visual Artifacts

Does the skill use Mermaid diagrams, code examples, and tables effectively?

Axis 8: Output Contracts

Does the skill define what it produces in a format consumable by other agents?

Axis 9: Temporal Awareness

Does the skill track when knowledge was current and what has changed?

Axis 10: Documentation Quality

README, CHANGELOG, and reference organization.

Grade Scale

Overall Grade Computation

Axes 1 (Description) and 2 (Scope) carry 2x weight. All others carry 1x weight. If Axis 8 (Output Contracts) is marked exempt, remove it from the calculation.

Overall = (2×Axis1 + 2×Axis2 + Axis3 + Axis4 + Axis5 + Axis6 + Axis7 + Axis8 + Axis9 + Axis10) / 12

Convert the numeric average to a letter grade using the scale above.

Output Format

Produce this exact structure:

# Skill Grading Report: [skill-name]

**Graded**: [date]
**Overall Grade**: [letter] ([score]/100)

## Axis Grades

| # | Axis | Grade | Score | Key Finding |
|---|------|-------|-------|-------------|
| 1 | Description Quality | [grade] | [score] | [1-line finding] |
| 2 | Scope Discipline | [grade] | [score] | [1-line finding] |
| 3 | Progressive Disclosure | [grade] | [score] | [1-line finding] |
| 4 | Anti-Pattern Coverage | [grade] | [score] | [1-line finding] |
| 5 | Self-Contained Tools | [grade] | [score] | [1-line finding] |
| 6 | Activation Precision | [grade] | [score] | [1-line finding] |
| 7 | Visual Artifacts | [grade] | [score] | [1-line finding] |
| 8 | Output Contracts | [grade] | [score] | [1-line finding] |
| 9 | Temporal Awareness | [grade] | [score] | [1-line finding] |
| 10 | Documentation Quality | [grade] | [score] | [1-line finding] |

## Top 3 Improvements (Highest Impact)

1. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]
2. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]
3. **[Axis]: [Specific action]** — [Why this matters, expected grade improvement]

## Detailed Notes

### [Axis name] ([grade])
[2-3 sentences of specific feedback with examples from the skill]

[Repeat for each axis scoring below B+]

Quick Grading (Abbreviated)

For rapid triage across many skills, produce only:

| Skill | Overall | Desc | Scope | Disc | Anti | Tools | Activ | Visual | Output | Temp | Docs |
|-------|---------|------|-------|------|------|-------|-------|--------|--------|------|------|
| [name] | [grade] | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |

Anti-Patterns in Grading

Grade Inflation

Missing Context

Wrong: Grading a pure decision-tree skill poorly on Axis 5 (tools) because it has no scripts. Right: Mark Axis 5 as "A — tools not applicable for this skill type."

Ignoring Phantoms

Wrong: Scoring Axis 5 as B because scripts are "referenced." Right: Actually check if every referenced file exists. If scripts/validate.py is mentioned but doesn't exist, that's D.

Output Contract

This skill produces:

10-axis grade report with letter grades (A+ through F) and weighted total
Per-axis feedback with specific evidence and improvement suggestions
Priority improvements ranked by impact on overall quality score
Before/after comparison projecting grade changes from recommended fixes

Related Skills

curiositech/revisiting-interview-data-analysing-turn

data-ai

VerifiedTrustedCommunity

license: Apache-2.0 NOT for unrelated tasks outside this domain.

8SKILL.mdUpdated Jul 19, 2026

curiositech/revisiting-interview-data-analysing-turn

curiositech/redis-patterns-expert

development

VerifiedTrustedCommunity

Use when designing caching strategies (cache-aside, write-through, write-behind), implementing distributed locks, building rate limiters, leaderboards, real-time streams (XADD/consumer groups), pub/sub, or tuning eviction policies. Triggers: thundering-herd on cache miss, dogpile on key expiry, Redlock vs SET-NX-PX choice, sliding-window rate limiter, hot-key on a single cluster slot, big-key blowup, MULTI/EXEC across slots, KEYS in production. NOT for Redis Cluster operations/admin (different domain), embedded KV (SQLite, leveldb), in-process LRU caches, or Memcached.

8SKILL.mdUpdated Jul 19, 2026

curiositech/redis-patterns-expert

curiositech/react-server-components-boundary

tools

VerifiedTrustedCommunity

Drawing the `'use client'` boundary correctly in React Server Components apps (Next.js App Router, RSC frameworks) — leaf-pushing, slot composition, serialization rules, and environment poisoning prevention. Grounded in react.dev and Next.js 16 docs.

8SKILL.mdUpdated Jul 19, 2026

curiositech/react-server-components-boundary

curiositech/rate-limiting-strategy

development

VerifiedTrustedCommunity

Use when designing rate limiting for an API, choosing between token bucket / sliding window / leaky bucket / fixed window, implementing it in Redis, deciding edge (Cloudflare/Upstash) vs origin enforcement, sizing per-user vs per-IP vs per-endpoint quotas, returning the right 429 response with Retry-After, or fixing the boundary-burst bug in fixed-window limiters. Triggers: 429 too many requests, INCR + EXPIRE, ZADD + ZREMRANGEBYSCORE + ZCARD, X-RateLimit-Remaining header, Cloudflare WAF rate limiting rules, Upstash @upstash/ratelimit, leaky bucket shaping vs policing, distributed rate limiter consistency. NOT for DDoS mitigation specifically (different scale), CAPTCHA / bot management, full WAF design, or per-user quota billing.

8SKILL.mdUpdated Jul 19, 2026

curiositech/rate-limiting-strategy

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/curiositech/windags-skills.git

# Copy into Claude Code skills folder (global)
cp -r windags-skills/skills/skill-grader ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

curiositech/windags-skills

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT