skills/skill_evaluator/SKILL.md
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, and content organization, and identifies anti-patterns. Produces actionable improvement recommendations.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add vuralserhat86/antigravity-agentic-skills skill_evaluator
Evaluates skills against Anthropic's official best practices for agent skill authoring. Produces structured evaluation reports with scores and actionable recommendations.
Run the validation script first:
scripts/validate_skill.py <path/to/skill>
This checks:
Evaluate each dimension and assign a score (1-5):
| Score | Naming criteria |
|-------|----------|
| 5 | Gerund form (-ing), clear purpose, memorable |
| 4 | Descriptive, follows conventions |
| 3 | Acceptable but could be clearer |
| 2 | Vague or misleading |
| 1 | Violates naming rules |
Rules: Max 64 chars, lowercase + numbers + hyphens only, no reserved words (anthropic, claude), no XML tags.
Good: processing-pdfs, analyzing-spreadsheets, building-dashboards
Bad: pdf, my-skill, ClaudeHelper, anthropic-tools
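The hard naming rules above are mechanical enough to check in code. The following is a minimal sketch, not the contents of scripts/validate_skill.py; the function name `name_rule_violations` is hypothetical, and matching reserved words case-insensitively as substrings is an assumption.

```python
import re

MAX_NAME_LENGTH = 64
RESERVED_WORDS = ("anthropic", "claude")
ALLOWED_CHARS = re.compile(r"[a-z0-9-]+")


def name_rule_violations(name: str) -> list[str]:
    """Return the hard naming rules that `name` breaks (empty list if none)."""
    violations = []
    if not name:
        violations.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        violations.append(f"longer than {MAX_NAME_LENGTH} characters")
    if "<" in name or ">" in name:
        violations.append("contains XML tag characters")
    if name and not ALLOWED_CHARS.fullmatch(name):
        violations.append("uses characters other than lowercase letters, numbers, and hyphens")
    # Assumption: reserved words are matched case-insensitively as substrings.
    lowered = name.lower()
    for word in RESERVED_WORDS:
        if word in lowered:
            violations.append(f"contains reserved word '{word}'")
    return violations


for candidate in ("processing-pdfs", "ClaudeHelper", "anthropic-tools", "pdf"):
    print(candidate, name_rule_violations(candidate))
```

Note that `pdf` and `my-skill` pass these checks: they appear to be listed as bad because they are vague, which is a matter for the 1-5 rubric score rather than a rule violation.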
| Score | Description criteria |
|-------|----------|
| 5 | Clear functionality + specific activation triggers + third person |
| 4 | Good description with some triggers |
| 3 | Adequate but missing triggers or vague |
| 2 | Too brief or unclear purpose |
| 1 | Missing or unhelpful |
Must include: What the skill does AND when to use it.
Good: "Extracts text from PDFs. Use when working with PDF documents for text extraction, form parsing, or content analysis."
Bad: "A skill for PDFs." or "Helps with documents."
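A rough first pass over a description can be automated before assigning a score. The sketch below is a heuristic only: the trigger phrases, the pronoun list, and the function name `description_issues` are all assumptions chosen for illustration, and a human or model review is still needed to judge whether the description is specific enough.

```python
import re

# Assumption: these phrases signal an activation trigger; extend as needed.
TRIGGER_PHRASES = ("use when", "use for", "use this when", "invoke when")
# Assumption: these pronouns indicate first- or second-person wording.
FIRST_OR_SECOND_PERSON = re.compile(r"\b(i|we|you|my|your)\b", re.IGNORECASE)


def description_issues(description: str) -> list[str]:
    """Return heuristic problems found in a skill description."""
    text = description.strip()
    if not text:
        return ["description is missing"]
    issues = []
    lowered = text.lower()
    if not any(phrase in lowered for phrase in TRIGGER_PHRASES):
        issues.append("no activation trigger (when to use it)")
    if FIRST_OR_SECOND_PERSON.search(text):
        issues.append("not written in third person")
    return issues


good = ("Extracts text from PDFs. Use when working with PDF documents "
        "for text extraction, form parsing, or content analysis.")
print(description_issues(good))
print(description_issues("A skill for PDFs."))
```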
| Score | Content quality criteria |
|-------|----------|
| 5 | Concise, assumes Claude intelligence, actionable instructions |
| 4 | Generally good, minor verbosity |
| 3 | Some unnecessary explanations or redundancy |
| 2 | Overly verbose or confusing |
| 1 | Bloated, explains obvious concepts |
Ask: "Does Claude really need this explanation?" Remove anything Claude already knows.
| Score | Structure criteria |
|-------|----------|
| 5 | Excellent progressive disclosure, clear navigation, optimal length |
| 4 | Good organization, appropriate file splits |
| 3 | Acceptable but could be better organized |
| 2 | Poor organization, missing references, or bloated SKILL.md |
| 1 | No structure, everything dumped in SKILL.md |
Check:
| Score | Degrees of freedom criteria |
|-------|----------|
| 5 | Perfect match: high freedom for flexible tasks, low for fragile operations |
| 4 | Generally appropriate freedom levels |
| 3 | Acceptable but could be better calibrated |
| 2 | Mismatched: too rigid or too loose |
| 1 | Completely wrong freedom level for the task type |
Guideline:
Deduct points for each anti-pattern found:
Use this template:
# Skill Evaluation Report: [skill-name]
## Summary
- **Overall Score**: X.X/5.0
- **Recommendation**: [Ready for publication / Needs minor improvements / Needs major revision]
## Dimension Scores
| Dimension | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Naming | X/5 | 10% | X.XX |
| Description | X/5 | 20% | X.XX |
| Content Quality | X/5 | 30% | X.XX |
| Structure | X/5 | 25% | X.XX |
| Degrees of Freedom | X/5 | 10% | X.XX |
| Anti-Patterns | X/5 | 5% | X.XX |
| **Total** | | 100% | **X.XX** |
## Strengths
- [List 2-3 things done well]
## Areas for Improvement
- [List specific issues with actionable fixes]
## Anti-Patterns Found
- [List any anti-patterns detected]
## Recommendations
1. [Priority 1 fix]
2. [Priority 2 fix]
3. [Priority 3 fix]
## Pre-Publication Checklist
- [ ] Description is specific with activation triggers
- [ ] SKILL.md under 500 lines
- [ ] One-level-deep file references
- [ ] Forward slashes in all paths
- [ ] No time-sensitive information
- [ ] Consistent terminology
- [ ] Concrete examples provided
- [ ] Scripts handle errors explicitly
- [ ] All configuration values justified
- [ ] Required packages listed
- [ ] Tested with Haiku, Sonnet, Opus
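Two of the checklist items (SKILL.md under 500 lines, forward slashes in all paths) can be checked mechanically. The sketch below shows one way to do that on the text of a SKILL.md. The bundled scripts/validate_skill.py may already cover these checks, since its contents are not shown here. The function name `mechanical_checks` and the backslash regex are assumptions, and the regex is a heuristic that can also flag escape sequences such as a literal `a\n`.

```python
import re

MAX_SKILL_MD_LINES = 500
# Heuristic (assumption): a backslash between two path-like segments,
# for example scripts\validate_skill.py, suggests a Windows-style path.
WINDOWS_PATH = re.compile(r"[\w.-]+\\[\w.-]+")


def mechanical_checks(skill_md_text: str) -> dict[str, bool]:
    """Return pass/fail results for the checklist items that need no judgment."""
    lines = skill_md_text.splitlines()
    return {
        "under_500_lines": len(lines) < MAX_SKILL_MD_LINES,
        "forward_slashes_only": not any(WINDOWS_PATH.search(line) for line in lines),
    }


print(mechanical_checks("Run scripts/validate_skill.py first.\n"))
print(mechanical_checks("Run scripts\\validate_skill.py first.\n"))
```

The remaining items, such as consistent terminology or justified configuration values, need a reviewer's judgment and are left out of the sketch.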
| Score Range | Rating | Action |
|-------------|--------|--------|
| 4.5 - 5.0 | Excellent | Ready for publication |
| 4.0 - 4.4 | Good | Minor improvements recommended |
| 3.0 - 3.9 | Acceptable | Several improvements needed |
| 2.0 - 2.9 | Needs Work | Major revision required |
| 1.0 - 1.9 | Poor | Fundamental redesign needed |
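The overall score is the weighted sum of the six dimension scores, which is then looked up in the score-range table. A minimal sketch of that arithmetic follows. The function name `evaluate` is hypothetical, and treating each band as a lower bound (so that a total of 4.45 rates as "Good") is an assumption, because the table leaves gaps between bands.

```python
# Weights in percent, copied from the Dimension Scores table in the report template.
WEIGHTS = {
    "Naming": 10,
    "Description": 20,
    "Content Quality": 30,
    "Structure": 25,
    "Degrees of Freedom": 10,
    "Anti-Patterns": 5,
}

# (lower bound, rating, action), copied from the score-range table.
# Assumption: each band is treated as a lower bound, so a total such as 4.45
# falls into "Good" rather than into the gap between 4.4 and 4.5.
RATING_BANDS = [
    (4.5, "Excellent", "Ready for publication"),
    (4.0, "Good", "Minor improvements recommended"),
    (3.0, "Acceptable", "Several improvements needed"),
    (2.0, "Needs Work", "Major revision required"),
    (1.0, "Poor", "Fundamental redesign needed"),
]


def evaluate(scores: dict[str, int]) -> tuple[float, str, str]:
    """Return (overall score, rating, action) for per-dimension scores of 1-5."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the six rubric dimensions")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    # Percent weights times integer scores give an exact integer number of hundredths.
    hundredths = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
    overall = hundredths / 100
    for lower_bound, rating, action in RATING_BANDS:
        if overall >= lower_bound:
            return overall, rating, action
    raise AssertionError("unreachable: the minimum possible total is 1.0")


example = {
    "Naming": 4,
    "Description": 5,
    "Content Quality": 4,
    "Structure": 3,
    "Degrees of Freedom": 4,
    "Anti-Patterns": 5,
}
print(evaluate(example))  # (4.0, 'Good', 'Minor improvements recommended')
```

In the example, the weighted terms are 0.40 + 1.00 + 1.20 + 0.75 + 0.40 + 0.25 = 4.00.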
Skill Evaluator v1.1 - Enhanced
Source: Google Engineering Practices - Code Review & Anthropic System Prompts

- […] scripts/, references/) conform to the standard?
- […] name, description) complete and valid?
- Is the Python/Bash code in scripts/ safe and in working order?
- Are the references/ files really necessary, or should they be embedded in SKILL.md?

| Stage | Verification |
|-------|--------------|
| 1 | Are the skill name and description consistent with each other? |
| 2 | Was an anti-pattern (e.g., a hardcoded path) detected? |
| 3 | Was an objective score (1-5) assigned according to the scoring rubric? |