plugins/abstract/skills/skills-eval/SKILL.md
Evaluate Claude skill quality through auditing. Use when reviewing or auditing skills.
npx skillsauth add athola/claude-night-market skills-evalInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This framework audits Claude skills against quality standards to improve performance and reduce token consumption. Automated tools analyze skill structure, measure context usage, and identify specific technical improvements. Run verification commands after each audit to confirm fixes work correctly.
The skills-auditor provides structural analysis, while the improvement-suggester ranks fixes by impact. Compliance is verified through the compliance-checker. Runtime efficiency is monitored by tool-performance-analyzer and token-usage-tracker.
Run a full audit of all skills or target a specific file to identify structural issues.
# Audit all skills
make audit-all
# Audit specific skill
make audit-skill TARGET=path/to/skill/SKILL.md
Use skill_analyzer.py for complexity checks and token_estimator.py to verify the context budget.
make analyze-skill TARGET=path/to/skill/SKILL.md
make estimate-tokens TARGET=path/to/skill/SKILL.md
Generate a prioritized plan and verify standards compliance using improvement_suggester.py and compliance_checker.py.
make improve-skill TARGET=path/to/skill/SKILL.md
make check-compliance TARGET=path/to/skill/SKILL.md
Start with make audit-all to inventory skills and identify high-priority targets. For each skill requiring attention, run analysis with analyze-skill to map complexity. Generate an improvement plan, apply fixes, and run check-compliance to verify the skill meets project standards. Finalize by checking the token budget for efficiency.
Quality assessments use the skills-auditor and improvement-suggester to generate detailed reports. Performance analysis focuses on token efficiency through the token-usage-tracker and tool performance via tool-performance-analyzer. For standards compliance, the compliance-checker automates common fixes for structural issues.
We evaluate skills across five dimensions: structure compliance, content quality, token efficiency, activation reliability, and tool integration. Scores above 90 represent production-ready skills, while scores below 50 indicate critical issues requiring immediate attention.
Improvements are prioritized by impact. Critical issues include security vulnerabilities or broken functionality. High-priority items cover structural flaws that hinder discoverability. Medium and low priorities focus on best practices and minor optimizations.
Deprecated: skills/shared/modules/ directories. Shared modules must be relocated into the consuming skill's own modules/ directory. The evaluator flags any remaining skills/shared/ as a structural warning.
Current: Each skill owns its modules at skills/<skill-name>/modules/. Cross-skill references use relative paths (e.g., ../skill-authoring/modules/anti-rationalization.md).
modules/trigger-isolation-analysis.mdmodules/authoring-checklist.mdmodules/evaluation-workflows.mdmodules/advanced-tool-use-analysis.mdmodules/evaluation-framework.mdmodules/integration.mdmodules/troubleshooting.mdmodules/pressure-testing.mdmodules/integration-testing.mdmodules/performance-benchmarking.mdscripts/ directory.scripts/automation/.research
Generate diverse solution candidates with category-spanning ideation methods and rotation. Use when stuck on a design or fighting repetitive LLM output.
tools
--- name: validate-pr description: Use when you need a diff-derived test plan for a PR: reads the diff, groups changes by area, runs targeted verifications, and proves revert-tests are genuine guards, not dead assertions. alwaysApply: false category: validation tags: - pr - validation - test-plan - diff - revert-test - evidence tools: [] usage_patterns: - diff-derived-test-plan - revert-test-quality-check - evidence-capture complexity: intermediate model_hint: standard estimated_tokens: 650
development
Contract for the project decision journal (tradeoffs and lessons-learned logs). Use when recording a decision, tradeoff, or lesson, or building a consumer hook.
development
Ramps implementation ambition a notch only after the prior increment is understood. Use when building a feature you must understand, not just ship.