engineering/skills/skill-tester/SKILL.md
Validate, test, and score the quality of skills within the claude-skills ecosystem. Comprehensive meta-skill: structure validation, Python script testing (syntax + imports + runtime + output format), multi-dimensional quality scoring with letter grades and tier classification (BASIC/STANDARD/POWERFUL). Use when authoring a new skill, auditing existing skills for tier promotion, setting up pre-commit hooks for skill quality, or integrating skill QA into CI.
npx skillsauth add alirezarezvani/claude-skills skill-testerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Tier: POWERFUL · Category: Engineering Quality Assurance · Dependencies: None (Python stdlib only)
Meta-skill that validates, tests, and scores skills in this repository. Four tools, run from the repo root with full paths:
scripts/skill_validator.py — structure + documentation compliancescripts/script_tester.py — Python script syntax/imports/runtime/output testingscripts/quality_scorer.py — multi-dimensional scoring with letter gradescripts/security_scorer.py — security posture scoring (also available via quality_scorer.py --include-security)Scope note: this skill's tier line-count minimums measure legacy skills. For authoring new skills,
engineering/write-a-skill(SKILL.md under ~100 lines, Matt Pocock doctrine) is the binding standard — do not pad a new skill to satisfy a tier minimum here.
# 1. Validate structure (exit non-zero on failure — usable as a gate)
python3 engineering/skills/skill-tester/scripts/skill_validator.py engineering/skills/self-eval --json
# 2. Test the skill's Python scripts (30s default timeout per script)
python3 engineering/skills/skill-tester/scripts/script_tester.py engineering/skills/self-eval --json
# 3. Score quality (fail CI below threshold with --minimum-score)
python3 engineering/skills/skill-tester/scripts/quality_scorer.py engineering/skills/self-eval --json --detailed --minimum-score 75
Consume the JSON: validator emits overall_score, compliance_level, per-check checks{}; scorer emits overall_score, letter_grade, tier_recommendation, dimensions, and an improvement_roadmap — work the roadmap top-down, then re-run until the target score is met.
For repo-wide auditing prefer scripts/audit_skills.py at the repo root (wraps the write-a-skill checklist runner across all skills).
--tier BASIC|STANDARD|POWERFUL)--timeout, default 30s)--help functionality verification; sample-data runs compared against expected_outputs/Four dimensions, 25% each: Documentation (depth, examples, references), Code Quality (complexity, error handling, output consistency), Completeness (required dirs, sample data, expected outputs), Usability (help text, example clarity). Outputs 0-100 + A-F grade + tier recommendation.
| Tier | SKILL.md | Scripts | CLI surface | |---|---|---|---| | BASIC | ≥ 100 lines | 1 (100-300 LOC) | basic argparse | | STANDARD | ≥ 200 lines | 1-2 (300-500 LOC) | subcommands, JSON + text output | | POWERFUL | ≥ 300 lines | 2-3 (500-800 LOC) | multiple modes, CI integration |
(Advisory for legacy skills; new skills follow write-a-skill — see scope note above.)
# GitHub Actions: gate changed skills
- name: "validate-changed-skills"
run: |
for skill in $changed_skills; do
python3 engineering/skills/skill-tester/scripts/skill_validator.py "$skill" --json
python3 engineering/skills/skill-tester/scripts/script_tester.py "$skill"
python3 engineering/skills/skill-tester/scripts/quality_scorer.py "$skill" --minimum-score 75
done
Pre-commit hook: run the validator on the staged skill directory and block the commit on non-zero exit.
A skill "passes" when, in one run from repo root:
skill_validator.py <skill> --json exits 0,script_tester.py <skill> reports all scripts passing, andquality_scorer.py <skill> --minimum-score <target> exits 0.If any step fails, apply the top improvement_roadmap item and re-run all three — never report a partial pass.
--timeout or optimize the script under testReferences: references/ holds the structure specification, tier requirements matrix, and scoring rubric the tools implement.
data-ai
Use when you want to understand what Claude contributed vs what you drove in a session. Triggers on: /collab-proof, session retrospective, ai contribution analysis, collaboration evidence, what did claude do.
data-ai
Personal coach that teaches users to become Claude power users. Use this skill the FIRST time a user asks to "learn Claude", "be a power user", "coach me", "teach me Claude tricks", "what can Claude do", "make me better at prompting", or any variation. After activation, also use it on EVERY subsequent turn to detect missed optimization opportunities (vague prompts, ignored capabilities, manual work Claude could automate) and surface a single power-user tip. Trigger generously — most users do not know what they do not know, so err on the side of coaching.
development
Use when designing or revisiting product pricing — selecting a pricing model (subscription seat-based, usage-based, value-based, freemium, or hybrid), running Van Westendorp Price Sensitivity Meter analysis on WTP survey data, or designing Good/Better/Best packaging tiers. Recommends a model and a price range with trade-offs, never a single number. For Commercial leads, Product Marketing, and CMOs at the pricing-design moment — not deal-by-deal discounting, not brand positioning.
testing
Use when a startup is approached by a prospective partner and someone has to decide should we sign this partner, at what partner tier (referral / reseller / OEM / SI-consulting / strategic alliance), with what joint GTM commitment, and at what revshare. Classifies partner tier from independent-demand evidence vs. preferential-terms hunting, designs a 90-day joint GTM plan, models revshare against direct-sale margin, and surfaces kill criteria for unwinding under-performing partnerships. For Head of Partnerships, Head of BD, and Founder-CEOs doing reseller agreement, OEM deal, or strategic alliance review — not technical sale enablement, not channel cost economics, not M&A.