/SKILL.md
Create S-tier portable skills with comprehensive quality guidance. This skill should be used when creating new skills, improving existing skills, auditing skill quality, or learning skill-building best practices. Covers the full lifecycle from use case definition through testing, distribution, and iteration. Do NOT use for general coding tasks, writing documentation unrelated to skills, or building applications.
npx skillsauth add bjg4/bgskillz bgskillz
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Build high-quality, portable Claude skills that trigger reliably and deliver real value.
references/, agent instructions in agents/, and link to them. Claude reads these when referenced explicitly.
my-skill/
├── SKILL.md # Required. Main entry point. Contains frontmatter + instructions.
├── agents/ # Optional. Sub-agent instruction files for multi-agent workflows.
├── scripts/ # Optional. Executable helpers (Python, Bash, etc.)
├── references/ # Optional. Deep reference docs linked from SKILL.md.
└── assets/ # Optional. Templates, configs, examples bundled with the skill.
SKILL.md is the only required file. It must be exactly SKILL.md (not skill.md, not README.md).
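A rough sketch of how the file-name requirement could be checked from Python. The function name `check_layout` and the message wording are illustrative assumptions, not part of the bundled scripts. It also flags a README.md inside the skill folder, which this guide lists as a hard-requirement violation.

```python
import os
from pathlib import Path


def check_layout(skill_dir: Path) -> list[str]:
    """Return a list of layout problems for a skill folder (empty if none).

    Hypothetical helper for illustration; not the bundled validator.
    """
    problems = []
    # Compare against the real directory entries so the casing check also
    # holds on case-insensitive filesystems, where an existence test for
    # "SKILL.md" can succeed even if the file on disk is named "skill.md".
    entries = set(os.listdir(skill_dir))
    if "SKILL.md" not in entries:
        wrong_case = sorted(e for e in entries if e.lower() == "skill.md")
        if wrong_case:
            problems.append(
                f"rename {wrong_case[0]} to SKILL.md (exact casing required)"
            )
        else:
            problems.append("SKILL.md is missing")
    if "README.md" in entries:
        problems.append(
            "README.md found inside the skill folder; move it to the repo root"
        )
    return problems
```

An empty list means the folder passes these two checks only; frontmatter content is not inspected here.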
Frontmatter fields:
name (required): kebab-case, no spaces, no capitals, no "claude" or "anthropic"
description (required): Under 1024 chars. Determines when the skill triggers.
license, metadata, compatibility: Optional but recommended for distribution.
Follow these 7 steps to build a skill from scratch.
Before writing anything, articulate exactly who will use this skill and for what. Pick a category:
Write 2-3 specific use case sentences: "A developer wants to... so they can..."
Set measurable goals before building:
Problem-first skills tend to have better descriptions because the pain point is the trigger.
Decide what goes into each directory:
Rule of thumb: If it's >50 lines and not needed on every invocation, it belongs in references/. If it's instructions for a sub-agent, it belongs in agents/.
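The rule of thumb above can be written as a small decision function. This is only a sketch: the function name and parameters are made up for illustration, and the 50-line threshold is taken directly from the rule.

```python
def suggest_location(
    line_count: int,
    needed_every_invocation: bool,
    is_subagent_instructions: bool = False,
) -> str:
    """Suggest where a piece of skill content should live.

    Hypothetical helper that encodes the rule of thumb; real decisions
    may need more judgment than two flags and a line count.
    """
    # Sub-agent instructions go in agents/ regardless of length.
    if is_subagent_instructions:
        return "agents/"
    # Long content that is not needed on every invocation goes in references/.
    if line_count > 50 and not needed_every_invocation:
        return "references/"
    # Everything else stays in the main entry point.
    return "SKILL.md"
```

For example, `suggest_location(120, needed_every_invocation=False)` returns `"references/"`, while the same content marked as needed on every invocation stays in `"SKILL.md"`.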
Run the scaffolder to create your skill directory:
python ~/.claude/skills/bgskillz/scripts/init_skill.py my-skill-name --path ~/target/directory
This creates a well-structured starting point with TODO prompts to guide you.
This is where quality is made or lost. Follow these rules:
Description (most critical field):
Use the formula: [What it does] + [When to use it] + [Key capabilities]
Good: "Generate production-ready database migrations from natural language descriptions. Use when adding tables, columns, indexes, or modifying schema. Handles rollbacks, data preservation, and index optimization."
Bad: "Helps with database stuff."
See references/description-crafting.md for 15+ examples and anti-patterns.
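As an illustration of the formula, the sketch below assembles a description from its three parts and enforces the under-1024-character limit. The helper name `build_description` is an assumption for this example and does not come from the bundled scripts.

```python
MAX_DESCRIPTION_CHARS = 1024  # descriptions must be under this length


def build_description(what: str, when: str, capabilities: str) -> str:
    """Join [What it does] + [When to use it] + [Key capabilities].

    Hypothetical helper; it checks only structure and length, not whether
    the wording will actually trigger well.
    """
    parts = [p.strip() for p in (what, when, capabilities)]
    if not all(parts):
        raise ValueError("all three parts of the formula are required")
    # Make sure each part reads as a complete sentence.
    sentences = [p if p.endswith((".", "!", "?")) else p + "." for p in parts]
    description = " ".join(sentences)
    if len(description) >= MAX_DESCRIPTION_CHARS:
        raise ValueError(
            f"description is {len(description)} chars; "
            f"it must be under {MAX_DESCRIPTION_CHARS}"
        )
    return description
```

Passing the three sentences of the "Good" example above as `what`, `when`, and `capabilities` reproduces that description unchanged.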
Naming rules:
my-cool-skill not MyCoolSkill
name: field in frontmatter
Writing instructions:
Error handling:
Security rules:
Run the packager to validate and create a distributable zip:
python ~/.claude/skills/bgskillz/scripts/package_skill.py /path/to/my-skill
This runs full validation, then creates a zip ready for upload or sharing. See references/distribution-guide.md for hosting and positioning guidance.
These are hard requirements. Violating them causes failures.
SKILL.md: Exact casing. Not skill.md, not Skill.md.
No README.md inside the skill folder: It confuses the system. README goes in your GitHub repo root, outside the skill folder.
Kebab-case folder names: my-skill not my_skill or MySkill or my skill.
No < or > characters anywhere in the YAML block.
If the folder is my-skill/, frontmatter name must be my-skill.
Be specific and actionable. Every instruction should pass the "what would Claude actually do?" test. "Write good code" fails. "Use early returns to reduce nesting. Limit functions to 20 lines. Name variables descriptively." passes.
Progressive disclosure. Put the 20% of instructions that cover 80% of use cases in SKILL.md. Put the remaining detail in references. Link clearly: "For advanced configuration patterns, see references/workflow-patterns.md."
Reference bundled resources clearly. When pointing to a reference file, use the exact relative path. Claude will read the file when you reference it this way.
Include error handling. Tell Claude what to do when: the user's request is ambiguous, a required tool is missing, an API call fails, the output doesn't match expectations.
Consistent terminology. Pick one term and stick with it. Don't alternate between "skill", "plugin", and "extension" in the same document.
Default + escape hatch. "Generate TypeScript by default. If the user requests JavaScript or another language, adapt accordingly." This gives Claude a clear default while preserving flexibility.
Show, don't just tell. Include 1-2 examples of ideal output in your SKILL.md. Claude mimics examples more reliably than it follows abstract rules.
Look for repeated work. If you run tests and notice Claude independently writes similar boilerplate or setup code each time, bundle that code into the skill as a script or template. Don't make Claude reinvent the wheel on every invocation.
Keep the prompt lean. After each iteration, review the full SKILL.md and remove instructions that aren't pulling their weight. Read transcripts to identify instructions that Claude ignores or that cause unproductive behavior. A shorter, focused skill outperforms a comprehensive but bloated one.
Test your skill in three ways, from manual to fully automated:
Trigger testing — Does it activate when it should? Does it stay quiet when it shouldn't?
Functional testing — When triggered, does it produce correct output?
Baseline comparison — Is the skill actually better than Claude without it?
For rigorous testing, use the automated eval pipeline:
# Run evaluation with baseline comparison
python ~/.claude/skills/bgskillz/scripts/run_eval.py /path/to/skill --prompts tests/prompts.json
# Run automated improvement loop (eval -> grade -> analyze -> improve -> repeat)
python ~/.claude/skills/bgskillz/scripts/run_loop.py /path/to/skill --prompts tests/prompts.json --iterations 3 --auto-apply
# Optimize description triggering
python ~/.claude/skills/bgskillz/scripts/improve_description.py /path/to/skill
# Generate a self-contained HTML review page
python ~/.claude/skills/bgskillz/eval-viewer/generate_review.py /path/to/workspace/iteration-1/evals.json
The eval pipeline runs each test prompt through Claude with and without the skill, computing benchmark statistics (mean, stddev, min, max) and saving outputs for grading. Use the sub-agents in agents/ to grade outputs, blind-compare them, and analyze patterns:
agents/grader.md: Grades outputs against assertions with evidence and meta-evaluation
agents/comparator.md: Blind A/B comparison (doesn't know which output is skill vs. baseline)
agents/analyzer.md: Unblinded pattern analysis with prioritized improvement suggestions
The run_loop.py script automates the full cycle: eval → grade → analyze → apply suggestions → re-eval. Use --auto-apply to let it modify SKILL.md between iterations (backups are saved).
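To make the idea of a blind A/B comparison concrete, here is one possible way to hide which output came from the skill. This is a sketch of the general technique under assumed names (`blind_pair`, `unblind`); how agents/comparator.md actually receives its inputs is not shown in this guide.

```python
import random


def blind_pair(
    skill_output: str, baseline_output: str, rng: random.Random
) -> tuple[dict[str, str], dict[str, str]]:
    """Shuffle two outputs into anonymous labels A and B.

    Returns (blinded, key): `blinded` is what the comparator sees,
    `key` maps each label back to its source and stays hidden until
    the comparison is finished. Hypothetical helper for illustration.
    """
    labeled = [("skill", skill_output), ("baseline", baseline_output)]
    rng.shuffle(labeled)  # random order so label A is not always the skill
    blinded = {"A": labeled[0][1], "B": labeled[1][1]}
    key = {"A": labeled[0][0], "B": labeled[1][0]}
    return blinded, key


def unblind(winner_label: str, key: dict[str, str]) -> str:
    """Translate the comparator's pick ("A" or "B") back to its source."""
    return key[winner_label]
```

Passing an explicit `random.Random` instance keeps the shuffle reproducible when a seed is supplied, which helps when re-running a comparison.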
Review results visually with eval-viewer/viewer.html, or generate a self-contained review page with eval-viewer/generate_review.py. See references/schemas.md for all data formats.
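The benchmark statistics mentioned above (mean, stddev, min, max) can be computed with the standard library. The sketch below assumes each run has already been reduced to a numeric score, and it uses the sample standard deviation; neither detail is confirmed by this guide, and the data formats used by run_eval.py are described in references/schemas.md rather than here.

```python
import statistics


def summarize(scores: list[float]) -> dict[str, float]:
    """Compute mean, stddev, min, and max for a non-empty list of scores."""
    return {
        "mean": statistics.mean(scores),
        # Sample standard deviation needs at least two data points.
        "stddev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }


def compare(with_skill: list[float], baseline: list[float]) -> dict:
    """Summarize both conditions and report the difference in means.

    Hypothetical helper: a positive mean_delta means the runs with the
    skill scored higher on average than the baseline runs.
    """
    skill_stats = summarize(with_skill)
    baseline_stats = summarize(baseline)
    return {
        "with_skill": skill_stats,
        "baseline": baseline_stats,
        "mean_delta": skill_stats["mean"] - baseline_stats["mean"],
    }
```

A small mean difference alongside a large stddev suggests the skill's benefit is within run-to-run noise, which is a reason to add more test prompts before drawing conclusions.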
Undertriggering: Users have to explicitly invoke the skill; paraphrased requests don't activate it. Fix: Add more trigger phrases to the description. Be more specific about use cases.
Overtriggering: Skill activates on unrelated tasks. Fix: Add negative triggers. Narrow the description scope. Use more specific terminology.
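Undertriggering and overtriggering can be quantified once trigger tests have been recorded. The sketch below assumes a simple record per test prompt; the `TriggerResult` type and the rate names are illustrative and are not the format used by the bundled eval scripts.

```python
from dataclasses import dataclass


@dataclass
class TriggerResult:
    prompt: str
    should_trigger: bool  # expected behavior for this prompt
    did_trigger: bool     # what was actually observed


def trigger_rates(results: list[TriggerResult]) -> dict[str, float]:
    """Compute how often the skill was missed or fired spuriously."""
    positives = [r for r in results if r.should_trigger]
    negatives = [r for r in results if not r.should_trigger]
    missed = sum(1 for r in positives if not r.did_trigger)
    spurious = sum(1 for r in negatives if r.did_trigger)
    return {
        # Share of relevant prompts that failed to activate the skill.
        "undertrigger_rate": missed / len(positives) if positives else 0.0,
        # Share of unrelated prompts that activated the skill anyway.
        "overtrigger_rate": spurious / len(negatives) if negatives else 0.0,
    }
```

A high undertrigger rate points toward adding trigger phrases to the description; a high overtrigger rate points toward negative triggers or a narrower scope.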
Anti-overfitting warning: When iterating, read the actual transcripts. Look for unproductive behavior the skill causes and look for repeated work across test runs (if all runs independently write similar scripts, bundle that script into the skill). Resist adding fiddly constraints — generalize from the feedback instead.
For comprehensive testing methodology, read references/testing-methodology.md.
Skill won't upload: Check that the file is named exactly SKILL.md. Verify frontmatter is valid YAML with name and description. Ensure the name is kebab-case with no spaces.
Skill doesn't trigger: Your description likely doesn't match how users phrase requests. Add more specific trigger phrases. Include the exact verbs and nouns users would say.
Instructions not followed: SKILL.md may be too long or instructions are buried. Front-load critical rules. Use bold for must-follow constraints. Reduce total word count.
For all troubleshooting scenarios, read references/troubleshooting.md.
Quick pre-flight check before publishing:
SKILL.md exists with exact casing
name (kebab-case) and description (under 1024 chars)
[What] + [When] + [Capabilities] formula
For the full audit rubric with scoring, read references/quality-checklist.md.
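The name and description items of the pre-flight check could be automated roughly as follows. This is a minimal sketch and not the bundled validate_skill.py, whose implementation is not shown here. The function name, the regular expression (which assumes digits are allowed in kebab-case names), and the message wording are all assumptions.

```python
import re

# Assumed kebab-case shape: lowercase letters or digits, separated by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
RESERVED_WORDS = ("claude", "anthropic")
MAX_DESCRIPTION_CHARS = 1024  # descriptions must be under this length


def check_frontmatter(name: str, description: str, folder_name: str) -> list[str]:
    """Return a list of frontmatter problems (empty if none were found).

    Hypothetical helper covering only the rules stated in this guide.
    """
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be kebab-case (no spaces, no capitals)")
    if any(word in name.lower() for word in RESERVED_WORDS):
        problems.append('name must not contain "claude" or "anthropic"')
    if name != folder_name:
        problems.append("name must match the skill folder name")
    if not description.strip():
        problems.append("description is required")
    elif len(description) >= MAX_DESCRIPTION_CHARS:
        problems.append("description must be under 1024 characters")
    if "<" in description or ">" in description:
        problems.append("description must not contain < or > characters")
    return problems
```

For instance, `check_frontmatter("my-skill", "Does X. Use when Y.", "my-skill")` returns an empty list, while a name such as `MySkill` is reported as not kebab-case.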
Choose what you need help with:
"I want to create a new skill" — Walk through the 7-step creation workflow. Start by defining use cases and end with a packaged, validated skill.
"Audit my skill" or "Review this skill" — Run the full quality checklist against an existing skill. Identify issues and suggest fixes.
"Help me write a better description" — Apply the description formula and test trigger phrases. Rewrite for maximum activation reliability.
"Add a script/reference/asset to my skill" — Help plan and implement a new component (validator, reference doc, template, etc.) for an existing skill.
"Validate my skill" — Run the validation script to check structural correctness:
python ~/.claude/skills/bgskillz/scripts/validate_skill.py /path/to/skill
"Package my skill" — Run validation + create a distributable zip:
python ~/.claude/skills/bgskillz/scripts/package_skill.py /path/to/skill
"Evaluate my skill" or "Run evals" — Run the automated evaluation pipeline with baseline comparison:
python ~/.claude/skills/bgskillz/scripts/run_eval.py /path/to/skill --prompts tests/prompts.json
Then grade and analyze results using the agents in agents/. Review visually with eval-viewer/viewer.html or generate a self-contained review page with eval-viewer/generate_review.py.
"Iterate on my skill" or "Auto-improve my skill" — Run the full automated cycle (eval → grade → analyze → improve → re-eval):
python ~/.claude/skills/bgskillz/scripts/run_loop.py /path/to/skill --prompts tests/prompts.json --iterations 3 --auto-apply
"Improve my skill's triggering" — Run the description optimization pipeline:
python ~/.claude/skills/bgskillz/scripts/improve_description.py /path/to/skill
"How do I..." — Answer questions about skill building using the reference library. Topics: descriptions, workflows, testing, evaluation, troubleshooting, distribution, quality.