bootstrap-skills/dhub-skill-creator/SKILL.md
Guide for creating effective skills for Claude Code agents. Covers skill design, implementation, validation, packaging, and optionally runtime environments and automated evaluations for Decision Hub publishing. Use when users want to create, improve, or package a skill.
npx skillsauth add pymc-labs/decision-hub dhub-skill-creatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Create modular skill packages (SKILL.md + optional resources) that turn Claude into a specialist. This skill guides the full lifecycle: define the domain, design the architecture, build and validate the skill, and package it for distribution.
For skills intended for Decision Hub, the workflow naturally extends into defining runtime environments and writing evaluation criteria — not as a separate mode, but as a natural consequence of what the skill needs.
Progressive disclosure — each layer loads only when needed:
name + description in frontmatter. Determines when the skill activates.scripts/ — deterministic code the agent executes via Bashreferences/ — domain knowledge the agent reads on demandassets/ — templates, sample data, output formatsagents/ — subagent system prompts for delegation| Pattern | Why It Works | Example | |---------|-------------|---------| | Architecture diagram up front | Agent grasps the big picture before details | ASCII flow showing phase transitions | | Review gates | Prevents runaway execution, gives user control points | "HARD STOP — present outline, wait for approval" | | Subagent delegation | Separates concerns, each agent does one thing well | Actor-critic loop: generate → critique → revise | | Anti-patterns / blacklists | Tells agent what NOT to do — as important as what to do | List of cliches to never use | | Quality checklists | Actionable verification before output | Design system checklist with checkboxes | | Sensible defaults | Reduces friction — ask only what's needed | Default assumptions table at skill start | | Concrete examples | Shows expected behavior, not just rules | Good/bad output snippets inline |
references/Four phases. Not rigid steps — they overlap and the depth of each depends on the skill's complexity.
Understand what the skill does before building anything. Ask focused questions:
description field. Think about what a user would say.Ask 2-3 focused questions. Never more than 5. Gather enough to make design decisions, then move on.
Choose the structural pattern and identify resources. Read references/skill_patterns.md for detailed patterns.
Choose a structural pattern:
Identify bundled resources:
scripts/?references/?agents/?assets/?Determine if the skill needs runtime or evals blocks. These are about the skill's nature — whether it has executable code or should be automatically testable — not about where the skill will be published.
runtime block in frontmatter. See references/format_spec.md for field details.Scaffold. Run scripts/init_skill.py to create the directory:
python internal-skills/dhub-skill-creator/scripts/init_skill.py <name> --path <dir> [--with-runtime] [--with-evals] [--description "..."]
Write the SKILL.md body. Follow the writing guidelines below. The body is the agent system prompt — procedural knowledge the agent cannot infer on its own.
Build resources. Create scripts, references, assets, agents as designed. For runtime skills, ensure the entrypoint exists and dependencies are declared.
Validate early. Run scripts/validate_skill.py during development, not just at the end.
When the skill has an evals block, help the user construct eval cases through a structured interview.
Step 1 — Identify what to test. Each eval case tests one specific behavior.
Step 2 — Write the eval prompt. A realistic user message — what a real person would say to trigger this skill. Keep it focused. Include test data in evals/data/ if needed.
Step 3 — Compose judge criteria. The judge_criteria field is free-text interpreted by an LLM judge. Build it from structured blocks — pick whichever are relevant:
Required Behaviors — things the agent MUST do:
## Required Behaviors
- Checks data distribution before selecting a statistical test
- Reports confidence intervals, not just p-values
Forbidden Behaviors — things that cause automatic failure:
## Forbidden Behaviors
- Applies parametric tests without verifying normality
- Hallucinates data that wasn't in the input file
Expected Output Contains — specific patterns or concepts:
## Expected Output Contains
- A test statistic and p-value
- An interpretation in plain language
Calibration Examples — good/bad snippets so the judge knows what "right" looks like:
## Examples
Good: "Shapiro-Wilk test (p=0.003) rejects normality, using Mann-Whitney U..."
Bad: "Running a t-test gives p=0.04, so the treatment works."
Threshold — how to combine criteria into pass/fail:
## Scoring
PASS if all Required Behaviors present AND no Forbidden Behaviors appear.
Interview the user to populate these blocks:
For simple cases, a single sentence works: "PASS if the agent creates a valid CSV file with headers matching the schema. FAIL otherwise."
Step 4 — Assemble the eval YAML. Create evals/<case-name>.yaml with name, description, prompt, and judge_criteria fields. See references/format_spec.md for the complete spec and references/skill_patterns.md for the eval criteria authoring guide.
references/.Run validation during development to catch issues early:
python internal-skills/dhub-skill-creator/scripts/validate_skill.py <skill-dir>
python internal-skills/dhub-skill-creator/scripts/validate_skill.py <skill-dir> --strict
Fix all errors before packaging. Address warnings to improve quality.
Create a distributable zip (runs validation first):
python internal-skills/dhub-skill-creator/scripts/package_skill.py <skill-dir> [--output-dir <dir>]
Test the skill by using it on real tasks. Notice gaps, iterate on the SKILL.md and resources. Skills improve through use, not through planning.
After validation and packaging, ask the user: "Do you want to publish this skill to Decision Hub?" If yes:
dhub publish --org <org> --name <skill>
Run from the skill directory. The server validates the manifest, runs safety checks, and optionally triggers eval runs.
| Script | Purpose | Usage |
|--------|---------|-------|
| init_skill.py | Scaffold a new skill | python scripts/init_skill.py <name> --path <dir> [--with-runtime] [--with-evals] [--description "..."] |
| validate_skill.py | Validate a skill directory | python scripts/validate_skill.py <skill-dir> [--strict] |
| package_skill.py | Validate + zip for distribution | python scripts/package_skill.py <skill-dir> [--output-dir <dir>] |
| Field | Required | Description |
|-------|----------|-------------|
| name | yes | 1-64 chars, ^[a-z0-9]([a-z0-9-]{0,62}[a-z0-9])?$ |
| description | yes | 1-1024 chars, what the skill does + when to trigger |
| license | no | SPDX identifier |
| compatibility | no | Requirements or constraints |
| metadata | no | Key-value pairs |
| allowed_tools | no | Tool access restrictions |
| runtime | no | Executable code configuration (see references/format_spec.md) |
| evals | no | Automated evaluation configuration (see references/format_spec.md) |
--strict promotes all to errors"SKILL.md not found" — Ensure you point to the skill directory, not the SKILL.md file itself.
"name does not match directory name" — The name field in frontmatter must exactly match the containing directory name. Rename either one.
"Frontmatter is not valid YAML" — Check for unquoted colons in field values. Wrap the description in quotes if it contains colons: description: "My skill: does things".
"entrypoint does not exist" — The file at runtime.entrypoint must exist relative to the skill root. Create the file or fix the path.
"No evals/*.yaml files found" — Either add eval case YAML files to the evals/ directory, or remove the evals block from frontmatter if evals aren't needed yet.
Validation passes but skill doesn't trigger — The description may be too vague. Make it specific with concrete task types and "Use when..." phrasing.
Zip excludes needed files — The packager excludes __pycache__/, *.pyc, .DS_Store, .git/, *.egg-info/, .env*. If a needed file matches these patterns, rename it.
tools
A skill
tools
A skill
tools
Guide for using the dhub CLI — the AI skill manager for data science agents. Covers authentication, publishing, installing, running skills, managing API keys, eval reports, and troubleshooting. Use when users ask about dhub commands, skill publishing workflows, or need help with the Decision Hub CLI.
tools
A skill