skills/generate-skill/SKILL.md
Interactive SKILL.md builder. Use whenever the user asks to "create a skill", "generate a skill", "scaffold a SKILL.md", "write a SKILL.md", or "turn this workflow into a skill". Do NOT use for grading existing skills (see rate-skill).
npx skillsauth add antjanus/skillbox generate-skillInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Produces a single, ready-to-ship SKILL.md (plus optional references/, scripts/, assets/) following 2026-05 conventions distilled from Anthropic's skill-creator, the agentskills.io spec, and empirical activation research.
One skill, one job. This skill's job is the SKILL.md and its frontmatter — not docs, releases, or auxiliary files.
Use AskUserQuestion to ask one question at a time. Never bulk-dump the discovery questionnaire. After each phase, show the artifact and confirm before moving on.
Ask these in order, one per turn:
methodology — enforces a multi-step workflow (e.g., code-review, track-session)technical — wraps an API, format, or tool (e.g., docx, pdf, semantic-release)auditing — grades or inspects an artifact (e.g., rate-skill, security-review)reference — domain schemas, conventions, lookup tables (e.g., bigquery)automation — wraps a script or external command (e.g., screenshot-local)Do NOT use… clause.Compose a single-line YAML string, target ≤230 chars (soft cap — listing-budget safe; no penalty up to 500), shape:
<Third-person noun phrase>. Use whenever the user <wants/asks to> <trigger 1>, "<trigger 2>", or <trigger 3>. <Optional scope note.> Do NOT use this skill for <near-neighbor 1> — see <other-skill>.
Show the draft, count chars (python3 -c "import yaml; print(len(yaml.safe_load(open('SKILL.md').read().split('---',2)[1])['description']))"), iterate.
Quality signals:
| or > block scalars (silently breaks discovery, anthropics/skills #9817)Do NOT use this skill for … clause when a near-neighbor existsAnti-patterns:
description: | block — silent discovery failureEmit only these fields, in this order:
---
name: <kebab-case, ≤64 chars, no "anthropic" or "claude" reserved words>
description: <single-line string from Phase 2>
license: MIT
argument-hint: [<short-token>] # top-level, not nested
metadata:
author: <user-supplied>
version: "1.0.0"
---
Add these optional fields only when the skill genuinely needs them:
| Field | When to add |
|---|---|
| allowed-tools | Skill must restrict which tools the agent uses (Experimental) |
| compatibility | Cross-agent (Cursor, Cline, etc.) compatibility note, ≤500 chars |
| disable-model-invocation: true | Skill is slash-only, must not auto-trigger |
| user-invocable: false | Skill is only callable by subagents or other skills |
| model / effort | Skill needs a specific model tier or thinking budget |
| paths | Skill should only auto-trigger inside specific repo paths |
Anti-patterns:
version, author, tags, category, hooks — produce "unexpected key" errors (anthropics/skills #37). They live under metadata.argument-hint under metadata — Claude Code reads it at top level.anthropic or claude.name.Pick the template that matches the skill type. Full templates live in references/PATTERNS.md — load only when generating the body.
Compact summary:
references/API.md.references/<domain>.md per domain. SKILL.md stays a router; don't inline the data.scripts/.Always include a ## Gotchas section — Anthropic engineers cite it as the highest-signal section in a skill body.
Length budget: aim <300 lines in SKILL.md, hard cap 500. The ETH Zurich AGENTS.md study (arXiv 2602.11988) found verbose context files reduce task success ~3% and inflate step count >20%.
Produce 2–3 ✅ desired examples and 1 ❌ counter-example. Show ✅ first; if room, end on ✅ (recency bias).
Format:
### Example: <one-line task>
✅ Desired
<short code or transcript>
Why it works: <one sentence>.
### Counter-example
❌ Anti-pattern
<short code or transcript>
Why it fails: <one sentence>.
Do not use <Good> / <Bad> XML tags — they appear in zero of the 8 surveyed top community skills.
Concrete failure modes specific to this skill — not generic skill-authoring advice. Each entry: one-line symptom, one-line cause, one-line fix.
Pair every "do not X" with a positive directive — negation handling in LLMs is empirically weak (arXiv 2503.22395).
Count lines in the body. If SKILL.md exceeds 300 lines:
references/<TOPIC>.md files (plural directory — reference/ singular is a non-canonical anti-pattern).Anthropic's skill-creator now ships an A/B description optimizer with 60/40 train/test eval sets. Generate-skill emits an eval set for every skill before completing.
Generate 20 candidate queries:
Save as references/EVAL.md. Ask the user to spot-check 3–5 in a fresh Claude session before finalizing. If activation is unreliable, return to Phase 2 and rework the description.
<skill-name>/
├── SKILL.md # <300 lines, one job
├── references/ # plural — load on demand
│ ├── PATTERNS.md # optional: body templates by type
│ └── EVAL.md # 20-query eval set (Phase 8)
├── scripts/ # optional: deterministic helpers
└── assets/ # optional: templates copied into output
Worked description examples (methodology ✅, technical ✅, multi-line counter-example ❌): references/EXAMPLES.md.
description./skills. Cause: Multi-line description: |. Fix: Collapse to a single line; move long context into the body's first paragraph.version, author, tags, or hooks. Fix: Move them under metadata (except argument-hint, which stays top-level).Do NOT use this skill for X — see Y to whichever skill is the wrong fit.references/ (plural), one level deep, with explicit "load when…" pointers.generate-skill produces; rate-skill audits.development
EXPERIMENTAL. Mine recent Claude Code transcripts for friction events, cluster them by active skill, propose patches for skills with 3+ friction events, validate each patch via headless replay, scrub the report through /publish-check, and present an EVOLUTION_REPORT.md for human review on a branch (never auto-merge). Use when asked to "evolve my skills", "audit skills against recent friction", "propose skill improvements from transcripts", "run the skill evolution pipeline", or as part of a weekly skill-quality cadence.
testing
Manual QA tracking — things tests can't verify. Use when asked to "create a QA list", "set up QA for this project", "what should I QA", "track manual QA", "audit the QA list", or "start manual QA".
development
Multi-source web research with cited synthesis in chat. Use when asked to "research X", "deep research on Y", "deep dive on Z", "investigate this topic", "compare X and Y", "pros and cons of X", or "survey the landscape of Y".
development
Use this skill whenever the user wants a multi-agent review of local changes — triggers include "review my code", "review these changes", "do a code review", or "check my changes before I commit". Writes REVIEW.md. Do NOT use for an open PR by number (use /review) or a security-specific pass (use /security-review).