skills/writing-skills/SKILL.md
Use when creating new skills, editing existing skills, or verifying skills work before deployment. Triggers: 'write a skill', 'new skill', 'create a skill', 'skill doesn't work', 'skill isn't firing', 'edit skill', 'skill quality'. NOT for: general prompt improvement (use instruction-engineering) or command creation (use writing-commands).
npx skillsauth add axiomantic/spellbook writing-skillsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
sonnet, flash, pro) over hardcoded version numbers (e.g. claude-3-5-sonnet, gemini-2.5-flash). Hardcoded versions are permitted ONLY when a specific behavior is required that differs between versions.| Input | Required | Description |
|-------|----------|-------------|
| Skill purpose | Yes | What behavior the skill should instill or technique it should teach |
| Failing scenario | Yes | Documented agent behavior WITHOUT the skill (RED phase) |
| Target location | No | skills/<name>/SKILL.md path; defaults to inferring from purpose |
| Output | Type | Description | |--------|------|-------------| | SKILL.md | File | Schema-compliant skill at target location | | Baseline documentation | Inline | Record of agent behavior before skill (RED phase) | | Verification result | Inline | Confirmation skill changes behavior (GREEN phase) |
| Type | Purpose | Test Approach | Examples | |------|---------|---------------|----------| | Discipline | Enforces rules/requirements | Pressure scenarios, rationalizations | TDD, verify command | | Technique | Concrete steps to follow | Application + edge cases | condition-based-waiting, root-cause-tracing | | Pattern | Mental model for problems | Recognition + counter-examples | flatten-with-flags | | Reference | API docs, guides | Retrieval + gap testing | office docs, library guides |
skills/<name>/
SKILL.md # Required. Main content inline
supporting-file.* # Only for heavy reference (100+ lines) or reusable tools
Frontmatter (YAML only):
---
name: skill-name-with-hyphens # letters, numbers, hyphens only
description: Use when [triggering conditions and symptoms only, NEVER workflow]
---
Required sections:
# Skill Name
## Overview
What is this? Core principle in 1-2 sentences.
## When to Use
- Bullet list with SYMPTOMS and use cases
- When NOT to use
[Small inline flowchart IF decision non-obvious]
## Core Pattern (for techniques/patterns)
Before/after code comparison
## Quick Reference
Table or bullets for scanning common operations
## Implementation
Inline code for simple patterns
Link to file for heavy reference
## Common Mistakes
What goes wrong + fixes
| Asset | Pattern | Examples | |-------|---------|----------| | Skill | Gerund (-ing) or noun-phrase | debugging, test-driven-development, develop | | Command | Imperative verb(-noun) | execute-plan, verify, handoff, audit-green-mirage | | Agent | Noun-role | code-reviewer, fact-checker |
Name by what you DO, not generic category: root-cause-tracing > debugging-techniques, using-skills not skill-usage.
The description field is the most important line in your skill. At startup, only the name and description from every skill are pre-loaded into the system prompt. The skill body is NOT loaded until the agent decides it's relevant. That decision is made entirely from the description.
This makes the description an instruction engineering problem, not just a search optimization problem. You are writing a micro-prompt that tells the agent: "when you see these conditions, load me." Every token in the description competes with 100+ other skill descriptions for the agent's attention. A vague description drowns in the noise. A precise one acts as a reliable classifier.
The mental model: Think of descriptions as if conditions in the agent's routing logic. The agent scans all descriptions on every user message and asks "does this apply right now?" Your description must answer that question unambiguously in one pass.
Spread discoverable terms across the description to maximize match surface:
Workflow leakage (agent skips body because description contains the plan):
# BAD: Leaks internal structure. Agent reads "5-phase process: strategic planning,
# context analysis, deep review, verification, report generation" and executes
# that plan directly without loading the skill.
description: >
Use when performing thorough multi-phase code review. 5-phase process:
strategic planning, context analysis, deep review, verification, report
generation. More heavyweight than code-review.
# GOOD: Triggers only. Agent knows WHEN to load but must read body to learn HOW.
description: >
Use when performing thorough code review with historical context tracking.
Triggers: 'thorough review', 'deep review', 'review this branch in detail',
'full code review with report'. More heavyweight than code-review; for quick
review, use code-review instead.
Abstract situation vs. user phrasings:
# BAD: No trigger phrases. Matches "debug" but misses "this is broken",
# "getting an error", "why isn't this working" - what users actually say.
description: Use when debugging bugs or unexpected behavior
# GOOD: Packed with the exact words users type.
description: >
Use when debugging bugs, test failures, or unexpected behavior. Triggers:
'why isn't this working', 'this doesn't work', 'X is broken', 'getting an
error', 'stopped working', 'regression', 'crash', 'flaky test', or when
user pastes a stack trace. NOT for: test quality issues (use fixing-tests),
adding new behavior (use develop).
Too thin (no trigger surface):
# BAD: Only matches if user literally says "execute implementation plan."
# Misses "run the plan", "start building", "implement the tasks", "next step."
description: Use when you have a written implementation plan to execute
# GOOD: Covers the ways this situation actually arises.
description: >
Use when you have an implementation plan ready to execute. Triggers: 'run
the plan', 'start building from the plan', 'execute tasks', 'implement
the steps', 'next task'. Also invoked by develop after planning phase.
Pure workflow summary with zero triggers:
# BAD: Describes what the skill does internally. Zero user-facing phrases.
# An agent will never match this to a user saying "how are my skills performing?"
description: >
Analyze session transcripts to extract skill invocation patterns, score
invocations, and produce comparative metrics for skill improvement decisions.
# GOOD: Leads with when/why, includes what users actually ask.
description: >
Use when evaluating skill effectiveness or comparing skill versions.
Triggers: 'how are skills performing', 'skill metrics', 'which skills fire
correctly', 'skill invocation analysis', 'compare skill versions'.
verifying-hunches -- Natural speech triggers:
"Use when about to claim discovery during debugging. Triggers: 'I found', 'this is the issue',
'root cause', 'smoking gun', 'aha', 'got it', 'the fix is', 'should fix', 'this will fix'.
Also invoked by debugging before any root cause claim."
Covers excited discovery ("aha!", "got it") and confident claims ("root cause", "the fix is"). Notes auto-invocation path. Narrow enough to avoid false positives.
develop -- Scope with anti-triggers:
"Use when building, creating, or adding functionality. Triggers: 'implement X', 'build Y',
'add feature Z', 'Would be great to...', 'I want to...', 'We need...'.
NOT for: bug fixes, pure research, or questions about existing code."
Covers direct commands ("implement X") and wish-phrasing ("Would be great to..."). "NOT for" section prevents false matches on debugging or research.
isolated-testing -- Behavioral triggers:
"Use when testing theories during debugging, or when chaos is detected.
Triggers: 'let me try', 'maybe if I', 'quick test', rapid context switching,
multiple changes without isolation."
Includes behavioral patterns (rapid context switching, multiple changes without isolation) detectable in the agent's own actions, not just user text.
| Anti-Pattern | Example | Problem | |-------------|---------|---------| | Workflow leakage | "3-phase process: plan, execute, verify" | Agent extracts a plan from description and skips loading the skill body entirely. | | Abstract-only | "Use when reviewing code" | No trigger phrases. "Review code" matches but "look at my changes" doesn't. | | Jargon-first | "Use when roundtable returns ITERATE verdict" | Users never say this. Internal workflow trigger with no user-facing phrases. | | Too broad | "Use when writing or modifying code" | Matches every coding task. Will fire constantly as a false positive. | | Too narrow | "Use before design phase only" | Limits to one temporal moment when the skill is useful at many stages. | | Implementation detail | "Modes: --self, --feedback, --give" | Users don't think in flags. They think "review my code" or "I got feedback." | | Missing disambiguation | "Use for code review" | Which of the 4 review skills? No "NOT for" or "instead use" guidance. |
reflexion): Description should state "Invoked by [system/skill], not directly by users"tarot-mode): Description should cover both the system trigger AND user-facing triggersdebugging): Description should focus entirely on user phrasingsWhen multiple skills overlap, each description MUST state why THIS skill and when to use a different one instead:
Example:
code-review: "For focused single-pass review. NOT for: heavy multi-phase analysis (use advanced-code-review) or PR triage (use distilling-prs)."advanced-code-review: "For thorough 5-phase analysis with historical context. NOT for: quick review (use code-review) or PR categorization (use distilling-prs)."distilling-prs: "For triaging and categorizing PR changes. NOT for: deep code analysis (use advanced-code-review)."NO SKILL WITHOUT FAILING TEST FIRST
| Pattern | When | Example |
|---------|------|---------|
| Self-contained | All content fits | defense-in-depth/SKILL.md |
| With tool | Reusable code needed | condition-based-waiting/SKILL.md + example.ts |
| Heavy reference | Reference 100+ lines | pptx/SKILL.md + pptxgenjs.md + ooxml.md |
One excellent example beats many mediocre ones. Choose most relevant language: testing techniques (TypeScript/JavaScript), system debugging (Shell/Python), data processing (Python).
Good example: Complete, runnable, well-commented explaining WHY, from real scenario, ready to adapt.
Don't: Implement in 5+ languages, create fill-in-the-blank templates, write contrived examples.
Use ONLY for:
Never use for: Reference material (use tables), Code examples (use markdown), Linear instructions (use numbered lists).
| Pattern | Why Bad | |---------|---------| | Narrative ("In session 2025-10-03, we found...") | Too specific, not reusable | | Multi-language dilution | Mediocre quality, maintenance burden | | Code in flowcharts | Can't copy-paste, hard to read | | Generic labels (helper1, step3) | Labels need semantic meaning |
Optimize for this flow - searchable terms early and often.
<FORBIDDEN> - Writing skill without documenting baseline failure first (RED phase skipped) - Summarizing workflow in description (causes agents to skip body) - Multiple examples when one excellent example suffices - Deploying without verification run (GREEN phase skipped) - Ignoring new rationalizations discovered during testing - Creating multiple skills in batch without testing each - Keeping untested changes as "reference" - Using `@` links that force-load and burn context - Generic labels without semantic meaning - Narrative storytelling about specific sessions </FORBIDDEN>| Phase Count | Requirement | |-------------|-------------| | 1 phase | Exempt. Self-contained SKILL.md is fine. | | 2 phases | SHOULD separate into orchestrator + commands. | | 3+ phases | MUST separate. Orchestrator dispatches, commands implement. |
The Core Rule: The orchestrator dispatches subagents (Task tool). Subagents invoke phase commands (Skill tool). The orchestrator NEVER invokes phase commands directly into its own context.
Content Split:
| Orchestrator SKILL.md | Phase Commands | |----------------------|----------------| | Phase sequence and transitions | All phase implementation logic | | Dispatch templates per phase | Scoring formulas and rubrics | | Shared data structures (referenced by 2+ phases) | Discovery wizards and prompts | | Quality gate thresholds | Detailed checklists and protocols | | Anti-patterns / FORBIDDEN section | Review and verification steps |
Data structure placement: If referenced by 2+ phases, define in orchestrator. If referenced by 1 phase only, define in that phase's command.
Soft target: ~300 lines for orchestrator SKILL.md. The hard rule is about content types: orchestrators contain coordination logic, never implementation logic.
Exceptions:
Canonical example: develop uses 5 commands across 6+ phases. The orchestrator defines phase sequence, dispatch templates, and shared data structures. Each phase command (discover, design, execute-plan, etc.) contains its own implementation logic.
Anti-Patterns:
| Anti-Pattern | Why It Fails | |-------------|-------------| | Orchestrator invokes Skill tool for a phase command | Loads phase logic into orchestrator context, defeating separation | | Orchestrator embeds phase logic directly | Monolithic file; orchestrator context bloats with implementation detail | | Subagent prompt duplicates command instructions | Drift between prompt and command; maintenance burden doubles | | Monolithic SKILL.md exceeding 500 lines with phase implementation | Signal that phase logic should be extracted to commands |
For skills producing evaluative output (verdicts, findings, scores, pass/fail):
/design-assessment with the target type being evaluatedExample skills: code-review, auditing-green-mirage, fact-checking, reviewing-design-docs
Before completing:
name and descriptionIf ANY unchecked: STOP and fix before declaring complete.
<FINAL_EMPHASIS> Creating skills IS TDD for process documentation. Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). If you follow TDD for code, follow it for skills. Untested skills are untested code - they will break in production.
REQUIRED BACKGROUND: Understand test-driven-development skill before using this skill. </FINAL_EMPHASIS>
development
Use when you have a spec, design doc, or requirements and need a detailed implementation plan before coding. Triggers: 'write a plan', 'create implementation plan', 'plan this out', 'break this down into steps', 'convert design to tasks', 'implementation order'. Also invoked by develop during planning. NOT for: reviewing existing plans (use reviewing-impl-plans).
testing
Use when creating new commands, editing existing commands, or reviewing command quality. Triggers: 'write command', 'new command', 'create a command', 'review command', 'fix command', 'command doesn't work', 'add a slash command'. NOT for: skill creation (use writing-skills).
development
Use when about to claim discovery during debugging. Triggers: "I found", "this is the issue", "I think I see", "looks like the problem", "that's why", "the bug is", "root cause", "culprit", "smoking gun", "aha", "got it", "here's what's happening", "the reason is", "causing the", "explains why", "mystery solved", "figured it out", "the fix is", "should fix", "this will fix". Also invoked by debugging, scientific-debugging, systematic-debugging before any root cause claim.
tools
System skill loaded at session start to initialize skill routing. Not invoked directly by users. Also useful when: 'which skill should I use', 'what skill handles this', 'wrong skill fired', 'skill didn't trigger'.