skills/skill-authoring/SKILL.md
Write and refine agent skills. Use when generating skills from docs, extracting from sessions, designing tool interfaces, managing lifecycle, or writing for dual-harness patterns.
TDD for agent instructions. Test, write, shorten.
Use the right reference file for your task:
Note: Skill reference files must be resilient to context compaction. If an agent summarizes this routing table, ensure the distinct purpose of each file remains clear.
Create a skill when:
Don't create skills for one-off solutions, standard practices, or constraints enforceable with automation.
The evaluator is the bottleneck, not the agent. A skill that increases agent throughput without collapsing a review step makes the backlog worse.
Every proposed skill and every major edit must answer three questions in the SKILL.md or PR description:

1. Which review step does this skill eliminate or shorten? Name the concrete human action (reading a log, cross-referencing a spec, verifying a checklist) that this skill absorbs.
2. What evidence supports the claim? A before/after measurement, a session transcript, or a worked example showing the evaluator doing less.
3. What does the human stop doing? If the answer is "nothing" or "they still review the same artifacts," the skill fails this gate.
Some patterns reliably fail:

- Orchestration for its own sake: skills whose value proposition is "coordinates other agents" without identifying which review artifact disappears.
- Parallelism that ignores the serial review bottleneck downstream: splitting work across agents when the human still reviews each output individually. The wall-clock savings accrue to the agent, while review cost stays constant or increases.
- "Agent swarm" framings that treat agent utilization as the metric instead of evaluator time freed.
- Meta-skills that exist only as routing layers for other skills, adding indirection without absorbing any review surface.
A skill that genuinely passes this filter will have an obvious answer to question one. If you struggle to articulate which human action shrinks, the skill is optimizing the wrong side of the pipeline.
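One way to make the gate mechanical is to record the three answers and reject empty or "nothing" answers. This is a minimal sketch. `SkillProposal`, `passes_gate`, and the rejected phrasings are hypothetical. A check like this only catches missing answers, not weak ones.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SkillProposal:
    """Hypothetical record of the three answers a proposal must supply."""
    review_step_absorbed: str  # (1) which review step is eliminated or shortened
    evidence: str              # (2) measurement, transcript, or worked example
    human_stops_doing: str     # (3) what the human no longer does

# Answers to question (3) that fail the gate (phrasing taken from the text above).
NON_ANSWERS = {"", "nothing", "they still review the same artifacts"}

def passes_gate(p: SkillProposal) -> Tuple[bool, List[str]]:
    """Return (passed, reasons_for_failure) for the three-question gate."""
    reasons: List[str] = []
    if not p.review_step_absorbed.strip():
        reasons.append("no review step named")
    if not p.evidence.strip():
        reasons.append("no supporting evidence")
    if p.human_stops_doing.strip().lower().rstrip(".") in NON_ANSWERS:
        reasons.append("human still does the same work")
    return (not reasons, reasons)

ok, why = passes_gate(SkillProposal(
    review_step_absorbed="cross-referencing the API spec",
    evidence="session transcript showing the spec lookup skipped",
    human_stops_doing="Nothing",
))
print(ok, why)  # False ['human still does the same work']
```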
Include only instructions that change agent behavior in ways the codebase can't convey on its own.
| Keep | Cut |
|------|-----|
| Specific tooling (use uv, run shellcheck) | Motivational framing |
| Concrete constraints (paths, commands) | Rationalization tables |
| Checklists and decision tables | Narrative examples |
| Error messages and stop conditions | Mandatory gates |
| Cross-references to other skills | Overviews and directory listings |
The description is the activation trigger. Include WHAT and WHEN. Describe triggering conditions only — not the skill's workflow.
```yaml
# Good: triggering conditions
description: >-
  Create git commits with intelligent file grouping.
  Use when committing changes.

# Bad: summarizes workflow (agents follow this instead of reading the skill)
description: >-
  Use when committing - groups files by concern, writes
  conventional messages, runs pre-commit hooks
```
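The description rule can be approximated with a rough lint. This sketch assumes the description has already been read from frontmatter as a plain string. Both checks are heuristics invented for illustration, not an established tool:

- One check requires a "Use when" trigger.
- The other flags a trigger clause that continues with a spaced dash and a comma-separated list, which is how the bad example above narrates the workflow.

```python
import re
from typing import List

TRIGGER = re.compile(r"\buse when\b", re.IGNORECASE)
# Heuristic: after the trigger, a spaced dash followed by a comma-separated
# list usually means the description is narrating the workflow.
WORKFLOW_TAIL = re.compile(r"\buse when\b[^.]*?\s[-\u2013\u2014]\s[^.]*,", re.IGNORECASE)

def lint_description(desc: str) -> List[str]:
    """Return problems found in a skill description (empty list if none)."""
    text = " ".join(desc.split())  # collapse folded YAML line breaks
    problems: List[str] = []
    if not TRIGGER.search(text):
        problems.append("missing 'Use when' trigger")
    if WORKFLOW_TAIL.search(text):
        problems.append("trigger clause summarizes workflow")
    return problems

good = "Create git commits with intelligent file grouping. Use when committing changes."
bad = ("Use when committing - groups files by concern, writes "
       "conventional messages, runs pre-commit hooks")
print(lint_description(good))  # []
print(lint_description(bad))   # ['trigger clause summarizes workflow']
```

Expect false positives and misses. Treat a finding as a prompt to reread the description, not as a hard failure.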
The context window is shared. Every paragraph must justify its cost.
| Scope | Target |
|-------|--------|
| Frequently-loaded skills | <200 lines |
| Standard skills | <500 lines |
| Reference material | references/ directory |
Prefer `--help` over documenting flags. Cross-reference other skills instead of repeating content. One good example beats many mediocre ones.
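The size targets in the table can be enforced with a small check, for example in CI. This is a sketch only. The function name and the `frequently_loaded` flag are hypothetical, since nothing above says how a skill is marked as frequently loaded.

```python
from typing import Optional

# Line targets from the size table (upper bounds, exclusive).
FREQUENT_LIMIT = 200
STANDARD_LIMIT = 500

def check_skill_size(skill_md_text: str, frequently_loaded: bool = False) -> Optional[str]:
    """Return a warning if SKILL.md text exceeds its line target, else None."""
    n = len(skill_md_text.splitlines())
    limit = FREQUENT_LIMIT if frequently_loaded else STANDARD_LIMIT
    if n >= limit:
        return f"SKILL.md has {n} lines, target <{limit}; move lookup material to references/"
    return None
```

Call it with the file contents, for example `check_skill_size(Path("skills/skill-name/SKILL.md").read_text(), frequently_loaded=True)` with `pathlib.Path`.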
```
skills/skill-name/
├── SKILL.md      # Core instructions (loaded on activation)
├── references/   # Heavy content (loaded on demand)
├── scripts/      # Deterministic operations (executed, not read)
└── assets/       # Templates, images (used in output, never loaded)
```
At startup, agents see only name + description (~100 tokens). On activation, SKILL.md loads. References and scripts load only when the agent needs them during execution.
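That loading order can be sketched as two views of the same file. This assumes SKILL.md opens with a `---`-delimited frontmatter block of `key: value` lines. The parser is hand-rolled to stay dependency-free and handles only plain values and folded (`>` / `>-`) descriptions. A real harness would more likely use a YAML library. `read_metadata` and `load_body` are hypothetical names.

```python
from typing import Dict, Optional

FOLDED = (">", ">-")  # only folded block scalars are handled in this sketch

def read_metadata(skill_md_text: str) -> Dict[str, str]:
    """Startup view: return only name and description from the frontmatter."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    key: Optional[str] = None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body is not read at startup
        if line.startswith((" ", "\t")) and key is not None:
            # Continuation of a folded value: join lines with a single space.
            meta[key] = (meta[key] + " " + line.strip()).strip()
        elif ":" in line:
            raw_key, _, value = line.partition(":")
            key = raw_key.strip()
            value = value.strip()
            meta[key] = "" if value in FOLDED else value
    return {k: meta[k] for k in ("name", "description") if k in meta}

def load_body(skill_md_text: str) -> str:
    """Activation view: the SKILL.md content after the frontmatter."""
    lines = skill_md_text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:])
    return skill_md_text
```

In this model, a harness calls `read_metadata` for every installed skill at startup. It calls `load_body` only once a description matches the task.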
Split content between SKILL.md and `references/` by content type, not just length:
| Content type | Where | Example |
|--------------|-------|---------|
| Workflow and decisions | SKILL.md | "If drift detected, choose…" |
| Lookup tables | references/ | Backend config per cloud provider |
| Platform-specific details | references/ | Installation commands per agent |
| API specs or field lists | references/ | Frontmatter field reference |
Heuristic: If the skill covers 3+ independent subtopics and an agent only needs one at a time, each belongs in a reference file. SKILL.md routes to the right one.
Don't wait until you hit 500 lines. A 200-line SKILL.md with three unrelated lookup tables is already a candidate for splitting.
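The heuristic reduces to a small decision function. This is a sketch only:

- `should_split` and its inputs are hypothetical.
- You count subtopics and tables by hand.
- The table threshold is read off the 200-line example rather than stated as a rule.

```python
def should_split(independent_subtopics: int, needed_at_once: int,
                 unrelated_lookup_tables: int, skill_lines: int) -> bool:
    """Heuristic sketch: should content move from SKILL.md into references/?"""
    # 3+ independent subtopics, and the agent needs only one at a time.
    if independent_subtopics >= 3 and needed_at_once == 1:
        return True
    # Content type, not length: several unrelated lookup tables are enough
    # (threshold of 3 taken from the 200-line example above).
    if unrelated_lookup_tables >= 3:
        return True
    # Fall back to the standard-skill size target.
    return skill_lines >= 500
```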
RED: Run a pressure scenario without the skill. Record what the agent did and where it went wrong.

GREEN: Address the specific failures from RED. Don't add content for hypothetical cases. Re-run; the agent should now comply.

REFACTOR: If the agent still fails, the instruction isn't clear enough, so rewrite it shorter, not longer. A clear 3-line instruction outperforms a 30-line version. Cut until compliance breaks, then restore the last cut.
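The final instruction, "cut until compliance breaks, then restore the last cut," is a small search loop. This minimal sketch makes three assumptions:

- The skill can be treated as a list of instruction lines.
- `complies` runs your pressure scenario. It is a hypothetical callback, stubbed here so the example runs offline.
- `cut_order` lists the lines you consider least essential first.

```python
from typing import Callable, List

def shorten(instructions: List[str],
            complies: Callable[[List[str]], bool],
            cut_order: List[int]) -> List[str]:
    """Cut instructions in cut_order until compliance breaks, then restore the last cut."""
    kept = list(range(len(instructions)))
    for idx in cut_order:
        trial = [i for i in kept if i != idx]
        if not complies([instructions[i] for i in trial]):
            break  # this cut broke compliance: keep the previous version and stop
        kept = trial
    return [instructions[i] for i in kept]

# Stand-in for re-running the pressure scenario: the agent "complies" only
# while both concrete tooling instructions are present.
skill = [
    "Use uv for Python environments.",
    "Remember that quality matters.",
    "Run shellcheck on every script.",
    "Take a deep breath before starting.",
]
required = {"Use uv for Python environments.", "Run shellcheck on every script."}

def stub_complies(lines: List[str]) -> bool:
    return required.issubset(lines)

print(shorten(skill, stub_complies, cut_order=[3, 1, 2, 0]))
# ['Use uv for Python environments.', 'Run shellcheck on every script.']
```

The loop stops at the first breaking cut, as the text says. A greedy variant would restore the failing cut and keep trying the remaining ones. That can find shorter skills, at the cost of more scenario runs.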
| Freedom | When | Format |
|---------|------|--------|
| High | Multiple valid approaches | Prose |
| Medium | Preferred pattern, variation OK | Pseudocode |
| Low | Fragile ops, must be consistent | Scripts |
- `<HARD-GATE>`, "MUST use before ANY..."
- `skills/<plugin>/<skill>/` (not discovered)