skills/writing-system-prompts/SKILL.md
Use when writing or editing a system prompt for any LLM API or SDK (any code passing a `system=` / `system` role parameter, or a `.txt`/`.md` file holding such a prompt). Applies prompt-engineering and prompt-caching best practices.
npx skillsauth add lwlee2608/agent-skills writing-system-prompts
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
A well-built system prompt steers behavior across the whole conversation and — if structured for KV-cache reuse — costs a fraction of an uncached prefix on providers that offer prompt caching (Anthropic, OpenAI, Bedrock, Vertex, etc.).
Put the role first, in one sentence. A single descriptive sentence in the system field (not the user turn) anchors tone and vocabulary, e.g. You are a helpful coding assistant specializing in Python. Plain prose at the top — do not wrap in markup.
Order static-first, dynamic-last. Cache hits require a byte-identical prefix. Sequence: role → tools/policies → long reference docs / few-shot examples → cache breakpoint → dynamic context → user turn. Why: one changed token before the breakpoint drops you from cached-read pricing (~10% of input) to full input cost.
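The ordering rule above can be sketched as a small builder that keeps every static section ahead of anything that changes per request. This is a minimal illustration, not a provider API: `ROLE`, `POLICIES`, `REFERENCE`, `build_prompt`, and `fingerprint` are hypothetical names, and the hash comparison only stands in for the byte-identical check a provider's cache would perform.

```python
import hashlib

# Hypothetical static sections; in practice these would be loaded from files.
ROLE = "You are a support assistant for a billing product."
POLICIES = "Answer only billing questions. Escalate refund disputes."
REFERENCE = "Plan A costs $10 per month. Plan B costs $25 per month."


def build_prompt(dynamic_context: str) -> tuple[str, str]:
    """Return (static_prefix, dynamic_suffix) with all static parts first."""
    static_prefix = "\n\n".join([ROLE, POLICIES, REFERENCE])
    return static_prefix, dynamic_context


def fingerprint(text: str) -> str:
    """Hash the prefix so two requests can be compared byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


prefix_a, suffix_a = build_prompt("Current date: 2025-01-01")
prefix_b, suffix_b = build_prompt("Current date: 2025-01-02")

# The dynamic part differs, but the cacheable prefix is identical.
assert fingerprint(prefix_a) == fingerprint(prefix_b)
assert suffix_a != suffix_b
```

Logging a prefix fingerprint per request is one cheap way to notice when an edit has silently changed the cached portion.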
Mark cache breakpoints explicitly when the provider supports it. Most caching is opt-in (e.g. Anthropic cache_control, Bedrock cachePoint). Place a breakpoint at the end of each stable block to reuse. Cache writes typically cost more than base input; reads a fraction of it. TTLs vary (commonly ~5 min, sometimes longer tiers). Only cache prefixes you'll reuse within the TTL — sparse traffic pays the write penalty repeatedly.
```python
# Anthropic example: adapt to your provider's syntax
system=[
    {"type": "text", "text": ROLE_AND_POLICIES},
    {
        "type": "text",
        "text": LONG_REFERENCE_DOC,
        # Breakpoint at the end of the stable block to reuse.
        "cache_control": {"type": "ephemeral"},
    },
]
```
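Whether caching pays off depends on how often the prefix is reused within the TTL. The sketch below estimates the cost of the prefix tokens relative to sending them uncached. The multipliers are assumptions for illustration (a write at 1.25x base input and a read at 0.10x); check your provider's current pricing before relying on them. `relative_prefix_cost` is a hypothetical helper name.

```python
def relative_prefix_cost(
    requests: int,
    hits: int,
    write_multiplier: float = 1.25,  # assumed cache-write price vs. base input
    read_multiplier: float = 0.10,   # assumed cache-read price vs. base input
) -> float:
    """Prefix cost with caching, as a fraction of the uncached cost.

    `hits` is the number of requests that arrive within the TTL and read
    the cache; every other request pays the write price again.
    """
    if requests < 1 or not 0 <= hits <= requests:
        raise ValueError("need requests >= 1 and 0 <= hits <= requests")
    misses = requests - hits
    return (misses * write_multiplier + hits * read_multiplier) / requests


# Busy endpoint: one write, nine reads within the TTL.
busy = relative_prefix_cost(requests=10, hits=9)

# Sparse traffic: every request arrives after the cache has expired.
sparse = relative_prefix_cost(requests=10, hits=0)

assert busy < 1.0 < sparse
```

Under these assumed multipliers, the busy case costs well under the uncached baseline, while the sparse case costs more than not caching at all.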
Never put churning content at the prefix. Current time, request ID, user ID, or conversation summary at the top defeats caching for the whole prompt. Push them past the breakpoint — either a second system message (for authoritative facts like date/IDs) or the user turn (for task material like questions and documents). Both preserve cache reuse equally.
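One way to push authoritative facts past the breakpoint, following the Anthropic-style block shape used earlier, is to append them as a second system block that carries no cache marker. This is a sketch under that assumption; `build_system_blocks` and its parameters are hypothetical names, and other providers will need a different structure.

```python
from datetime import date


def build_system_blocks(static_text: str, today: date, user_id: str) -> list[dict]:
    """Static block first (with the breakpoint), per-request facts after it."""
    return [
        {
            "type": "text",
            "text": static_text,
            # Breakpoint: the cached prefix ends here.
            "cache_control": {"type": "ephemeral"},
        },
        {
            "type": "text",
            # Changes per request, so it sits after the breakpoint.
            "text": f"Current date: {today.isoformat()}\nUser ID: {user_id}",
        },
    ]


monday = build_system_blocks("ROLE AND POLICIES", date(2025, 1, 6), "user-1")
tuesday = build_system_blocks("ROLE AND POLICIES", date(2025, 1, 7), "user-2")

# The cached block is unchanged between requests; only the tail differs.
assert monday[0] == tuesday[0]
assert monday[1] != tuesday[1]
```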
Use structural delimiters for distinct sections. Wrap longform inputs and behavioral directives in clear delimiters — XML tags (<documents><document>…</document></documents>, <default_to_action>…</default_to_action>), markdown headers, or named JSON fields. Why: delimited sections survive long contexts better than walls of prose, and several model families (notably Claude) are explicitly trained on XML structure.
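As a rough illustration of the XML option, the helper below wraps a list of documents in the `<documents><document>` structure mentioned above. The inner tag names (`source`, `document_content`), the `index` attribute, and the function name `wrap_documents` are illustrative choices, not a required schema. Escaping the content is also a design assumption: it stops a document that happens to contain `</document>` from closing its section early, at the cost of the model seeing escaped characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_documents(docs: list[tuple[str, str]]) -> str:
    """Wrap (source, content) pairs in nested XML-style delimiters."""
    parts = ["<documents>"]
    for index, (source, content) in enumerate(docs, start=1):
        parts.append(f'<document index="{index}">')
        parts.append(f"<source>{escape(source)}</source>")
        parts.append(f"<document_content>\n{escape(content)}\n</document_content>")
        parts.append("</document>")
    parts.append("</documents>")
    return "\n".join(parts)


section = wrap_documents(
    [
        ("faq.md", "Refunds take 5 days."),
        ("notes.txt", "Use </document> & <b> carefully."),
    ]
)

# The section stays well-formed even when a document contains tag-like text.
root = ET.fromstring(section)
assert len(root.findall("document")) == 2
```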
Place longform data above instructions in the user turn. For 20k+ token inputs, docs at the top, question at the bottom — measured up to 30% quality lift on multi-document tasks across major models.
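The documents-first ordering can be sketched as a user-turn builder that always appends the question last. `build_user_turn` is a hypothetical helper, and the tag layout is only one possible delimiter choice.

```python
def build_user_turn(documents: list[str], question: str) -> str:
    """Put the longform material at the top and the question at the bottom."""
    doc_block = "\n\n".join(f"<document>\n{doc}\n</document>" for doc in documents)
    return f"<documents>\n{doc_block}\n</documents>\n\n{question}"


turn = build_user_turn(
    ["Q3 revenue rose 12%.", "Q4 revenue fell 3%."],
    "Which quarter had higher growth?",
)

# The question comes after the closing documents tag, at the very end.
assert turn.index("</documents>") < turn.index("Which quarter")
assert turn.endswith("Which quarter had higher growth?")
```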
Write literal, scoped instructions. Modern instruction-tuned models interpret prompts increasingly literally and won't silently generalize. If a rule applies broadly, say so: Apply this formatting to every section, not just the first one. Avoid ALL-CAPS shouting (CRITICAL, MUST) — it causes over-triggering on recent Claude models and adds little on others; use normal imperative voice.
Tune verbosity, tone, tool use, and subagent spawning only when the default is wrong. Examples: Provide concise, focused responses. Skip non-essential context. / Use a warm, collaborative tone. / Spawn multiple subagents in the same turn when fanning out across items. Do not spawn a subagent for work you can complete in a single response. Override defaults only after observing a problem — don't pre-empt.
After drafting or editing, check each:
`Current date: {{today}}` in the role line invalidates the cache every request. Move it past the breakpoint, into a second system message or the user turn.
`CRITICAL: YOU MUST…` shouting. Causes over-triggering on recent models. Rewrite as plain imperatives.
`You are a…` in `messages[0]` instead of the system field weakens steering and wastes a cacheable slot.
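Two of these checks can be automated with a small lint pass over the static prefix. The sketch below is an assumption-laden example, not part of the skill: `lint_static_prefix` is a hypothetical name, the `{{name}}` pattern assumes double-brace templating, and the word list goes beyond the two words named above (`CRITICAL`, `MUST`) with a few guesses.

```python
import re

# Assumed templating syntax: {{name}} placeholders filled in per request.
TEMPLATE_PATTERN = re.compile(r"\{\{\s*\w+\s*\}\}")
# CRITICAL and MUST come from the checklist; the rest are illustrative guesses.
SHOUT_PATTERN = re.compile(r"\b(?:CRITICAL|MUST|NEVER|ALWAYS|IMPORTANT)\b")


def lint_static_prefix(prefix: str) -> list[str]:
    """Return human-readable findings for a prompt's cached prefix."""
    findings = []
    for match in TEMPLATE_PATTERN.finditer(prefix):
        findings.append(f"template variable {match.group(0)} in cached prefix")
    for match in SHOUT_PATTERN.finditer(prefix):
        findings.append(f"all-caps emphasis: {match.group(0)}")
    return findings


bad = "You are a support agent. Current date: {{today}}. CRITICAL: YOU MUST reply."
good = "You are a support agent. Reply to every message."

assert len(lint_static_prefix(bad)) == 3
assert lint_static_prefix(good) == []
```

A check like this catches only the mechanical cases; whether the role sits in the system field still has to be verified where the request is built.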