Instruction Craftsmanship Guidelines

Review and audit any agent-facing instructions — task prompts, workflow steps, skill procedures, self-test protocols — to eliminate shallow-execution vulnerabilities. Applies to any Claude agent system, not only this repo's framework.

Modes of Operation

Author Mode: Review proposed instructions before deployment to catch execution gaps.
Audit Mode: Analyze execution transcripts after a failure to trace it back to instruction gaps.

Quality Criteria: The Defect Classes

Ensure instructions are free of the following defects:

Compliance Framing: Avoid instructions defined as "did X run?". Require outcome-based verification ("is the output correct, complete, and verified?").
Missing Artifact Chain: Ensure all output channels (stdout, stderr, log files, JSONL transcripts, schema validations) are checked, not just the primary summary channel.
No Adversarial Checks: Explicitly check for silent failures (e.g., zero exit code on empty/corrupt outputs, config warn-instead-of-block overrides).
Summary-as-Evidence: Prohibit using agent summaries or claims as proof of success. Require direct inspection of actual artifacts.
Undefined Boundary Behavior: Explicitly define fallback search spaces or escalation procedures when standard searches return no results.
Skimped Verification: Require reading the complete output of files rather than simple grepping, keyword matching, or tail scans.
No Negative Verification: Check for the absence of unexpected outputs (corruption, credential leaks, placeholders) in addition to the presence of expected results.
Deferred-Read Dispersion: A rule the agent needs at the moment of action lives in a second file it must go read — a "see X.md", a [[link]] to "canonical" doctrine, a "read first" pointer. An agent that already has the instructions in hand frequently will not make the follow-up read, so a load-bearing rule behind a pointer is a rule that often won't run. Keep the operative instruction where it executes and inline it; reserve pointers for genuinely optional depth, never for a step that is required every time. Shorter, co-located instructions beat longer, more distributed ones in almost all cases — when content is mandatory, fold it in and tighten rather than forking it into a referenced file (and never duplicate it across both the summary and the linked file, which is the worst of both). Audit-mode tell: the executing instruction was a pointer, and the missed rule lived one read away.
Output-Shape Without Source-and-Action (Substitution by Path of Least Resistance): A required step that specifies what the output should look like but names no source and does not mandate the operation that produces it. An agent satisfies the described shape from whatever is cheapest to hand — material already in the file, a by-product of an adjacent step, prior memory — silently dropping the real criterion (a fresh reconstruction from primary sources). For any step that is required every run, write it as a direct, numbered, imperative instruction that names the source ("read <path>/query <tool>") and mandates the operation ("you MUST open them before writing this section"), and explicitly prohibit substituting material already in the file, adjacent-step by-products, or prior memory for the mandated source-read. Diagnostic contrast (audit-mode tell): in the same skill, a sibling step that names its source and forces a live look-up executes correctly while the shape-only step gets substituted. Caveat — name the source and the action, not the keystrokes: over-specified, step-bloated instructions are their own defect.
Over-Fitting & Incidental Ballast (keep instructions lean): An always-loaded instruction carrying weight that does not change what the agent does. Two forms: (a) provenance ballast: a dated event ("on the 2026-06-25 session…"), or an issue/incident/memory ID cited as the reason for a rule (#1978, aops-18572bc0 §5, mem-e7b976da). The agent does not need to know when or why the rule was born to follow it. (b) over-fitted failure recipes: a step tuned to one specific failure ("if the PKB returns error X, emit [ATTN]…, then file a follow-up task, and do not shell out or SSH or write a file…") where a durable principle covers it and every future variant. Write the principle: "fail fast if the memory tools don't work" beats the six-line incident-specific escape-hatch ban; "hold between steps" beats "this reconstructs the #1978 held-turn regression." Provenance (dates, issue/incident/memory IDs, the war story) belongs in the spec, PKB, or commit message that records why the rule exists, not in the instruction loaded every turn. Every line of an agent definition or skill material should earn its place by changing behavior. Caveat: this is not licence to strip the operative specificity that defects 6 and 9 in this list (Skimped Verification; Output-Shape Without Source-and-Action) require, such as naming a source or mandating an operation. Cut ballast, not the source-and-action.
Slash-Command Directives: Avoid instructing the agent to run/execute slash commands (e.g. /command) directly. System constraints typically forbid agents from executing slash commands. Instead, instruct them to invoke the underlying skill or subagent (e.g. "invoke the verify skill").

These are common patterns, not an exhaustive list. If instructions feel shallow but match no named defect, trust the feeling, say so, and articulate why — and remember depth is verification specificity, not step count.
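Three of the defect classes above (No Adversarial Checks, Skimped Verification, No Negative Verification) describe checks that an instruction can ask for explicitly. The sketch below shows what such a check might look like in Python. It is an illustration only, not part of the skill: the function name `verify_artifact`, its parameters, and the forbidden patterns are all assumptions chosen for the example.

```python
import re
from pathlib import Path

# Hypothetical patterns for content that must NOT appear in a finished artifact.
FORBIDDEN_DEFAULT = (r"\bTODO\b", r"\{\{.*?\}\}", r"(?i)api[_-]?key\s*[:=]")


def verify_artifact(path, exit_code, required, forbidden=FORBIDDEN_DEFAULT):
    """Return a list of problems; an empty list means the artifact passed."""
    problems = []
    if exit_code != 0:
        problems.append(f"non-zero exit code: {exit_code}")

    file = Path(path)
    if not file.exists():
        return problems + [f"artifact missing: {path}"]

    # Read the whole file rather than a tail scan or a single grep hit.
    text = file.read_text(encoding="utf-8", errors="replace")

    # Adversarial check: exit code 0 with empty output is a silent failure.
    if not text.strip():
        problems.append("artifact is empty")

    # Positive verification: expected content is present.
    for needle in required:
        if needle not in text:
            problems.append(f"expected content not found: {needle!r}")

    # Negative verification: unexpected content is absent.
    for pattern in forbidden:
        if re.search(pattern, text):
            problems.append(f"forbidden pattern present: {pattern}")

    return problems
```

An instruction that says "the step passes only when this list is empty" is outcome-based; "the command ran" is not.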

Construction Rule: Static-Prefix / Variable-Tail (prompt-cache prefix)

This rule binds any code that renders a template with dynamic data: a .md template with {placeholder}s; an f-string, .format(), or .render() call that wraps a template body around runtime values; a builder that concatenates a static preamble with session, transcript, or other variable content; and any gate context_key or audit-file assembly.

Rule: emit ALL static template material strictly FIRST, then append the variable/dynamic content LAST. Never interleave a variable into a static preamble.

Why: Anthropic prompt caching keys on the longest identical PREFIX. One variable byte placed early invalidates the entire cacheable suffix that follows it. A stable static prefix with the variable payload at the tail keeps the prefix cache-hot across calls; an interleaved variable throws the cache away on every render.
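The effect of variable placement on the shared prefix can be measured directly. The Python sketch below compares two renders of the same template and reports how many leading characters they have in common. Everything in it (the `STATIC` text, the two render functions, `shared_prefix_len`) is hypothetical, and character-level prefix length is used only as a rough stand-in for whatever unit the cache actually matches on.

```python
import os

# Hypothetical static instructions reused on every call.
STATIC = "You are an auditor. Read the transcript and list every instruction gap.\n"


def interleaved(session_id, transcript):
    """Variable placed early: the static text sits behind it."""
    return f"Session {session_id}\n{STATIC}{transcript}"


def static_first(session_id, transcript):
    """Static text first, all variable content at the tail."""
    return f"{STATIC}Session {session_id}\n{transcript}"


def shared_prefix_len(a, b):
    """Length of the longest common leading substring of two renders."""
    # os.path.commonprefix compares character by character, so it works
    # on arbitrary strings, not only on paths.
    return len(os.path.commonprefix([a, b]))


early = shared_prefix_len(interleaved("a1", "x"), interleaved("b2", "y"))
late = shared_prefix_len(static_first("a1", "x"), static_first("b2", "y"))
print(f"interleaved: {early} shared chars; static-first: {late} shared chars")
```

In the interleaved form the two renders diverge at the session ID, before any of the static instructions; in the static-first form the whole of `STATIC` stays inside the shared prefix.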

How to apply:

Move instructional scaffolding, headers, role/framing, and "how to read this" guidance ahead of any {session_context}-style payload. The largest/most-variable field belongs at the very end.
If a template puts a static section AFTER the variable (e.g. an "## Your Assessment" trailer below the transcript), hoist that static section above the variable so the variable is last.
Honor any tail invariant: if a builder appends a terminal sentinel (e.g. an audit-complete marker that must be the final line), the variable that carries it must remain the file's tail — which is exactly what this rule produces.
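The three points above might be applied in a builder along the following lines. This is a minimal Python sketch under stated assumptions: the section headings, the `SENTINEL` string, and the function `build_audit_prompt` are invented for illustration and do not come from any particular framework.

```python
# Hypothetical static material: role, guidance, and the assessment section
# are all hoisted above the variable payload.
STATIC_PREFIX = (
    "## Role\n"
    "You are reviewing a session transcript.\n\n"
    "## Your Assessment\n"
    "Give a verdict and quote the evidence.\n\n"
    "## Transcript\n"
)

# Hypothetical terminal marker that must be the final line of the file.
SENTINEL = "<!-- audit-complete -->"


def build_audit_prompt(session_context: str) -> str:
    """Emit all static text first, then the variable tail carrying the sentinel."""
    payload = session_context.rstrip("\n")
    variable_tail = f"{payload}\n{SENTINEL}"
    return STATIC_PREFIX + variable_tail


prompt = build_audit_prompt("user: run the tests\nagent: done\n")
print(prompt.splitlines()[-1])
```

Because the sentinel travels with the variable tail, the static prefix is identical across renders and the marker still ends the file.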

Judgment clause: where moving a placeholder to the tail would break meaning or readability and the cache gain is marginal (a tiny single-token variable mid-sentence in a short user-facing message), leave it and say why. Do not mangle a template mechanically to satisfy the rule. The rule earns its keep on large, reused, variable-bearing prompts.

Workflow

Author Mode Workflow

Assess the target instructions against the defect classes.
Quote any text exhibiting a defect and write a high-depth rewrite.
Output a verdict: SHIP (no defects), REVISE (defects found; edit the file in place with fixes), or REJECT (fundamental redesign needed).
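A few of the defect classes leave surface tells that a script can flag before the human or agent review starts. The Python sketch below is one hypothetical pre-pass: the regexes, the defect labels attached to them, and the function `lint_instructions` are assumptions for illustration. It only catches obvious wording and is no substitute for assessing the instructions against every defect class.

```python
import re

# Hypothetical surface patterns; each maps a defect class to one regex.
PATTERNS = {
    "Slash-Command Directives": re.compile(
        r"\b(?:run|execute)\s+/[a-z][\w-]*", re.IGNORECASE
    ),
    "Deferred-Read Dispersion": re.compile(
        r"\bsee\s+\S+\.md\b|\[\[[^\]]+\]\]|\bread first\b", re.IGNORECASE
    ),
    "Over-Fitting & Incidental Ballast": re.compile(
        r"#\d{3,}\b|\b\d{4}-\d{2}-\d{2}\b|\bmem-[0-9a-f]{6,}\b"
    ),
}


def lint_instructions(text):
    """Return (line_number, defect_class, line) for each surface match."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for defect, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, defect, line.strip()))
    return findings


sample = """1. Run /verify after each change.
2. See rules.md for the escalation policy.
3. Hold between steps (fixes #1978).
4. Open the log file and read it end to end."""

for number, defect, line in lint_instructions(sample):
    print(f"line {number}: {defect}: {line}")
```

Each finding still needs a quoted excerpt and a rewrite from the reviewer; the script only narrows where to look.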

Audit Mode Workflow

Identify what the agent missed and locate the executing instruction.
Classify the instruction gap under the defect classes.
Edit the instruction in place with a rewrite that prevents the failure.

Output Expectations

Respond with structured, direct reviews or audits. Keep lists and verdicts highly concise, citing exact line differences where revisions are made.
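One way to cite exact line differences is a unified diff of the instruction text before and after the revision. The Python sketch below uses the standard library's `difflib` for this; the helper name `cite_revision` and the sample instructions are hypothetical.

```python
import difflib


def cite_revision(before: str, after: str, name: str = "instructions") -> str:
    """Return a unified diff showing only the lines that changed."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
        lineterm="",  # inputs have no trailing newlines, so add none
        n=0,          # no context lines: cite the changed lines only
    )
    return "\n".join(diff)


before = "Step 1: run the tests.\nStep 2: report."
after = "Step 1: run the tests and read the full log.\nStep 2: report."
print(cite_revision(before, after))
```

Setting `n=0` keeps the citation concise; a reviewer who wants surrounding lines for orientation could raise it.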

Adoption

nicsuzor/craft

$ install --global

Related Skills

nicsuzor/end_session

nicsuzor/dump

nicsuzor/daily

nicsuzor/narrative-digest
