plugins/khuym/skills/writing-khuym-skills/SKILL.md
Use when creating a new khuym skill, editing an existing khuym skill, or verifying a skill works under pressure before deploying. Use when you need an agent skill to be bulletproof against rationalization. Do NOT use for project-specific AGENTS.md conventions or one-off solutions.
npx skillsauth add hoangnb24/skills writing-khuym-skillsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
If .khuym/onboarding.json is missing or stale for the current repo, stop and invoke khuym:using-khuym before continuing.
Skills are code. They have bugs. Test them before deploying.
This is the TDD-for-skills methodology adapted from Superpowers (N=28,000 scale testing confirms persuasion-optimized skills produce 3-4× better agent compliance than plain instructions).
THE IRON LAW: NO SKILL WITHOUT A FAILING TEST FIRST. Write skill before testing? Delete it. Start over. No exceptions — not for "simple additions," not for "just a section," not for "reference only."
Use this skill when you are about to:
When NOT to use: AGENTS.md files, project-specific conventions, one-off prompt instructions.
| TDD Concept | Skill Equivalent | |---|---| | Test case | Pressure scenario with subagent | | Production code | SKILL.md | | Test fails (RED) | Agent violates rule without skill | | Test passes (GREEN) | Agent complies with skill present | | Refactor | Close loopholes, maintain compliance |
HARD-GATE: Do not write any skill content until you complete this phase.
Teams that skip baseline testing consistently deploy skills with predictable, preventable failures.
Steps:
references/pressure-test-template.md)What to record:
Scenario: [name]
Combined pressures: [list]
Exact violation: [what agent chose]
Exact rationalization (verbatim): "[quote]"
Write SKILL.md addressing the specific rationalizations documented in RED only. Do not add content for hypothetical cases you didn't observe — hypothetical content bloats the skill and gets skipped.
SKILL.md checklist:
---)name: bare hyphen-case, matches the directory name exactlydescription: starts with "Use when..." — triggering conditions ONLY, no workflow summaryreferences/ when practicalreferences/ files never nested more than one level deepFor Khuym plugin skills specifically, keep frontmatter name bare. The plugin wrapper adds the khuym: prefix when the skill is surfaced to agents.
Description trap (most common mistake): Workflow summary in description → Claude follows description instead of reading skill body. Every time.
# ❌ BAD — workflow summary
description: Use when creating skills — run baseline test, write minimal skill, run tests
# ✅ GOOD — triggering conditions only
description: Use when creating a new khuym skill or editing an existing one
Dependency metadata style:
When a Khuym skill has runtime dependencies, write metadata.dependencies as a mapping keyed by dependency id. Do not use YAML arrays of objects such as - id: beads-cli; generic skill evaluators may reject that frontmatter shape.
metadata:
dependencies:
beads-cli:
kind: command
command: br
missing_effect: unavailable
reason: The skill reads and updates beads.
Apply persuasion principles:
| Principle | Implementation | Use For | |---|---|---| | Authority | "YOU MUST", "Never", "No exceptions" | Discipline-enforcing rules | | Commitment | Ordered checklists, announce skill usage | Multi-step processes | | Scarcity | "Before proceeding", "IMMEDIATELY after X" | Verification requirements | | Social Proof | "Teams report...", "X without Y = failure. Every time." | Common failure patterns | | Unity | "our skills", collaborative framing | Techniques, guidance |
After writing: re-run the same pressure scenarios WITH the skill. Agent must now comply. If agent still fails → skill is unclear or incomplete. Revise and re-test. Do not proceed.
When an agent violates a rule despite having the skill, that is a test regression — the skill has a bug. Fix it:
Continue until no new rationalizations emerge from pressure testing.
Meta-testing technique: After an agent chooses wrong, ask:
"You read the skill and chose Option C anyway. How could the skill have been written differently to make Option A the only acceptable answer?"
Three diagnoses:
Run validation:
python3 "${CODEX_HOME:-$HOME/.codex}/skills/.system/skill-creator/scripts/quick_validate.py" plugins/khuym/skills/<skill-name>
bash scripts/check-markdown-links.sh plugins/khuym/skills/<skill-name>/SKILL.md
bash scripts/sync-skills.sh --dry-run
If the edited skill owns a repo-local test script, run that too.
Create CREATION-LOG.md documenting the full TDD process (see references/creation-log-template.md):
Signs the skill IS bulletproof:
Signs the skill is NOT bulletproof:
| Excuse | Reality | |---|---| | "I know this technique, testing is unnecessary" | You're testing the SKILL, not your knowledge. Agents differ from you. | | "It's so simple it can't have bugs" | Every untested skill has issues. Test takes 30 minutes. | | "Academic questions passed — that's sufficient" | Reading a skill ≠ using a skill under pressure. Test application scenarios. | | "My description summarizes the workflow so agents know what to do" | Workflow-summary descriptions cause agents to skip the skill body. Remove it. | | "This edit is minor — testing isn't needed" | The Iron Law applies to edits. No exceptions. | | "I'll test it after a few real uses" | Problems = agents misuse in production. Test BEFORE deploying. | | "The baseline is obvious, I know what failures to expect" | You know YOUR failures. Agent failures differ. Run the baseline. |
All of these mean: Stop. Run baseline tests first.
Load when needed:
references/creation-log-template.md — CREATION-LOG.md template for documenting the TDD processreferences/pressure-test-template.md — Pressure scenario templates and the 7 pressure typesBackground: The TDD-for-skills methodology originates from the Superpowers framework (obra/superpowers). Persuasion research: Meincke et al. (2025), N=28,000 LLM conversations, University of Pennsylvania. Compliance methodology validated by ComplexBench, PromptAgent, and RNR studies (see research/15-tdd-skills-methodology.md for full citations).
tools
Use when the user asks to commit everything, smart commit, group commits, organize staged or unstaged changes, create logical commits, push current work, or preserve a clean commit stack from an existing working tree.
development
Use when starting or resuming any Khuym project session, choosing the next Khuym skill, running go mode, checking onboarding/scout state, or enforcing workflow gates. Bootstrap meta-skill for routing across the Khuym agentic development ecosystem.
development
Use when the user wants to turn a rough idea, vague task, feature wish, bug-fix intent, or Codex /goal objective into a clear verifiable goal prompt. Use when the user mentions goal mode, /goal, set_goal, long-running autonomous Codex work, or asks to be interviewed/grilled before creating a goal.
tools
Use when Codex should turn a concept, topic, tutorial, lesson, algorithm, system, workflow, or abstract idea into a playable browser game that teaches through interaction. Use for planning or implementing educational web games with the Game Studio plugin, including concept-to-mechanic translation, gameplay loop design, learning progression, and browser-game QA.