skill-optimizer/SKILL.md
Measure and fix AI skill activation, clarity, and cross-agent reliability. Use when skill uptake is weak, scores regress, or context is bloated. Not for writing new skills from scratch or general prompt engineering.
npx skillsauth add sanurb/skills skill-optimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Restrict what the agent can do so what it does, it does well. This skill tightens the harness on any AI skill — measure, diagnose, fix.
Identify the skill directory to optimize. If not obvious from context, ask the user which skill to optimize before proceeding.
Run the structural validator against the target skill before the full loop:
bash scripts/validate.sh <skill-path>
If validation fails, fix structural issues first. Do not start the benchmark loop on a broken skill.
Copy this checklist and track progress:
Optimization Progress:
- [ ] Step 0: Build eval scenarios (skip if evals exist)
- [ ] Step 1: Measure baseline and skill-on scores
- [ ] Step 2: Diagnose failure pattern
- [ ] Step 3: Apply fix from matched reference
- [ ] Step 4: Re-measure — confirm improvement, no regressions
- [ ] Step 5: Pass release gates before shipping
Step 0: Build evals first (low freedom — follow exactly)
Before touching skill text, create ≥3 eval scenarios:
Run evals WITHOUT the skill. Record baseline scores. This is the ground truth.
Step 1: Measure (low freedom)
Read references/benchmark-loop.md. Produce the benchmark matrix. Do not skip this.
Step 2: Diagnose (use decision tree)
What does the data show?
├─ Skill is ignored by models → read references/activation-design.md
├─ Scores dropped after skill change → read references/regression-triage.md
├─ Skill text too large / context rot → read references/context-budget.md
└─ All clear, ready to ship → read references/release-gates.md
Step 3: Apply fix (follow the matched reference — each is atomic, one job)
Step 4: Re-measure (low freedom — feedback loop)
Re-run the SAME eval scenarios from Step 1:
Step 5: Release gate (low freedom)
Read references/release-gates.md. Ship only if all MUST-PASS gates clear.
Deliver nothing if any criterion fails.
Every optimization produces exactly this artifact. Copy the template from assets/report-template.md and fill it in.
| File | One Job | |------|---------| | benchmark-loop.md | Produce the measurement matrix | | activation-design.md | Fix skill text so models follow it | | context-budget.md | Shrink token cost without losing behavior | | regression-triage.md | Isolate and eliminate negative deltas | | release-gates.md | Go/no-go checklist before shipping |
development
Sets up an `## Agent skills` block in AGENTS.md/CLAUDE.md and `docs/agents/` so the engineering skills know this repo's issue tracker (GitHub, GitLab, fp, or local markdown), triage label vocabulary, and domain doc layout. Run before first use of `fp-plan`, `fp-implement`, `fp-review`, `to-issues`, `to-prd`, `triage`, `diagnose`, `tdd`, `improve-codebase-architecture`, or `zoom-out` — or if those skills appear to be missing context about the issue tracker, triage labels, or domain docs.
development
Build a throwaway prototype to flush out a design before committing to it. Routes between two branches — a runnable terminal app for state/business-logic questions, or several radically different UI variations toggleable from one route. Use when the user wants to prototype, sanity-check a data model or state machine, mock up a UI, explore design options, or says "prototype this", "let me play with it", "try a few designs".
tools
Control herdr (a terminal-native agent multiplexer) from inside it. Manage workspaces and tabs, split panes, spawn sibling agents, read pane output, and wait for state changes — all via CLI commands that talk to the running herdr instance over a local unix socket. Use when running inside herdr (HERDR_ENV=1). Do not use outside herdr.
documentation
Compact the current conversation into a handoff document for another agent to pick up.