core/ultimate-skill-creator/SKILL.md
Use when creating a new skill for Claude Code — whether from scratch, from a user request, or by formalizing behavior that should be consistent. Use when a user says "create a skill", "write a skill", "build a skill", "make a skill for", or describes behavior they want Claude to follow reliably across sessions. Use when improving, refactoring, or debugging an existing skill that fails to trigger, gets ignored under pressure, or produces inconsistent results. Use when evaluating whether a skill actually changes Claude's behavior versus baseline. Use when packaging a skill for sharing or marketplace distribution. Use when the task seems "simple" or "quick" — those are the highest-risk cases where Claude skips quality steps.
npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit ultimate-skill-creatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Skills fail for three predictable reasons:
This skill addresses all three failures through a single workflow that combines structural planning (so skills are organized and discoverable), test-driven development (so skills resist pressure), and eval-based iteration (so skills actually trigger and work).
Every skill creation follows these three stages in order. Do not skip stages. Do not reorder them. Each stage depends on the output of the previous one.
Before writing a single word, plan the skill's architecture.
Define scope. Answer: What specific behavior does this skill enforce? What is explicitly out of scope? A skill that tries to cover everything covers nothing.
Plan the file structure. Decide what belongs where:
| Location | Content | Size Target |
|---|---|---|
| description (frontmatter) | Triggering conditions ONLY | 100-200 words |
| SKILL.md body | Core rules, rationalization defense, key patterns | 1,500-2,000 words |
| references/ | Detailed implementations, provider-specific guides | Unlimited per file |
| scripts/ | Reusable code the skill teaches Claude to use or adapt | As needed |
| examples/ | Complete working implementations | As needed |
Write the description first. The description determines whether the skill triggers. Rules:
See references/writing-rules.md for CSO (Claude Search Optimization) details and
examples of good vs bad descriptions.
Create the directory. Set up all planned directories before writing content.
See references/structure-guide.md for templates and organization patterns.
Write the skill body using test-driven methodology. This is where most skills fail — they explain concepts but don't change behavior under pressure.
RED phase — Capture failures first.
Before writing the skill body, discover HOW Claude fails without it:
GREEN phase — Write minimal content that passes.
Write the SKILL.md body to directly address the captured failures:
REFACTOR phase — Close loopholes.
Re-read the skill body and ask: "How would Claude rationalize ignoring THIS rule?"
For every loophole found: add it to the rationalization table, add the bad pattern to red flags, and tighten the rule language. Repeat until you can't find new loopholes.
See references/tdd-methodology.md for the complete TDD process, pressure type taxonomy,
and subagent testing methodology.
Test the skill against realistic scenarios and iterate until it works.
Create test cases. Write 10-20 prompts representing real usage:
For each, define expected behaviors — specific things the output MUST or MUST NOT contain.
Run subagent evaluations. For each test case, spawn two subagents:
Grade and iterate. For each failing test:
Optimize the description. After content is stable:
Use scripts/run_evals.py generate <skill-path> to auto-generate test prompts from
evals.json, then run_evals.py grade for interactive scoring and run_evals.py report
for aggregate benchmarks.
See references/eval-pipeline.md for detailed testing methodology and grading criteria.
When fixing or improving an existing skill, always add a failing test FIRST, make the minimal change, then re-run ALL evals to prevent regression. Never change a skill based on a single report without verifying existing tests still pass.
See references/skill-versioning.md for the complete update workflow, version bumping,
and deprecation process.
The description field is the ONLY thing Claude reads when deciding whether to load a skill. If the description summarizes what the skill does, Claude treats that summary as sufficient and skips reading the full body. This is tested and proven.
The description must contain ONLY triggering conditions. Nothing else.
Every skill that enforces discipline MUST include a rationalization table. Without one, Claude invents excuses to skip rules under pressure. The table must have three columns:
| Rationalization | When It Appears | Why It Is Wrong | |---|---|---|
The "When It Appears" column is critical — it helps Claude pattern-match in real time and catch itself rationalizing before it acts on the rationalization.
Claude loads skill content in three stages, each consuming more context:
Never put reference material in the body. Keep the body focused on rules, patterns, and rationalization defense. Point to references for implementation details.
Every disciplinary skill needs a "stop and check" list — specific patterns indicating the skill is being violated. Make these concrete and scannable. Use checkboxes.
| Rationalization | When It Appears | Why It Is Wrong | |---|---|---| | "This skill is too simple to need pressure testing" | Small or focused skill | Simple skills are the ones Claude skips most often. Pressure testing takes 10 minutes and catches real failures. | | "I already know what the content should be" | High domain expertise | Knowing the domain doesn't mean knowing how Claude fails. RED phase reveals specific, non-obvious rationalizations. | | "The description is fine, I'll optimize later" | Eager to write content | A skill that never triggers is wasted work regardless of content quality. Description quality determines whether the skill exists in practice. | | "One test case is enough to validate" | Time pressure during creation | One test case covers one scenario. Claude fails in specific, contextual ways that only multiple tests reveal. Minimum: 10. | | "I don't need a rationalization table for this" | The skill teaches factual content, not discipline | Claude rationalizes skipping factual content too ("the user said keep it simple"). Every skill benefits from naming the excuses. | | "I can skip the eval stage, the content is solid" | Confidence in the writing | Content quality and triggering accuracy are independent. A perfectly written skill with a bad description never loads. Test both. |
Before declaring any skill complete, verify none of these are present:
A skill is done when ALL of these pass:
| File | Purpose |
|---|---|
| references/structure-guide.md | File organization, frontmatter format, directory templates, word count targets |
| references/tdd-methodology.md | Complete RED-GREEN-REFACTOR process, pressure type taxonomy, subagent testing |
| references/eval-pipeline.md | Test case design, subagent evaluation, grading, benchmarking, description optimization |
| references/writing-rules.md | CSO rules, description patterns, body writing style, rationalization table design |
| references/skill-versioning.md | When and how to update existing skills, version bumping, regression prevention |
| examples/skill-directory-template.md | Reference directory layout and frontmatter template for new skills |
| scripts/init_skill.py | Scaffolds a new skill directory with the prescribed structure (not a generic template) |
| scripts/quick_validate.py | Validates skill against CSO rules, body quality, word counts, and required sections |
| scripts/run_evals.py | Generates test prompts, runs interactive grading, calculates aggregate benchmarks |
| scripts/package_skill.py | Packages a skill folder into a distributable zip (runs validation first) |
| evals.json | 20 test cases for the ultimate-skill-creator itself (meta self-test) |
development
When the user wants help with paid advertising campaigns on Google Ads, Meta (Facebook/Instagram), LinkedIn, Twitter/X, or other ad platforms. Also use when the user mentions 'PPC,' 'paid media,' 'ad copy,' 'ad creative,' 'ROAS,' 'CPA,' 'ad campaign,' 'retargeting,' or 'audience targeting.' This skill covers campaign strategy, ad creation, audience targeting, and optimization.
testing
--- name: using-sharkitect-methodology description: Use when starting any conversation in a Sharkitect workspace OR before any task involving NEW pricing, positioning, proposal, strategy, plan-execution, or schema-design work — mandates invocation of Sharkitect-specific methodology skills (pricing-strategy, marketing-strategy-pmm, smb-cfo, hq-revenue-ops, executing-plans, brainstorming) under the same anti-rationalization discipline as using-superpowers. Documentation has failed 4 times across H
testing
Use when user says 'end session', 'wrap up', 'stop for the day', 'done for today', 'close out', 'save session', 'wrapping up', or invokes /end-session. Runs the full 9-step end-of-session protocol: resource audit, MEMORY.md update, lessons capture, plan status, pending items, workspace checklist, .tmp/ audit, git commit+push, Supabase brain sync, session brief, summary. Final step schedules a detached self-kill of the current session ONLY (3s delay) so the window closes cleanly. Other claude.exe processes (active workspaces) are NOT touched -- orphan cleanup is handled separately by Claude-Orphan-Cleanup-Hourly with proper age safeguards. Do NOT use for: mid-session quick saves (use session-checkpoint), skill syncing (use sync-skills.py), brain memory queries (use supabase-sync.py pull), document freshness reviews (use document-lifecycle), resource gap detection (use resource-auditor).
testing
Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, passive voice, negative parallelisms, and filler phrases.