skills/ultimate-skill-creator/SKILL.md
Use when creating a new skill for Claude Code -- from scratch, from a user request, or by formalizing behavior that should be consistent. Use when a user says "create a skill", "write a skill", "build a skill", "make a skill for", or describes behavior they want Claude to follow reliably across sessions. Use when improving, refactoring, or debugging an existing skill that fails to trigger, gets ignored under pressure, or produces inconsistent results. Use when packaging a skill for sharing or marketplace distribution. Use when the task seems "simple" or "quick" -- those are the highest-risk cases where Claude skips quality steps. Do NOT use for: skill-judge (evaluating/scoring existing skills for quality), superpowers:writing-skills (auto-triggered writing workflows).
npx skillsauth add sharkitect-solutions/sharkitect-claude-toolkit ultimate-skill-creatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Skills fail for three predictable reasons:
This skill addresses all three failures through a single workflow that combines structural planning (so skills are organized and discoverable), test-driven development (so skills resist pressure), and eval-based iteration (so skills actually trigger and work).
| File | Load When | Do NOT Load |
|---|---|---|
| references/structure-guide.md | Planning file organization, setting up directories, frontmatter format | Content already structured, in writing/testing phase |
| references/writing-rules.md | Writing descriptions (CSO), body content, rationalization tables, red flags | Structure planned, in eval/testing phase |
| references/tdd-methodology.md | RED-GREEN-REFACTOR process, pressure scenarios, subagent baseline tests | Skill already passes pressure tests |
| references/eval-pipeline.md | Creating test cases, subagent evaluations, grading, benchmarking | Skill body not written yet (finish TDD first) |
| references/skill-versioning.md | Updating existing skills, version bumps, preventing regressions | Creating a new skill from scratch |
| references/description-exclusions.md | Writing exclusion patterns, designing Scope Boundary tables, routing between similar skills | Single-purpose skill with no overlapping neighbors |
| references/companion-design.md | Designing companion files, planning progressive disclosure, choosing content types | Simple skill under 100 lines that doesn't need companions |
| references/optimization-lessons.md | Domain saturation assessment, named anti-pattern design, decision matrix design, compression | First draft (optimization lessons apply during refinement) |
| scripts/init_skill.py | Scaffolding a new skill directory | Skill directory already exists |
| scripts/quick_validate.py | Validating against CSO rules and required sections | Skill still being drafted |
| scripts/run_evals.py | Running test prompts, interactive grading, aggregate benchmarks | No test cases written yet |
| scripts/package_skill.py | Packaging completed skill for distribution | Skill hasn't passed validation |
| examples/skill-directory-template.md | Setting up a new skill directory from template | Updating existing skill |
| evals.json | Meta self-test reference for the skill-creator itself | Normal skill creation |
| Request | This Skill | Use Instead | |---|---|---| | "Create a skill for X" | YES | - | | "Build a skill that does Y" | YES | - | | "Fix this skill -- it's not triggering" | YES | - | | "Improve/refactor this skill" | YES | - | | "Package this skill for sharing" | YES | - | | "Score/evaluate this skill's quality" | NO | skill-judge | | "Audit all my skills for quality" | NO | skill-judge | | "Write better content for this skill" | PARTIAL -- structure + TDD, not auto-writing | superpowers:writing-skills | | "Create an agent (not a skill)" | NO | agent-development, ai-agents-architect | | "Help with prompt engineering" | NO | prompt-engineering-guidance |
Every skill creation follows these three stages in order. Do not skip stages. Do not reorder them. Each stage depends on the output of the previous one.
Before writing a single word, plan the skill's architecture.
Define scope. Answer: What specific behavior does this skill enforce? What is explicitly out of scope? A skill that tries to cover everything covers nothing.
Plan the file structure. Decide what belongs where:
| Location | Content | Size Target |
|---|---|---|
| description (frontmatter) | Triggering conditions ONLY | 100-200 words |
| SKILL.md body | Core rules, rationalization defense, key patterns | 1,500-2,000 words |
| references/ | Detailed implementations, provider-specific guides | Unlimited per file |
| scripts/ | Reusable code the skill teaches Claude to use or adapt | As needed |
| examples/ | Complete working implementations | As needed |
Write the description first. The description determines whether the skill triggers. Rules:
See references/writing-rules.md for CSO (Claude Search Optimization) details and
examples of good vs bad descriptions.
Create the directory. Set up all planned directories before writing content.
See references/structure-guide.md for templates and organization patterns.
Write the skill body using test-driven methodology. This is where most skills fail — they explain concepts but don't change behavior under pressure.
RED phase — Capture failures first.
Before writing the skill body, discover HOW Claude fails without it:
GREEN phase — Write minimal content that passes.
Write the SKILL.md body to directly address the captured failures:
REFACTOR phase — Close loopholes.
Re-read the skill body and ask: "How would Claude rationalize ignoring THIS rule?"
For every loophole found: add it to the rationalization table, add the bad pattern to red flags, and tighten the rule language. Repeat until you can't find new loopholes.
See references/tdd-methodology.md for the complete TDD process, pressure type taxonomy,
and subagent testing methodology.
Test the skill against realistic scenarios and iterate until it works.
Create test cases. Write 10-20 prompts representing real usage:
For each, define expected behaviors — specific things the output MUST or MUST NOT contain.
Run subagent evaluations. For each test case, spawn two subagents:
Grade and iterate. For each failing test:
Optimize the description. After content is stable:
Use scripts/run_evals.py generate <skill-path> to auto-generate test prompts from
evals.json, then run_evals.py grade for interactive scoring and run_evals.py report
for aggregate benchmarks.
See references/eval-pipeline.md for detailed testing methodology and grading criteria.
When fixing or improving an existing skill, always add a failing test FIRST, make the minimal change, then re-run ALL evals to prevent regression. Never change a skill based on a single report without verifying existing tests still pass.
See references/skill-versioning.md for the complete update workflow, version bumping,
and deprecation process.
The description field is the ONLY thing Claude reads when deciding whether to load a skill. If the description summarizes what the skill does, Claude treats that summary as sufficient and skips reading the full body. This is tested and proven.
The description must contain ONLY triggering conditions. Nothing else.
Every skill that enforces discipline MUST include a rationalization table. Without one, Claude invents excuses to skip rules under pressure. The table must have three columns:
| Rationalization | When It Appears | Why It Is Wrong | |---|---|---|
The "When It Appears" column is critical — it helps Claude pattern-match in real time and catch itself rationalizing before it acts on the rationalization.
Claude loads skill content in three stages, each consuming more context:
Never put reference material in the body. Keep the body focused on rules, patterns, and rationalization defense. Point to references for implementation details.
Every disciplinary skill needs a "stop and check" list — specific patterns indicating the skill is being violated. Make these concrete and scannable. Use checkboxes.
| Rationalization | When It Appears | Why It Is Wrong | |---|---|---| | "This skill is too simple to need pressure testing" | Small or focused skill | Simple skills are the ones Claude skips most often. Pressure testing takes 10 minutes and catches real failures. | | "I already know what the content should be" | High domain expertise | Knowing the domain doesn't mean knowing how Claude fails. RED phase reveals specific, non-obvious rationalizations. | | "The description is fine, I'll optimize later" | Eager to write content | A skill that never triggers is wasted work regardless of content quality. Description quality determines whether the skill exists in practice. | | "One test case is enough to validate" | Time pressure during creation | One test case covers one scenario. Claude fails in specific, contextual ways that only multiple tests reveal. Minimum: 10. | | "I don't need a rationalization table for this" | The skill teaches factual content, not discipline | Claude rationalizes skipping factual content too ("the user said keep it simple"). Every skill benefits from naming the excuses. | | "I can skip the eval stage, the content is solid" | Confidence in the writing | Content quality and triggering accuracy are independent. A perfectly written skill with a bad description never loads. Test both. |
Before declaring any skill complete, verify none of these are present:
A skill is done when ALL of these pass:
development
When the user wants help with paid advertising campaigns on Google Ads, Meta (Facebook/Instagram), LinkedIn, Twitter/X, or other ad platforms. Also use when the user mentions 'PPC,' 'paid media,' 'ad copy,' 'ad creative,' 'ROAS,' 'CPA,' 'ad campaign,' 'retargeting,' or 'audience targeting.' This skill covers campaign strategy, ad creation, audience targeting, and optimization.
testing
--- name: using-sharkitect-methodology description: Use when starting any conversation in a Sharkitect workspace OR before any task involving NEW pricing, positioning, proposal, strategy, plan-execution, or schema-design work — mandates invocation of Sharkitect-specific methodology skills (pricing-strategy, marketing-strategy-pmm, smb-cfo, hq-revenue-ops, executing-plans, brainstorming) under the same anti-rationalization discipline as using-superpowers. Documentation has failed 4 times across H
testing
Use when user says 'end session', 'wrap up', 'stop for the day', 'done for today', 'close out', 'save session', 'wrapping up', or invokes /end-session. Runs the full 9-step end-of-session protocol: resource audit, MEMORY.md update, lessons capture, plan status, pending items, workspace checklist, .tmp/ audit, git commit+push, Supabase brain sync, session brief, summary. Final step schedules a detached self-kill of the current session ONLY (3s delay) so the window closes cleanly. Other claude.exe processes (active workspaces) are NOT touched -- orphan cleanup is handled separately by Claude-Orphan-Cleanup-Hourly with proper age safeguards. Do NOT use for: mid-session quick saves (use session-checkpoint), skill syncing (use sync-skills.py), brain memory queries (use supabase-sync.py pull), document freshness reviews (use document-lifecycle), resource gap detection (use resource-auditor).
testing
Remove signs of AI-generated writing from text. Use when editing or reviewing text to make it sound more natural and human-written. Based on Wikipedia's comprehensive "Signs of AI writing" guide. Detects and fixes patterns including: inflated symbolism, promotional language, superficial -ing analyses, vague attributions, em dash overuse, rule of three, AI vocabulary words, passive voice, negative parallelisms, and filler phrases.