workflows/workflows/agent-environment-setup/platforms/codex/skills/skill-creator/SKILL.md
Use when creating or updating skills, evaluating skill performance, or benchmarking instruction quality. Covers skill authoring, edits, eval design, and iterative improvement of skill behavior.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add cubetiq/cubis-foundry skill-creator`
Meta-skill for creating, editing, evaluating, and benchmarking skills in the Cubis Foundry skill library. Provides the full skill development lifecycle — from initial creation through eval-driven iteration to production-grade quality. References the complete 65-skill library as examples of the Pro Skill Standard.
- Create: build a new skill from scratch with all required files.
- Edit: improve an existing skill's instructions, add references, or fix eval failures.
- Eval: run a skill's evals and report pass/fail with actionable feedback.
- Benchmark: run evals multiple times, compute variance, and identify flaky assertions.
1. Determine the mode before starting, and ask the user if it is unclear, because each mode has different inputs and outputs. Default to Create if the user says "make a skill" or "new skill."
2. For Create mode: gather the skill's domain, target audience, and trigger conditions. These determine the skill name, description, and instruction scope, because a well-scoped skill has clear boundaries that prevent overlap with other skills.
3. Choose a skill name that is kebab-case, descriptive, and unique. Check the existing skill library (65 skills across 10 categories) for naming conflicts, because duplicate or ambiguous names cause routing failures in the MCP skill tools.
4. Write the YAML frontmatter with all required fields: name, description (with trigger conditions), license, metadata (author, version), and compatibility, because the frontmatter is parsed by the platform loader and invalid frontmatter causes silent skill loading failures.
5. Write the description as a trigger specification: start with "Use when..." followed by 3-6 specific trigger conditions, because the description is the primary input to the skill routing engine and vague descriptions cause mis-routing.
6. Structure the body with mandatory sections: Purpose (1 paragraph), When to Use (bullet list), Instructions (12-18 numbered items), Output Format, and References table, because this structure is the Pro Skill Standard and parsers expect it.
7. Write each instruction as WHAT + WHY: state what to do, then explain why with "because..." reasoning, because instructions without reasoning are followed mechanically without adaptation to context.
8. Order instructions by execution sequence: setup/prerequisites first, then core workflow, then verification/cleanup, because numbered instructions imply ordering and out-of-order instructions cause confusion.
9. Include 12-18 instructions, because fewer than 12 leaves critical guidance gaps and more than 18 causes instruction fatigue and selective following; this range balances completeness with adherence.
10. Create 3-6 reference files, each 80-150 lines covering a deep subtopic, because references enable progressive disclosure (loading detail only when needed) and keep the main SKILL.md focused.
11. Create evals with 2+ prompts and 5 assertions each, because evals are the skill's test suite and untested skills degrade silently when instructions are modified.
12. Write assertions that test behavioral output, not exact strings: use "identifies X", "recommends Y", "addresses Z" patterns, because LLM output varies in phrasing but should be consistent in substance.
13. Create 2 usage examples: one basic (01-basic-usage.md) and one advanced (02-advanced-usage.md), because examples serve as both documentation and additional eval material.
14. For Edit mode: read the existing skill first, then identify specific gaps by comparing it against the Pro Skill Standard checklist before making changes, because unfocused edits introduce inconsistencies.
15. For Eval mode: run each eval prompt through the skill and check assertions. Report pass/fail per assertion with specific evidence, because binary pass/fail without evidence makes iteration impossible.
16. For Benchmark mode: run evals 3+ times and compute assertion pass rates. Flag assertions with less than 80% pass rate as flaky, because flaky assertions indicate either ambiguous instructions or overly specific assertions.
17. For platform adaptations: tailor frontmatter and instruction references per platform. Claude Code gets full frontmatter (allowed-tools, inline posture guidance, agent), Codex/Gemini/Antigravity get name+description only, and Copilot matches the Claude format, because each platform parses different frontmatter fields.
18. Validate the complete skill package before declaring done: check JSON validity of evals.json, verify that all reference files mentioned in the table exist, and confirm examples are complete, because broken references and invalid JSON cause runtime failures.
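
The final validation step could be automated along these lines. This is a minimal sketch, not the Cubis Foundry tooling: the function name `validate_skill_package`, the assumption that `evals.json` sits next to `SKILL.md`, and the regex that pulls `references/*.md` paths out of the References table are all illustrative. Checking the usage examples is left out because the source does not say where they are stored.

```python
import json
import re
from pathlib import Path


def validate_skill_package(root):
    """Return a list of human-readable problems found in a skill package.

    Assumed layout (hypothetical): SKILL.md and evals.json directly in
    `root`, reference files at the relative paths named in SKILL.md.
    """
    root = Path(root)
    problems = []

    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]

    # evals.json must exist and parse as JSON.
    evals_path = root / "evals.json"
    if not evals_path.is_file():
        problems.append("evals.json is missing")
    else:
        try:
            json.loads(evals_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"evals.json is not valid JSON: {exc}")

    # Every references/*.md path mentioned in SKILL.md must exist on disk.
    text = skill_md.read_text(encoding="utf-8")
    for rel_path in sorted(set(re.findall(r"references/[\w.-]+\.md", text))):
        if not (root / rel_path).is_file():
            problems.append(f"missing reference file: {rel_path}")

    return problems
```

An empty list means the package passed these checks; any other result lists what to fix before declaring the skill done.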
## Skill Package: <name>
[Complete file manifest with all created files]
## Quality Check
[Pro Skill Standard checklist with pass/fail per item]
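
A few checklist items lend themselves to mechanical checks. The sketch below assumes the skill name, description, and instruction strings have already been extracted from SKILL.md. The thresholds (kebab-case name, "Use when" description, 12-18 instructions, a "because" clause per instruction) come from the instructions in this skill, while the `quality_check` function and its result keys are hypothetical.

```python
import re

# Lowercase words or digits joined by single hyphens, e.g. "skill-creator".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def quality_check(name, description, instructions):
    """Return {checklist item: passed} for a few mechanical checks."""
    return {
        "name is kebab-case": bool(KEBAB_CASE.match(name)),
        "description starts with 'Use when'": description.startswith("Use when"),
        "12-18 instructions": 12 <= len(instructions) <= 18,
        # WHAT + WHY: every instruction carries a "because" clause.
        "every instruction explains why": all(
            "because" in item.lower() for item in instructions
        ),
    }


if __name__ == "__main__":
    results = quality_check(
        "skill-creator",
        "Use when creating or updating skills.",
        ["Do the thing, because it matters."] * 12,
    )
    for item, passed in results.items():
        print(f"{'pass' if passed else 'fail'}: {item}")
```

Checks like these only cover form. Whether the reasoning in each instruction is sound still needs a human or eval-based review.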
## Eval Results: <skill-name>
| Eval | Assertion | Result | Evidence |
|------|-----------|--------|----------|
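
Rows for the Eval Results table could be generated from per-assertion outcomes, as in the following sketch. The `AssertionResult` record and the `render_eval_results` function are hypothetical names, and the PASS/FAIL labels are an assumption; only the column layout is taken from the template above.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    eval_name: str
    assertion: str
    passed: bool
    evidence: str  # quote or summary from the output that justifies the verdict


def render_eval_results(skill_name, results):
    """Render results as the Markdown table used in the Eval Results template."""
    lines = [
        f"## Eval Results: {skill_name}",
        "| Eval | Assertion | Result | Evidence |",
        "|------|-----------|--------|----------|",
    ]
    for r in results:
        verdict = "PASS" if r.passed else "FAIL"
        # Pipes inside free text would break the table, so escape them.
        evidence = r.evidence.replace("|", "\\|")
        lines.append(f"| {r.eval_name} | {r.assertion} | {verdict} | {evidence} |")
    return "\n".join(lines)
```

Keeping the evidence as a required field means a verdict cannot be recorded without the justification that makes iteration possible.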
## Benchmark: <skill-name> (N runs)
| Assertion | Pass Rate | Status |
|-----------|-----------|--------|
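
The pass rates and statuses in the Benchmark table could be computed as sketched below. The 3-run minimum and the 80% flaky threshold come from the Benchmark instruction; the `benchmark_pass_rates` function, its input shape (one dict of assertion text to boolean per run), and the "stable" label for non-flaky assertions are assumptions made for illustration.

```python
from collections import defaultdict

FLAKY_THRESHOLD = 0.80  # below 80% pass rate is flagged as flaky


def benchmark_pass_rates(runs):
    """Compute per-assertion pass rates across runs.

    `runs` is a list of dicts mapping assertion text -> bool, one per run.
    Returns {assertion: (pass_rate, status)}.
    """
    if len(runs) < 3:
        raise ValueError("benchmarking needs at least 3 runs")

    # Collect every outcome observed for each assertion.
    outcomes = defaultdict(list)
    for run in runs:
        for assertion, passed in run.items():
            outcomes[assertion].append(passed)

    report = {}
    for assertion, results in outcomes.items():
        rate = sum(results) / len(results)
        status = "flaky" if rate < FLAKY_THRESHOLD else "stable"
        report[assertion] = (rate, status)
    return report
```

With only three runs, a single failure already drops an assertion to about 67% and flags it, so more runs give a finer-grained signal.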
python-best-practices, typescript-best-practices, golang-best-practices, rust-best-practices, javascript-best-practices, java-best-practices, kotlin-best-practices, swift-best-practices, csharp-best-practices, php-best-practices
go-fiber, nestjs, fastapi, express-nodejs, gin-golang, laravel, django-drf, spring-boot, nextjs, react, vuejs, svelte-sveltekit, react-native, t3-stack, remix, prisma, sqlalchemy, drizzle-orm
design, system-design, microservices-design, api-design, database-design, architecture-doc, tech-doc
web-testing, android-emulator-testing, ios-simulator-testing, systematic-debugging
owasp-security-review, pentest-skill, vibesec, secret-management, sanitize-pii
ci-cd-pipeline, docker-compose-dev, kubernetes-deploy, observability
llm-eval, rag-patterns, prompt-engineering
git-workflow, code-review, sadd, kaizen-iteration, requesting-code-review, receiving-code-review
stripe-integration, expo-app, react-native-callstack, huggingface-ml, google-workspace, mcp-server-builder
skill-creator
| File | Load when |
| --- | --- |
| references/pro-skill-standard.md | Creating or reviewing a skill against quality standards. |
| references/platform-adaptations.md | Adapting a skill for different platforms (Claude, Codex, Gemini, Antigravity, Copilot). |
| references/eval-design.md | Designing evals, writing assertions, or interpreting eval results. |
| references/skill-anatomy.md | Understanding the structure and purpose of each file in a skill package. |
| references/routing-integration.md | Integrating a skill into the routing matrix and rule files. |