skills/forgewright/skills/skill-maker/SKILL.md
Creates and improves Forgewright skills through interview, writing, testing, and iteration. Use when user asks to create, improve, or audit skills. Triggers on: "make a skill", "build a skill", "create a skill for...", "improve this skill", "audit skills", "skill quality check". Routed via the production-grade orchestrator.
npx skillsauth add ouakar/web-hosting-ubinarys-dental skill-makerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat skills/_shared/protocols/input-validation.md 2>/dev/null || true
!cat skills/_shared/protocols/tool-efficiency.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Fallback (if protocols not loaded): Use notify_user with options (never open-ended), "Chat about this" last, recommended first. Work continuously. Print progress constantly. Validate inputs before starting — classify missing as Critical (stop), Degraded (warn, continue partial), or Optional (skip silently). Use parallel tool calls for independent reads. Use view_file_outline before full Read.
End-to-end skill creation and improvement pipeline. Interviews the user, writes the SKILL.md, tests it against real prompts, iterates based on feedback, and installs. Also supports auditing existing skills for quality.
Phase 1: Interview → Phase 2: Write SKILL.md → Phase 3: Test Cases
→ Phase 4: Evaluate & Iterate → Phase 5: Install → Phase 6: Audit (optional)
Ask 3-4 questions using notify_user, one at a time:
If the current conversation already contains a workflow the user wants to capture, extract answers from the conversation history first — the tools used, the sequence of steps, corrections the user made. The user may need to fill gaps, but don't re-ask what's already obvious.
Explain WHY, not just WHAT.
Today's LLMs are smart. They have good theory of mind and when given a good harness can go beyond rote instructions. Rather than writing oppressively constrictive MUSTs and NEVERs, explain the reasoning so the model understands why something matters. This produces more robust, generalizable behavior.
Yellow flags in your writing:
MUST, ALWAYS, NEVER in ALL CAPS — reframe as explanationsGood vs Bad:
❌ BAD: "You MUST ALWAYS use try-catch. NEVER use raw promises."
✅ GOOD: "Wrap async operations in try-catch because unhandled rejections
crash Node.js silently. Raw promises lose stack traces, making
debugging painful."
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter (name, description required)
│ └── Markdown instructions
└── Bundled Resources (optional)
├── scripts/ - Executable code for deterministic tasks
├── references/ - Docs loaded into context as needed
└── phases/ - Sub-phases for complex multi-step skills
Skills use a three-level loading system to manage context efficiently:
If SKILL.md approaches 500 lines, add hierarchy: move detailed content to references/ or phases/ with clear pointers about when to read them.
---
name: skill-name # kebab-case, letters/numbers/hyphens only
description: >
Use when [triggering conditions]. Include specific trigger phrases,
contexts, and adjacent topics. Be slightly "pushy" — the model tends
to under-trigger, so err toward broader matching.
---
Description is the primary trigger mechanism. It determines whether the model invokes the skill. Include:
---
name: skill-name
description: Use when [triggering conditions]
---
# Skill Name
## Protocols
[Protocol injection lines — cat shared protocols]
## Overview
Core principle in 1-2 sentences. Why this skill exists.
## When to Use
Bullet list with symptoms and use cases.
## Process Flow (if multi-step)
Small inline flowchart for non-obvious decisions.
## [Core Content]
Steps, patterns, or reference material.
Explain WHY each step matters, not just WHAT to do.
## Common Mistakes
Table of mistake → fix pairs.
Present the SKILL.md to the user and ask for approval using notify_user before proceeding.
After writing the skill draft, create 2-3 realistic test prompts — the kind of thing a real user would actually say.
{
"skill_name": "example-skill",
"test_cases": [
{
"id": 1,
"prompt": "User's task prompt — realistic, not contrived",
"expected_behavior": "Description of what should happen",
"success_criteria": [
"Criterion 1 — objectively verifiable",
"Criterion 2 — objectively verifiable"
]
}
]
}
Good test cases are:
Share them with the user: "Here are a few test cases I'd like to try. Do these look right, or do you want to add/change any?"
Save test cases to skills/<skill-name>/evals/test-cases.json.
For each test case, mentally walk through the skill with the prompt:
Generalize from feedback. Skills will be used across many different prompts. If you're fixing something for a specific test case, make sure the fix generalizes. Rather than adding fiddly overfitting changes, try different metaphors or patterns.
Keep the skill lean. Remove things that aren't pulling their weight. If instructions are making the model waste time on unproductive steps, cut them.
Explain the why. Try hard to explain the reasoning behind every instruction. If you find yourself writing ALWAYS or NEVER in all caps, that's a yellow flag — reframe and explain the reasoning instead.
Look for repeated work. If every test case would result in the model writing the same utility script or taking the same setup steps, bundle that script in scripts/ or add it to the skill instructions. Save every future invocation from reinventing the wheel.
Write/Update SKILL.md
→ Run test cases (mental walkthrough or actual execution)
→ Review results against success criteria
→ Identify issues: WHY did it fail?
→ Apply improvements (generalize, don't overfit)
→ Repeat until satisfied (usually 2-3 iterations)
Place the generated SKILL.md directly in the project's skills directory:
skills/
└── <skill-name>/
├── SKILL.md
├── evals/
│ └── test-cases.json
└── references/ (optional)
└── detailed-docs.md
The skill is immediately available for use — Antigravity loads skills directly from the skills/ directory. No packaging or marketplace registration needed.
mkdir -p skills/<skill-name>/skills/<skill-name>/SKILL.mdskills/<skill-name>/phases/*.md✓ Skill "<skill-name>" installed to skills/<skill-name>/SKILL.mdIf the user wants to share the skill publicly:
git init in the skill directorygit add -A && git commit -m "Initial release: <skill-name> skill v1.0.0"gh repo create <skill-name>-skill --public --source . --pushgh auth fails, guide user through gh auth loginWhen asked to audit existing skills, apply this checklist to each skill:
| # | Check | Score | |---|-------|-------| | 1 | Description triggers correctly — includes trigger phrases, contexts, adjacent topics | 0-2 | | 2 | Explains WHY — rules have rationale, not just prescriptive MUSTs | 0-2 | | 3 | Progressive disclosure — under 500 lines, references/ for overflow | 0-2 | | 4 | Practical examples — at least 1 real-world example, not contrived | 0-2 | | 5 | Lean instructions — no bloat, every paragraph earns its place | 0-2 | | 6 | Consistent structure — follows the template (Overview, When, Flow, Content, Mistakes) | 0-1 | | 7 | Keywords for discoverability — error messages, symptoms, tool names in content | 0-1 |
Score guide: 0 = missing/poor, 1 = adequate, 2 = excellent
Grade:
For each skill audited, produce:
━━━ Skill Audit: <skill-name> ━━━━━━━━━━━━━━━━━
Score: 8/12 (B)
Issues:
⚠ Description too narrow — missing adjacent triggers
⚠ Phase 3 rules lack rationale (WHY)
Fixes applied:
✓ Expanded description with trigger keywords
✓ Added "because..." to 3 rules in Phase 3
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Issue | Impact | Fix Pattern | |-------|--------|-------------| | MUST/ALWAYS/NEVER overuse | Model becomes rigid, fails on edge cases | Replace with "because..." rationale | | Description too narrow | Under-triggering, skill never activates | Add adjacent topics, synonyms, user phrases | | No examples | Model guesses format, inconsistent output | Add 1-2 concrete input→output examples | | 500+ lines, no hierarchy | Dilutes important instructions | Move detail to references/, keep core lean | | Rules without WHY | Model follows letter but misses spirit | Add reasoning after each significant instruction | | Duplicate content | Wastes context window | Consolidate into shared protocol or reference |
| Mistake | Fix |
|---------|-----|
| Description summarizes workflow | Description = triggering conditions ONLY. "Use when..." |
| Special chars in name | Letters, numbers, hyphens only. No parentheses. |
| Skill too verbose (500+ lines) | Cut ruthlessly. Move to references/. One example, not five. |
| Missing keywords for discovery | Add error messages, symptoms, tool names in the content |
| Not placing in skills/ directory | Skills go in skills/<name>/SKILL.md for auto-loading |
| Overfitting to test cases | Generalize patterns, explain WHY, not just WHAT |
| All caps instructions (MUST/NEVER) | Yellow flag — reframe as explanations with rationale |
development
[production-grade internal] Builds AR/VR/MR applications — spatial UI/UX, hand tracking, gaze input, controller interaction, comfort optimization, and cross-platform XR (Quest, Vision Pro, WebXR, PCVR). Routed via the production-grade orchestrator (Game Build mode).
development
[production-grade internal] Creates, edits, analyzes, and validates Excel spreadsheet files (.xlsx, .csv, .tsv). Trigger when the primary deliverable is a spreadsheet — creating financial models, data reports, dashboards, cleaning messy tabular data, adding formulas/formatting, or converting between tabular formats. Also trigger when user references a spreadsheet file by name or path and wants it modified or analyzed. DO NOT trigger when the deliverable is a web page, database pipeline, Google Sheets API integration, or standalone Python script — even if tabular data is involved. Routed via the production-grade orchestrator (Feature/Custom mode).
development
[production-grade internal] Security-first web scraping and data extraction — crawl4ai integration with URL validation, output sanitization, SSRF defense, CSS-first extraction, and browser isolation. Library-only mode (no Docker API). Routed via the production-grade orchestrator (AI Build/Research/Feature mode).
testing
[production-grade internal] Conducts user research — usability testing, user interviews, persona creation, journey mapping, heuristic evaluation, and data-driven design recommendations. Routed via the production-grade orchestrator (Design mode).