skills/skill-maker/SKILL.md
Use this skill any time someone wants to create, scaffold, build, fix, improve, benchmark, or optimize a Tessl/Claude skill — even if they don't say 'tessl' explicitly. If the request involves making a new skill ('create a skill for X', 'build me a skill that does Y', 'scaffold a skill called Z'), fixing or completing an existing one (missing tile.json, broken repo integration, low eval scores, description not triggering), or running and iterating on evals, invoke this skill. The full workflow covers: structured interview → SKILL.md + tile.json + rules/ scaffolding → README/CI repo integration → tessl tile lint → optional Tessl CLI pipeline (skill review, scenario generate/download, eval run) → hand-authored evals or LLM-as-judge fallback → benchmark logging. Do NOT use for: editing application code, debugging, refactoring, writing general documentation, or creating presentations.
Create production-quality Tessl skills from scratch and optimize them through eval-driven iteration.
Two modes of operation:
Detect which mode from the user's request. If ambiguous, ask.
tessl tile lint if available; otherwise run simulated lint checks (Phase 2.5).
skills-lock.json: it pins vendored skills under .agents/skills/; first-party tiles live under skills/ and are wired via README + CI only.
tessl eval run or LLM-as-judge runs. All skill mutations (e.g. tessl skill review, Phase 5 apply) must occur before eval execution starts, at explicit workflow boundaries, never interleaved with a running eval.
tessl skill review --optimize --yes is an exception: Tessl applies changes immediately without per-edit approval. Treat it as a distinct "Tessl apply" step and tell the user when you run it.
metadata.version. Use a semver string (e.g. 1.0.0 for new skills; bump minor or patch when behavior or documentation meaningfully changes).
Run all 10 questions using AskUserQuestion before generating any files. Collect answers into a working decision map held in memory. Complete the Completeness check at the end before proceeding.
| # | Question | Key options | If unsure |
|---|----------|-------------|-----------|
| 1 | What does this skill do? (one sentence) | Free text | Ask "What task should the AI do better?" and "What goes wrong without it?" |
| 2 | Who will use this skill? | Developers / Semi-technical / Both | Default to Both |
| 3 | What type of project? | Code generation / Writing / Tool use / Interview / Other | Ask for a brief domain description |
| 4 | What are the 3–5 things this skill MUST do every time? | Free text (list) | Ask "What would make you say 'it worked perfectly'?" |
| 5 | What should this skill NEVER do? | Free text (list) | Generate domain-specific anti-patterns from purpose + domain answers |
| 6 | What phrases or signals activate this skill? | Free text / Generate suggestions / Research similar | Produce ≥5 candidate trigger terms from purpose + domain + behaviors; present for approval |
| 7 | What does the final output look like? | Files / Structured message / Interactive flow | Research similar skills |
| 8 | Does this skill need companion files beyond SKILL.md? | No / Rules files / Templates | Recommend companion files if >5 core behaviors or estimated length >300 lines |
| 9 | Which tools does this skill need? | AskUserQuestion only / + file tools / + WebSearch / All + Bash | Infer from domain: Code-gen → file tools + Bash; Writing → file tools; Workflow → all; Interview → AskUserQuestion + optionally WebSearch |
| 10 | Describe 2–3 realistic test tasks for this skill | Free text / Generate / Skip | Generate from purpose + behaviors |
Completeness check: Before scaffolding, verify all 10 categories have resolved values. If any are missing or still "unsure," resolve them before continuing.
Turn the decision map into a complete skill directory. See scaffold-rules for full implementation details.
Frontmatter:
---
name: <skill-name>
description: "<Purpose>. Triggers: <trigger terms>. Uses <tools>. Outputs: <deliverables>. Do NOT use for: <exclusions>."
metadata:
  version: "1.0.0"
  tags: <domain tags>
---
Apply activation-design heuristics: front-load trigger terms; use imperatives throughout ("Use X", "Do not Y", "Always Z") — never "consider", "may want", or "try to".
Body structure: title + one-liner → non-negotiables (numbered) → process/phases → integrated example (realistic, exercises ≥2 non-negotiables) → anti-patterns.
Length target: 150–400 lines. If content exceeds 400 lines, extract secondary rules into rules/*.md and reference with relative links.
{
  "name": "oh-my-ai/<skill-name>",
  "version": "1.0.0",
  "private": false,
  "summary": "<one-line purpose>",
  "skills": { "<skill-name>": { "path": "SKILL.md" } }
}
If the interview identified companion file needs, create them as follows. Rules: rules/<rule-name>.md with YAML frontmatter (name, description) and structured content. Templates: placed at the skill root and referenced from SKILL.md with relative links.
Both integrations are mandatory. Check for existing entries before inserting.
| Skill | Description | table.
.github/workflows/tessl-publish.yml): Append - skills/<skill-name> to the tile: array. Validate YAML after editing (python3 -c "import yaml; yaml.safe_load(open(...))"); revert and report on failure.
With tessl CLI: cd skills/<skill-name> && tessl tile lint
Simulated (no CLI): Verify: SKILL.md has valid YAML frontmatter with name, description, and metadata.version (semver string); tile.json has name, version, summary, skills; tile.json name matches oh-my-ai/<skill-name>; no broken relative links; each rules/*.md has YAML frontmatter with name and description.
Report results. Fix failures and re-lint.
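The simulated checks above can be sketched in Python. This is a minimal illustration, not Tessl's own linter: the function name `simulated_lint` is hypothetical, it assumes the frontmatter and tile.json have already been parsed into dictionaries, and it uses a simplified MAJOR.MINOR.PATCH pattern for semver. The relative-link and rules/*.md checks are left out.

```python
import re

# Simplified semver check (assumption): MAJOR.MINOR.PATCH only, no pre-release tags.
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def simulated_lint(frontmatter: dict, tile: dict, skill_name: str) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []

    # SKILL.md frontmatter needs name, description, and metadata.version.
    for key in ("name", "description"):
        if not frontmatter.get(key):
            problems.append(f"SKILL.md frontmatter is missing '{key}'")
    metadata = frontmatter.get("metadata")
    version = metadata.get("version") if isinstance(metadata, dict) else None
    if not isinstance(version, str) or not SEMVER.match(version):
        problems.append("SKILL.md metadata.version is not a semver string")

    # tile.json needs name, version, summary, skills, and a matching name.
    for key in ("name", "version", "summary", "skills"):
        if key not in tile:
            problems.append(f"tile.json is missing '{key}'")
    expected = f"oh-my-ai/{skill_name}"
    if tile.get("name") != expected:
        problems.append(f"tile.json name should be '{expected}'")

    return problems
```

Returning a list of problems rather than raising on the first one lets every failure be reported in a single pass before re-linting.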
Run from repository root with paths like ./skills/<skill-name>. Use which tessl (or equivalent) first; if missing, skip this subsection and use Phase 3 Path M + Phase 4 Path B as needed.
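As a rough sketch of the availability check, the Python equivalent of `which tessl` is `shutil.which`. The helper `choose_paths` and its dictionary keys are hypothetical names used only for illustration; the path labels come from the phases described in this document.

```python
import shutil


def choose_paths(cli_available: bool) -> dict:
    """Map CLI availability to the scenario and eval paths named in this skill."""
    if cli_available:
        return {"tessl_pipeline": True, "scenarios": "Path T", "evals": "Path A"}
    # No CLI: skip the Tessl pipeline and fall back to manual scenarios + LLM-as-judge.
    return {"tessl_pipeline": False, "scenarios": "Path M", "evals": "Path B"}


# shutil.which returns the executable's path, or None when it is not on PATH.
plan = choose_paths(shutil.which("tessl") is not None)
```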
Boundary: tessl skill review writes the skill and must complete before tessl eval run starts (→ non-negotiable #6).
| Step | Command / action |
|------|------------------|
| 1 | tessl skill review --optimize --yes ./skills/<skill-name> — may rewrite SKILL.md (and other files per Tessl). This is Tessl auto-apply (non-negotiable #8). |
| 2 | If the skill has tile.json: cd skills/<skill-name> && tessl tile lint — same as Phase 2.5. |
| 3 | tessl scenario generate ./skills/<skill-name> — parse the generation id from stdout; do not guess. |
| 4 | tessl scenario download <generation> — use the id from step 3. |
| 5 | Place downloaded scenarios under the skill: if Tessl wrote ./evals/ at repo root, move it with mv ./evals/ ./skills/<skill-name>/ (or merge — see below). If output landed elsewhere, move that directory into skills/<skill-name>/evals/. |
| 6 | Continue to Phase 4 — Path A in eval-runner. |
If skills/<skill-name>/evals/ already exists: Use AskUserQuestion before moving: replace entirely, merge (explain how), or download to a temp directory — never overwrite silently.
Local Tessl cache under .tessl/ stays out of git (typically gitignored).
If the skill has no evals/ directory, or the user asks for eval scenarios, offer to create them via AskUserQuestion.
Path T — Tessl CLI (preferred): Run steps 3–5 from §2.6: tessl scenario generate → parse generation id → tessl scenario download → place under skills/<skill-name>/evals/. To tune the skill before generating scenarios, run steps 1–2 from §2.6 first. After download, verify coverage against benchmark-loop; add or adjust scenarios by hand if gaps remain.
Path M — Manual (fallback): Use when Tessl is missing, the user declines CLI generation, or download fails. Author scenarios directly.
Generate 2–3 scenarios (or validate CLI output) following benchmark-loop coverage rules (full scenario schema, scoring rules, and selection heuristics are defined there). For each scenario, ensure evals/<scenario-slug>/ contains:
task.md — A realistic problem (100–300 words) reflecting actual user prompts. Not a toy example.
criteria.json:
{
  "context": "Tests whether <specific capability>",
  "type": "weighted_checklist",
  "checklist": [
    { "name": "<criterion>", "max_score": N, "description": "<what to check>" }
  ]
}
Key constraints: all max_score values must sum to exactly 100; each criterion must be independently verifiable. Name scenarios as kebab-case slugs (e.g., core-interview-flow, noisy-context-retrieval).
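The two mechanical constraints here (scores summing to 100, kebab-case slugs) are easy to check before running any eval. The sketch below assumes criteria.json has been loaded into a dictionary; `check_scenario` is a hypothetical helper, and the slug pattern is one reasonable reading of "kebab-case" (lowercase letters and digits joined by single hyphens).

```python
import re

# Assumed definition of kebab-case: lowercase alphanumerics separated by single hyphens.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def check_scenario(slug: str, criteria: dict) -> list[str]:
    """Return problems with a scenario's slug and checklist weights."""
    problems = []
    if not KEBAB.match(slug):
        problems.append(f"scenario name '{slug}' is not a kebab-case slug")
    total = sum(item["max_score"] for item in criteria.get("checklist", []))
    if total != 100:
        problems.append(f"max_score values sum to {total}, expected exactly 100")
    return problems
```

Whether each criterion is independently verifiable still needs a human or judge read; this only covers what can be checked automatically.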
See eval-runner for full implementation (including the full Tessl CLI pipeline and --json vs --agent). Summary:
Path A — Tessl CLI (preferred): From repo root, tessl eval run ./skills/<skill-name> --json (add --agent=... when you need a fixed judge model; see eval-runner). Parse JSON output into per-scenario, per-criterion scores.
Path B — LLM-as-Judge Fallback: For each scenario, run two subagents (Agent tool) — one with task only (baseline), one with SKILL.md prepended (with-skill). Score each criterion by launching a judge subagent with the criterion description and agent output; request a JSON response {"score": N, "reasoning": "..."}.
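Judge replies are free text from a model, so it helps to validate them before they enter the results. The following is a minimal sketch: `parse_judge_reply` is a hypothetical helper, and bounding the score by the criterion's max_score is an assumption rather than something eval-runner is known to enforce.

```python
import json


def parse_judge_reply(reply: str, max_score: int) -> dict:
    """Parse a judge's {"score": N, "reasoning": "..."} reply and validate the score."""
    data = json.loads(reply)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("judge reply must be a JSON object")
    score = data.get("score")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0 <= score <= max_score:
        raise ValueError(f"score {score} is outside 0..{max_score}")
    return {"score": score, "reasoning": str(data.get("reasoning", ""))}
```

Raising on a malformed reply, instead of defaulting to zero, keeps a judge formatting failure from being recorded as a skill failure.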
Assemble results into a unified schema: date, method, model, scenarios (each with baseline score, with-skill score, delta, and per-criterion breakdown).
Calibration: If both paths are available, run both on the same scenarios. Accept if within ±15%; otherwise flag to user and prefer CLI results.
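The calibration rule can be written as a small comparison. Note the assumption: "±15%" is read here as a gap of at most 15 points between two scores on the 0–100 scale; if eval-runner defines it as a relative difference, the comparison would need to change. `calibrate` and its return keys are hypothetical names.

```python
def calibrate(cli_score: float, judge_score: float, tolerance: float = 15.0) -> dict:
    """Compare CLI and LLM-as-judge scores for the same scenario."""
    gap = abs(cli_score - judge_score)
    agree = gap <= tolerance
    return {
        "agree": agree,
        "gap": gap,
        # On disagreement, flag to the user and prefer the CLI result.
        "flag_to_user": not agree,
        "preferred": "either" if agree else "tessl-cli",
    }
```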
Analyze eval results, classify failures, and propose targeted edits. See activation-design and benchmark-loop for full failure pattern definitions and classification guidance.
| Pattern | Signal | Fix |
|---------|--------|-----|
| Activation gap | Skill didn't fire / agent ignored instructions | Add explicit triggers to description; front-load non-negotiables |
| Ambiguous instruction | Inconsistent behavior across runs | Replace "consider"/"may want" with imperatives |
| Missing example | Agent doesn't know expected output shape | Add integrated example showing input → decision points → output |
| Regression | Negative delta vs. baseline | Identify which edit caused it; revert or rewrite |
| Context overload | Skill too long, agent loses focus | Compress; extract rules to companion files |
On user approval: apply edits → re-run Phase 4 → compare new vs. previous results → log to benchmark-log.md → flag any negative deltas immediately.
Optional Tessl loop: Before Phase 4, re-run §2.6 from step 1 (tessl skill review through scenario refresh) to regenerate scenarios after major skill changes. All such mutations must finish before eval execution begins (→ non-negotiable #6).
After every eval run, append to skills/<skill-name>/benchmark-log.md:
## Run: <ISO-8601 timestamp>
**Method:** <tessl-cli | llm-as-judge> | **Model:** <model-name>
| Scenario | Baseline | With Skill | Delta |
|----------|----------|------------|-------|
| <name> | <score> | <score> | <+/-N> |
**Changes applied:** <summary of edits, or "Initial evaluation">
---
Create the file if it doesn't exist. Always append — never overwrite.
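One way to honor the append-only rule is to render the entry as text and open the log in append mode, which creates the file when it is absent and never truncates it. This is a sketch under assumptions: `format_run` and `append_run` are hypothetical helpers, scenarios are passed as `(name, baseline, with_skill)` tuples, and the delta is computed as with-skill minus baseline.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_run(method, model, scenarios, changes="Initial evaluation", timestamp=None):
    """Render one benchmark-log.md entry from (name, baseline, with_skill) tuples."""
    stamp = timestamp or datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        f"## Run: {stamp}",
        f"**Method:** {method} | **Model:** {model}",
        "| Scenario | Baseline | With Skill | Delta |",
        "|----------|----------|------------|-------|",
    ]
    for name, baseline, with_skill in scenarios:
        delta = with_skill - baseline  # positive means the skill helped
        lines.append(f"| {name} | {baseline} | {with_skill} | {delta:+} |")
    lines.append(f"**Changes applied:** {changes}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def append_run(log_path: Path, entry: str) -> None:
    """Append an entry, creating the file if needed; mode 'a' never truncates."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
```

Keeping formatting separate from writing makes the entry easy to show to the user before it is committed to the log.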
Warnings do not block. If warnings exist, offer to run another optimization cycle (return to Phase 5).
User says: "Create a skill for writing git commit messages"
Interview summary → decision map:
| # | Answer |
|---|--------|
| 1 | "Generate conventional commit messages from staged diffs" |
| 2 | Developers |
| 3 | Code generation |
| 4 | Read staged diff; use Conventional Commits format; keep subject ≤72 chars; include body for non-trivial changes |
| 5 | Never fabricate changes not in the diff; never use vague subjects like "update code" |
| 6 | "commit message, write commit, git commit, conventional commit" |
| 7 | Structured message |
| 8 | No companion files |
| 9 | Bash (for git diff --staged) |
| 10 | Generate scenarios |
Scaffold produced:
skills/commit-message/
├── SKILL.md # Frontmatter with triggers, non-negotiables, format rules, integrated example
├── tile.json # oh-my-ai/commit-message, v1.0.0
└── evals/
    ├── simple-feature-commit/
    │   ├── task.md          # "Given this staged diff adding a login form..."
    │   └── criteria.json    # Tests: conventional format, subject length, body presence
    └── noisy-multi-file-commit/
        ├── task.md          # "Given this large diff touching 8 files..."
        └── criteria.json    # Tests: focus, not fabricating, correct scope
Repo integration: README row added (alphabetically); CI matrix updated.
tessl eval run / judge runs) or overwriting previous benchmark-log.md entries.
tessl skill review in the middle of Phase 4, or guessing a scenario generation id instead of parsing CLI output.
skills-lock.json when scaffolding a first-party skill under skills/.