skills/golem-powers/skill-creator/SKILL.md
Create, update, and evaluate golem-powers skills. Use for skill evals, with/without benchmarks, live A/B tests, session JSONL mining, batch miners, and handoff digests. Triggers: create skill, new skill, skill eval, benchmark, live eval, A/B test, mine session, mine JSONL, session digest. NOT for invoking an existing skill or convergence weaving.
npx skillsauth add etanhey/golems skill-creatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Take skills from "drafted" to "proven" through rigorous multi-layer evaluation.
bash ~/Gits/golems/skills/golem-powers/skill-creator/scripts/install.sh
The installer is idempotent — safe to re-run. It:
~/.claude/commands/skill-creator (golem-powers convention)~/Gits/skill-creator/.claude/agents/ and scripts/ exist as the project-scope homesession-miner agent definition into the project-scope agents dir (gates the sub-agent so only skillCreator-family sessions can spawn it)Run bash scripts/install.sh --dry-run to preview without changes. bash scripts/install.sh --help for usage.
Every skill change goes through this pipeline. No exceptions.
brain_search for: prior evals, user complaints, known issues, related skillsDesign eval cases as structured JSON in evals/evals.json:
{
"id": 1,
"name": "descriptive-kebab-name",
"category": "compliance|structure|quality",
"description": "What this eval tests",
"prompt": "The input scenario given to the agent",
"assertions": [
{
"type": "tool_usage|content|negative",
"name": "assertion-name",
"description": "What correct behavior looks like"
}
]
}
Cover: happy path, edge cases, failure modes, interaction with other skills. See references/scoring-rubric.md for scoring methodology.
Two tiers based on skill importance:
live-eval-runner.sh for top 3 discriminator evalsevals/results/live-{date}.json and brain_store'dSee workflows/live-eval.md for the full live eval workflow.
brain_store eval results, delta scores, issues found/writing-skills for SKILL.md structure compliance~/.claude/skills/<name> +
~/.claude/commands/<name> (run golem-install, which auto-discovers + links
every golem-powers/*/ dir; or symlink the one new skill). Then VERIFY it
appears in the available-skills list — committing/merging does NOT register it.
(2026-05-30: /weave was committed + merged but unusable by any agent until the
symlinks existed. See /writing-skills create.md "Final Step: REGISTER".)with_skill vs without_skill comparison is MANDATORY. No exceptions.
| Weight | Dimension | What it measures | |--------|-----------|-----------------| | 70% | Compliance | Does the agent follow the skill's instructions? | | 20% | Structure | Does the output match expected format? | | 10% | Quality | Is the output actually good/useful? |
| Condition | Action | |-----------|--------| | Baseline >70% | Skill may not add value — flag and explain | | Delta <10% | Marginal — consider if complexity is worth it | | Delta >30% | Clearly valuable — ship with confidence | | Compliance <50% with skill | Instructions unclear — rewrite before shipping |
| Skill type | Test with | Why | |------------|-----------|-----| | Claude behavior skills | Sonnet (default) | Tests actual Claude compliance | | Code implementation skills | Codex (default, no model flag) | Tests code quality | | Audit/review skills | Cursor (default) | Tests review thoroughness |
| Workflow | When to use | |----------|-------------| | create-skill.md | Creating a new skill from scratch | | live-eval.md | Running live A/B tests with real agents | | ab-compare.md | Comparing skill versions or platforms | | mine-session.md | Mining a Claude Code session JSONL into a 10-section markdown digest (handoff docs, EOD waves, claim verification) |
| Reference | Content | |-----------|---------| | scoring-rubric.md | Full scoring methodology and rubric | | subagent-vs-skill.md | Classification reference — READ BEFORE SCAFFOLDING any new capability. Distinguishes skill (slash-triggered, parent-context) from sub-agent (name-spawned, isolated-context). Documents the misroute pattern (GitHub openai/codex#18823) that even Codex itself routinely makes. |
/writing-skills — structural templates (SKILL.md format, frontmatter)/cmux-agents — spawning live eval agents in cmux panes/never-fabricate — Read() every file before reporting results/pr-loop — shipping skill changes through the full PR lifecycleThese sub-agents ship in BOTH Claude and Codex formats. They are invokable only from sessions with cwd inside ~/Gits/skill-creator/ (i.e., skillCreatorClaude / skillCreatorCodex / skillCreatorRepoGolem). Sessions in other repos cannot spawn them directly — they must dispatch a skillCreator first.
Canonical source-of-truth lives in this skill's agents/ dir. scripts/install.sh symlinks them into the project repo's project-scope dirs (.claude/agents/ for Claude, .codex/agents/ for Codex). The dual-format packaging means the same sub-agent is available regardless of whether the parent is Claude Code or Codex CLI.
| Sub-agent | Claude format | Codex format | Workflow |
|---|---|---|---|
| session-miner | agents/session-miner.md (model: inherit) | agents/session-miner.toml (model: gpt-5.3-codex-spark) | mine-session.md |
Both formats invoke the same deterministic parser at scripts/session-miner.py. The Claude format uses subagent_type="session-miner" via the Agent tool; the Codex format is referenced by name in natural-language prompts (session_miner, mine X to Y).
For when to choose sub-agent vs skill: see references/subagent-vs-skill.md.
development
Create, edit, and verify golem-powers skills using the standard SKILL.md structure, workflow files, adapters, templates, and eval fixtures. Use for new skills, structural edits, workflows/adapters, and pre-deploy validation. NOT for invoking existing skills, superpowers skills, or skill-creator agent workflows.
testing
Extract structured knowledge from any video source — YouTube URLs or local screen recordings. YouTube → gems workflow (yt-dlp transcript → keyword hotspots → frame extract → brain_digest → structured gems). Screen recordings → QA workflow (reuses /qa-video stalker pipeline). Use when user shares a YouTube link wanting deep extraction with frames, shares a .mov/.mp4 for QA processing, says "extract from video", "video gems", "process this recording", or mentions gem extraction from video content.
testing
Use when running or reviewing any recurring monitor loop for merge queues, worker queues, collab tails, or agent completion. Enforces drive-to-completion ticks: every tick must query live state with `!`, classify whether real progress happened, and then dispatch, verify-and-decrement, or escalate-park. Triggers on: monitor loop, /loop, recurring tick, keep monitoring, silent autonomous, merge gate, blocked review, no-progress loop.
tools
MeHayom freelance client management — daily updates, decision tracking, time logging. Use when drafting Yuval updates, logging scope changes, tracking hours, or any MeHayom client communication. Triggers: 'draft Yuval update', 'client update', 'daily update', 'log decision', 'track time', 'mehayom'.