skills/general-skill-creator/SKILL.md
Create, modify, evaluate, and package portable Agent Skills for coding agents. Use this skill when users want to create a skill from scratch, turn a workflow into a reusable skill, improve an existing skill, set up evals, compare skill behavior against a baseline, or optimize skill trigger descriptions across agent environments.
npx skillsauth add jtsang4/efficient-coding general-skill-creator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
A skill for creating portable Agent Skills and improving them through realistic test runs.
Agent Skills are reusable instruction packages for coding agents. A portable skill should rely on the shared skill pattern: a SKILL.md file with frontmatter, concise instructions, and optional bundled resources such as scripts/, references/, assets/, and evals/.
Work through this loop:
Use the current coding agent's native capabilities for execution. If the environment has subagents, use them for independent runs. If the environment has headless execution, use it for repeatable test runs. Headless execution means a non-interactive agent run launched by CLI, SDK, API, or task runner, where the prompt, inputs, workspace, and output path are provided up front.
Match the user's level of technical detail. Explain terms such as "eval", "assertion", "headless", and "baseline" when the user seems new to agent workflows.
Keep the user involved at the decisions that shape portability:
Start by extracting answers from the current conversation when available. Ask for missing details only when they change the skill design.
Capture these details:
For installation, prefer early confirmation. If the user has no preference, recommend a canonical repository-local skill directory when the workspace already has one. If the workspace has no convention, suggest skills/<skill-name>/ for source control and let the active coding agent handle any environment-specific installation or discovery path.
Create or edit a directory with this shape:
skill-name/
├── SKILL.md
├── scripts/
├── references/
├── assets/
└── evals/
Only create optional directories that the skill actually needs.
SKILL.md should contain:
- name: kebab-case identifier.
- description: trigger guidance written for skill discovery.
- compatibility: optional note for required tools, runtimes, credentials, or environment assumptions.
- Instructions covering references/, when to run scripts, and how to use assets.
The description is the main discovery signal. Write it around user intent, task context, and trigger phrases. Include nearby cases where the skill should trigger even when the user omits the exact skill name.
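As a rough illustration of the frontmatter fields listed above, the sketch below checks that `name` is kebab-case and that `description` is present. It assumes the frontmatter is a block of simple `key: value` lines between two `---` lines, which is a common convention but is not spelled out in this document. The helper names `parse_frontmatter` and `check_frontmatter` are hypothetical and are not the bundled `scripts/quick_validate.py`.

```python
import re

# Assumption: kebab-case means lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between two `---` delimiter lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def check_frontmatter(fields: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not KEBAB_CASE.match(fields.get("name", "")):
        problems.append("name must be a kebab-case identifier")
    if not fields.get("description"):
        problems.append("description is required")
    return problems
```

A full YAML parser would handle multi-line descriptions; this sketch deliberately covers only single-line values.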
Good descriptions answer:
Keep the description specific enough to beat nearby skills in selection and broad enough to catch natural user phrasing.
Use progressive disclosure:
- SKILL.md gives the core workflow.
Keep SKILL.md readable. Move long reference material into references/ and point to it from the relevant section. Put deterministic or repetitive work into scripts/ so future agent runs can reuse it.
Write skills for the current agent and future agents:
Create skills whose contents match the user's stated intent. Refuse requests for skills that enable unauthorized access, credential theft, malware, covert data extraction, or deception. For sensitive workflows, include explicit user confirmation steps and clear handling of credentials or private data.
After drafting the skill, create 2-3 realistic test prompts. Use real-world phrasing with enough context to exercise the skill.
Save test cases to evals/evals.json:
{
  "skill_name": "example-skill",
  "evals": [
    {
      "id": 1,
      "name": "descriptive-case-name",
      "prompt": "User's task prompt",
      "expected_output": "Description of expected result",
      "files": [],
      "expectations": []
    }
  ]
}
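A quick shape check can catch malformed eval files before any runs start. The sketch below treats every key shown in the example above as required and flags reused ids; both rules are assumptions for illustration, and `references/schemas.md` remains the authority on which fields are mandatory. The function name `validate_evals` is hypothetical.

```python
import json

# Assumption: every key from the example eval entry is required.
REQUIRED_EVAL_KEYS = {"id", "name", "prompt", "expected_output", "files", "expectations"}


def validate_evals(data: dict) -> list[str]:
    """Return a list of problems found in an evals.json payload."""
    problems = []
    if not isinstance(data.get("skill_name"), str) or not data["skill_name"]:
        problems.append("skill_name must be a non-empty string")
    evals = data.get("evals")
    if not isinstance(evals, list) or not evals:
        return problems + ["evals must be a non-empty list"]
    seen_ids = set()
    for index, case in enumerate(evals):
        missing = REQUIRED_EVAL_KEYS - case.keys()
        if missing:
            problems.append(f"evals[{index}] is missing: {sorted(missing)}")
        if case.get("id") in seen_ids:
            problems.append(f"evals[{index}] reuses id {case.get('id')}")
        seen_ids.add(case.get("id"))
    return problems


def validate_evals_file(path: str) -> list[str]:
    """Load evals.json from disk and validate its shape."""
    with open(path, encoding="utf-8") as handle:
        return validate_evals(json.load(handle))
```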
Draft expectations while runs are in progress. Good expectations are objectively checkable and specific to outcomes the user cares about. Subjective skills can use human review as the primary signal.
See references/schemas.md when you need the shared JSON shapes for eval metadata, grading, benchmark data, feedback, and telemetry.
Run each test in at least two configurations:
- with_skill: the task is executed with the draft skill available and explicitly in scope.
- baseline: the same task is executed without the draft skill for new skills, or with the previous skill version for improvements.
Use independent execution when available:
Put results in a workspace next to the skill directory:
<skill-name>-workspace/
└── iteration-1/
    └── eval-1-descriptive-case-name/
        ├── eval_metadata.json
        ├── with_skill/
        │   └── run-1/
        │       ├── outputs/
        │       ├── transcript.md
        │       ├── grading.json
        │       └── timing.json
        └── baseline/
            └── run-1/
                ├── outputs/
                ├── transcript.md
                ├── grading.json
                └── timing.json
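The layout above can be created up front so that each executor only needs to be told where to write. This is a minimal sketch using `pathlib`; the function name `create_run_dirs` and its parameters are hypothetical, and the files inside each run directory are left for the run itself to produce.

```python
from pathlib import Path

CONFIGURATIONS = ("with_skill", "baseline")


def create_run_dirs(
    workspace: Path, iteration: int, eval_id: int, eval_name: str, run: int = 1
) -> list[Path]:
    """Create <workspace>/iteration-N/eval-ID-NAME/<configuration>/run-R/outputs/."""
    eval_dir = workspace / f"iteration-{iteration}" / f"eval-{eval_id}-{eval_name}"
    run_dirs = []
    for configuration in CONFIGURATIONS:
        run_dir = eval_dir / configuration / f"run-{run}"
        # exist_ok keeps the call safe to repeat when a run is retried.
        (run_dir / "outputs").mkdir(parents=True, exist_ok=True)
        run_dirs.append(run_dir)
    return run_dirs
```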
For each run, instruct the executor to:
Execute this task:
- Skill: <path or installed skill name for the with_skill run; "none" for baseline>
- Task: <eval prompt>
- Input files: <files, or "none">
- Save outputs to: <iteration-dir>/<eval-dir>/<configuration>/run-1/outputs/
- Save a brief transcript to: <iteration-dir>/<eval-dir>/<configuration>/run-1/transcript.md
- Save any available telemetry to: <iteration-dir>/<eval-dir>/<configuration>/run-1/timing.json
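When the same instruction is sent for many runs, rendering it from a template keeps the with_skill and baseline prompts identical apart from the skill line. The sketch below simply reproduces the template above; `build_executor_prompt` and its parameters are hypothetical names.

```python
from typing import Optional, Sequence


def build_executor_prompt(
    run_dir: str,
    task: str,
    skill: Optional[str] = None,
    input_files: Optional[Sequence[str]] = None,
) -> str:
    """Render the executor instruction for one run directory."""
    files = ", ".join(input_files) if input_files else "none"
    return "\n".join(
        [
            "Execute this task:",
            f"- Skill: {skill or 'none'}",  # "none" marks the baseline run
            f"- Task: {task}",
            f"- Input files: {files}",
            f"- Save outputs to: {run_dir}/outputs/",
            f"- Save a brief transcript to: {run_dir}/transcript.md",
            f"- Save any available telemetry to: {run_dir}/timing.json",
        ]
    )
```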
Telemetry is useful for cost and stability analysis. Capture what the runtime exposes:
{
  "total_tokens": 84852,
  "duration_ms": 23332,
  "total_duration_seconds": 23.3,
  "tool_calls": 18,
  "notes": "Fields are optional and depend on the active agent runtime."
}
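If the runtime exposes nothing beyond wall-clock time, the duration fields can still be filled by the harness that launches the run. The sketch below builds a timing record from start and end timestamps and omits any field the runtime did not report; the function name `build_timing` is hypothetical.

```python
import json
from typing import Optional


def build_timing(
    start_seconds: float,
    end_seconds: float,
    total_tokens: Optional[int] = None,
    tool_calls: Optional[int] = None,
) -> dict:
    """Build a timing.json payload, leaving out fields that are unknown."""
    elapsed = end_seconds - start_seconds
    timing = {
        "duration_ms": round(elapsed * 1000),
        "total_duration_seconds": round(elapsed, 1),
    }
    if total_tokens is not None:
        timing["total_tokens"] = total_tokens
    if tool_calls is not None:
        timing["tool_calls"] = tool_calls
    return timing


def write_timing(path: str, timing: dict) -> None:
    """Write the timing record as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(timing, handle, indent=2)
```

In practice the timestamps would come from `time.perf_counter()` calls placed around the run.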
Grade each run against the expectations in the eval metadata. Use a grader subagent when available; otherwise grade inline. Programmatic checks are preferred for deterministic outputs.
Use agents/grader.md as the grading prompt when running a grader subagent or grading inline.
Save grading.json in each run directory:
{
  "expectations": [
    {
      "text": "The output includes a valid CSV with a header row",
      "passed": true,
      "evidence": "outputs/report.csv has headers: date, amount, category"
    }
  ],
  "summary": {
    "passed": 1,
    "failed": 0,
    "total": 1,
    "pass_rate": 1.0
  }
}
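The `summary` block can be derived from the `expectations` list rather than written by hand, which keeps the two from drifting apart. A minimal sketch, with the hypothetical function name `summarize_expectations` and an assumed pass rate of 0.0 when there are no expectations:

```python
def summarize_expectations(expectations: list[dict]) -> dict:
    """Compute the grading.json summary from graded expectations."""
    total = len(expectations)
    passed = sum(1 for item in expectations if item["passed"])
    return {
        "passed": passed,
        "failed": total - passed,
        "total": total,
        # Assumption: an empty expectation list reports a pass rate of 0.0.
        "pass_rate": passed / total if total else 0.0,
    }
```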
After grading, aggregate results into a benchmark summary with pass rates, qualitative notes, and any available telemetry. Keep with_skill and baseline side by side so the user can see the skill's actual contribution.
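To make the side-by-side comparison concrete, the sketch below pools the `summary` counts from every `grading.json` under one iteration directory and reports a pass rate per configuration. It assumes the workspace layout shown earlier and is only an illustration of the idea; it is not the bundled aggregation script, and the name `pass_rates_by_configuration` is hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def pass_rates_by_configuration(iteration_dir: Path) -> dict[str, float]:
    """Pool passed/total counts per configuration across all evals and runs."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    # Expected path shape: eval-*/<configuration>/run-*/grading.json
    for grading_path in sorted(iteration_dir.glob("eval-*/*/run-*/grading.json")):
        configuration = grading_path.parent.parent.name
        summary = json.loads(grading_path.read_text(encoding="utf-8"))["summary"]
        passed[configuration] += summary["passed"]
        total[configuration] += summary["total"]
    return {name: passed[name] / total[name] for name in total if total[name]}
```

The difference between the `with_skill` and `baseline` rates is a first estimate of what the skill contributes.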
Use the bundled aggregation script when the workspace follows the layout above:
cd <general-skill-creator-path>
python -m scripts.aggregate_benchmark <skill-workspace>/iteration-1 --skill-name <skill-name>
Ask an analyzer subagent to read agents/analyzer.md when you want a deeper pass over benchmark patterns, flaky assertions, and resource-usage outliers.
Show the user the outputs before revising the skill. Use whatever presentation mechanism the current environment supports:
Prefer the bundled review viewer when a local filesystem and Python are available:
python <general-skill-creator-path>/eval-viewer/generate_review.py \
  <skill-workspace>/iteration-1 \
  --skill-name "<skill-name>" \
  --benchmark <skill-workspace>/iteration-1/benchmark.json
For headless environments, use the viewer's --static <output_path> option and share the generated HTML file through the environment's normal artifact mechanism.
Give the user a simple review structure:
Save feedback as feedback.json when possible:
{
  "reviews": [
    {
      "run_id": "invoice-summary-with_skill",
      "feedback": "The totals are correct, but the summary should mention overdue invoices.",
      "timestamp": "2026-01-15T10:30:00Z"
    }
  ],
  "status": "complete"
}
Empty feedback means the output was acceptable.
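Because empty feedback means acceptance, the runs that still need attention are simply the reviews with non-blank feedback text. A small sketch of that filter, using the hypothetical name `runs_needing_changes`:

```python
def runs_needing_changes(feedback: dict) -> list[str]:
    """Return run_ids whose review contains non-empty feedback."""
    return [
        review["run_id"]
        for review in feedback.get("reviews", [])
        # Whitespace-only feedback is treated the same as empty feedback.
        if review.get("feedback", "").strip()
    ]
```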
Use feedback to improve general behavior across future tasks:
SKILL.md.
Favor durable workflow guidance over overfitting to a single eval prompt. Explain why each important instruction matters so future agents can adapt when details vary.
Optimize the description after the skill works well on task quality.
Create a trigger eval set with 16-20 realistic queries:
[
  {
    "query": "the user prompt",
    "should_trigger": true,
    "reason": "Why this request needs the skill"
  },
  {
    "query": "near-miss prompt",
    "should_trigger": false,
    "reason": "Why another workflow fits better"
  }
]
Use a mix of:
Run trigger evaluation with the current agent's native skill discovery behavior when available. A strong trigger test installs or exposes the skill the same way real users will use it, sends realistic prompts, and records whether the agent consulted the skill.
If the runtime exposes no trigger trace, use a proxy signal:
Apply the best description after reviewing train and held-out results. Show the user the before/after description and the trigger accuracy.
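Reviewing train and held-out results requires splitting the trigger eval set before any description tuning begins. The sketch below shows one possible split, stratified by `should_trigger` so both parts keep positive and negative cases. The 40% held-out fraction, the fixed seed, and the name `split_trigger_evals` are all assumptions, not values taken from this skill.

```python
import random


def split_trigger_evals(
    items: list[dict], held_out_fraction: float = 0.4, seed: int = 0
) -> tuple[list[dict], list[dict]]:
    """Split trigger evals into (train, held_out), stratified by label."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: list[dict] = []
    held_out: list[dict] = []
    for label in (True, False):
        group = [item for item in items if item["should_trigger"] is label]
        rng.shuffle(group)
        cut = round(len(group) * held_out_fraction)
        held_out.extend(group[:cut])
        train.extend(group[cut:])
    return train, held_out
```

Tune the description against the train part only, then check the held-out part once to see whether the improvement generalizes.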
Use assets/eval_review.html to let the user review and edit trigger queries. Replace these placeholders:
- __EVAL_DATA_PLACEHOLDER__: JSON array of trigger eval items.
- __SKILL_NAME_PLACEHOLDER__: the skill name.
- __SKILL_DESCRIPTION_PLACEHOLDER__: the current description.
When trigger runs are complete, save a result file with rows like:
[
  {
    "query": "User-like prompt",
    "triggered": true,
    "evidence": "Transcript shows the agent loaded SKILL.md"
  }
]
Then score the results:
cd <general-skill-creator-path>
python -m scripts.score_trigger_results \
  --eval-set <trigger-evals.json> \
  --results <trigger-results.json> \
  --description "<description being tested>" \
  --output <trigger-score.json>
python -m scripts.generate_report <trigger-score.json> --skill-name <skill-name> -o <trigger-report.html>
The scoring script evaluates trigger decisions only. The active coding agent remains responsible for running prompts through its native skill discovery path.
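To show what scoring trigger decisions involves, the sketch below joins the eval set and the result rows on `query` and counts agreement between `should_trigger` and `triggered`. It is an illustration only and does not reproduce the bundled scoring script's output format. Treating a query with no result row as not triggered is an assumption, and `score_triggers` is a hypothetical name.

```python
def score_triggers(eval_set: list[dict], results: list[dict]) -> dict:
    """Compare expected and observed trigger decisions, matched by query."""
    triggered_by_query = {row["query"]: row["triggered"] for row in results}
    false_negatives = []  # should have triggered, but did not
    false_positives = []  # triggered, but should not have
    for item in eval_set:
        # Assumption: a query without a result row counts as not triggered.
        triggered = triggered_by_query.get(item["query"], False)
        if item["should_trigger"] and not triggered:
            false_negatives.append(item["query"])
        elif triggered and not item["should_trigger"]:
            false_positives.append(item["query"])
    total = len(eval_set)
    correct = total - len(false_negatives) - len(false_positives)
    return {
        "accuracy": correct / total if total else 0.0,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }
```

The two failure lists point at different fixes: false negatives suggest missing trigger phrases, while false positives suggest the description is too broad.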
Generate a description-improvement prompt when trigger failures show clear patterns:
cd <general-skill-creator-path>
python -m scripts.improve_description \
  --skill-path <path/to/skill-folder> \
  --trigger-score <trigger-score.json> \
  --output <description-improvement-prompt.md>
Confirm installation target before packaging or copying files:
When packaging, use a portable archive or the agent's standard skill package format if one exists. Include SKILL.md and needed bundled resources. Exclude eval workspaces, temporary files, generated benchmarks, local credentials, caches, and dependency folders.
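The include and exclude rules above can be sketched as a plain zip archive built with the standard library. This is an assumption-laden illustration: the archive layout, the excluded names, and the `-workspace` suffix rule are hypothetical choices, and the result may differ from what an agent's own package format or a bundled packaging script produces.

```python
import zipfile
from pathlib import Path

# Assumption: an illustrative exclusion list, not an exhaustive one.
EXCLUDED_NAMES = {"__pycache__", "node_modules", ".git", ".env", ".DS_Store"}


def is_excluded(relative: Path) -> bool:
    """Exclude known junk names and any eval workspace directory."""
    return any(
        part in EXCLUDED_NAMES or part.endswith("-workspace")
        for part in relative.parts
    )


def package_skill(skill_dir: Path, archive_path: Path) -> list[str]:
    """Zip a skill directory and return the relative paths that were included.

    The archive should be written outside skill_dir so it does not include itself.
    """
    if not (skill_dir / "SKILL.md").is_file():
        raise FileNotFoundError(f"{skill_dir} has no SKILL.md")
    included = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(skill_dir.rglob("*")):
            relative = path.relative_to(skill_dir)
            if path.is_file() and not is_excluded(relative):
                # Store files under a top-level folder named after the skill.
                archive.write(path, (Path(skill_dir.name) / relative).as_posix())
                included.append(relative.as_posix())
    return included
```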
Use scripts/package_skill.py for a portable .skill archive:
cd <general-skill-creator-path>
python -m scripts.package_skill <path/to/skill-folder> <output-directory>
After installation, run a small smoke test in the target environment to verify the skill is discoverable and usable.
When improving an existing skill:
Use these bundled resources:
- agents/grader.md: grade expectations against transcripts and outputs.
- agents/comparator.md: compare two outputs blindly.
- agents/analyzer.md: analyze benchmark patterns or explain why one version beat another.
- assets/eval_review.html: review and edit trigger eval queries with the user.
- eval-viewer/generate_review.py: generate or serve a human review UI for eval outputs.
- scripts/aggregate_benchmark.py: aggregate grading.json files into benchmark.json and benchmark.md.
- scripts/generate_report.py: render description trigger optimization reports.
- scripts/score_trigger_results.py: score trigger eval results collected from any coding agent.
- scripts/run_eval.py: compatibility wrapper around generic trigger result scoring.
- scripts/improve_description.py: generate a portable prompt for improving a trigger description.
- scripts/quick_validate.py: validate SKILL.md frontmatter and basic skill shape.
- scripts/package_skill.py: package a skill directory into a portable archive.
- references/schemas.md: JSON structures for evals, grading, benchmarks, feedback, trigger evals, and telemetry.
Before handing the skill back:
SKILL.md has valid frontmatter and a clear description.