skills/asset-enhancer/SKILL.md
Classify a software-asset brief (logo, app icon, favicon, OG image, illustration, splash, icon pack, transparent mark), route to the right image model, rewrite the prompt in the target model's dialect, pick an execution mode (inline_svg / external_prompt_only / api) based on what's actually available, and run the pipeline. Use whenever the user asks for any visual asset for a software product.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add MohamedAbdallah-14/prompt-to-asset asset-enhancer`
Comprehensive behavior spec for producing production-grade software-development assets. Drives every asset_* MCP tool.
- Producing production-grade software assets is a routing + post-processing problem, not a prompt-engineering problem.
- The user may not have an image-model API key. The plugin must work anyway.
- There are real zero-key and free-token paths; always surface them before asking the user to pay for anything. The `free_api.routes` block in `asset_capabilities()` enumerates the ranked free programmatic routes: Cloudflare Workers AI (Flux-1-Schnell + SDXL, 10k neurons/day), NVIDIA NIM (Flux / SDXL / SANA, 1k requests/month with `NVIDIA_API_KEY`, no card), HF Inference (free `HF_TOKEN`), Stable Horde (anonymous queue), Pollinations (zero-signup HTTP, last resort), and local ComfyUI. Google's Gemini / Imagen image API is paid for programmatic image output: an unbilled `GEMINI_API_KEY` returns HTTP 429 with `limit: 0`. The AI Studio web UI at https://aistudio.google.com is still free for interactive generation; treat it as a paste-only flow (`external_prompt_only` → `asset_ingest_external`).
See rules/asset-enhancer-activate.md for the condensed always-on version. This file is the long-form spec.
Before generating anything, decide (with the user) which mode to use. Call asset_capabilities() to learn what's available in the current environment.
inline_svg: zero key, Claude writes the SVG, server persists
Two-step round trip:
1. Call asset_generate_<type>({ brief, mode: "inline_svg" }). The server returns an svg_brief (viewBox, palette, path budget, style hints, skeleton) and instructions_to_host_llm. You emit the <svg>…</svg> inline in your reply as a ```svg code block so the user can see it.
2. Call asset_save_inline_svg({ svg, asset_type }). The server validates the SVG against the brief (viewBox, <path> count, palette), writes master.svg to disk, and, for favicon / app_icon, runs the full platform export (favicon.ico + apple-touch-icon + PWA 192/512/512-maskable; or AppIconSet + Android adaptive + PWA). It returns an AssetBundle with variants[].path. Show those paths to the user.

Skip step 2 and the user gets a code block with no file. Do not skip step 2.
Good for: logo, favicon, icon_pack, sticker, transparent_mark, simple app_icon masters. Flat, geometric, ≤40 paths.
Not good for: illustrations, hero art, photoreal, OG cards with web-font typography.
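The server-side check in step 2 can be pictured with a small sketch. This is an illustration only, assuming a simplified brief (`SvgBrief` with a flat hex palette and a `pathBudget` field are hypothetical names) and naive regex matching rather than a real SVG parser; the actual validator may work differently.

```typescript
// Hypothetical, simplified brief: the real svg_brief carries more fields.
interface SvgBrief {
  viewBox: string;
  palette: string[]; // allowed colors as "#rrggbb"
  pathBudget: number; // maximum number of <path> elements
}

// Returns a list of human-readable problems; an empty list means the SVG passes.
function checkInlineSvg(svg: string, brief: SvgBrief): string[] {
  const problems: string[] = [];

  // Naive attribute lookup on the root <svg> tag.
  const viewBox = /<svg\b[^>]*\bviewBox="([^"]*)"/.exec(svg)?.[1];
  if (viewBox !== brief.viewBox) {
    problems.push(`viewBox is ${viewBox ?? "missing"}, expected ${brief.viewBox}`);
  }

  // Count <path> elements against the budget.
  const pathCount = (svg.match(/<path\b/g) ?? []).length;
  if (pathCount > brief.pathBudget) {
    problems.push(`${pathCount} paths exceed the budget of ${brief.pathBudget}`);
  }

  // Every 6-digit hex color in the markup must be in the palette.
  // Naive: a 6-hex-digit fragment id such as url(#abcdef) would also match.
  const allowed = new Set(brief.palette.map((c) => c.toLowerCase()));
  const used = Array.from(
    new Set((svg.match(/#[0-9a-fA-F]{6}\b/g) ?? []).map((c) => c.toLowerCase())),
  );
  for (const color of used) {
    if (!allowed.has(color)) problems.push(`color ${color} is not in the palette`);
  }
  return problems;
}
```

A caller would treat a non-empty result as a reason to revise the SVG before persisting it.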
external_prompt_only: zero key, user pastes into a web UI
The server returns the dialect-correct prompt + a list of paste targets with URLs in free-first order: AI Studio web UI (https://aistudio.google.com, free interactive Gemini / Imagen), Pollinations (zero-signup HTTP), HF Inference (free HF_TOKEN), Stable Horde (anonymous queue), then paid UIs (Ideogram web, Recraft web, Midjourney, fal.ai Flux, BFL Playground, OpenAI Platform Playground, etc.). The user generates elsewhere, saves the image locally, and calls asset_ingest_external({ image_path, asset_type }) to run the matte / vectorize / validate pipeline.
Note: GEMINI_API_KEY does NOT give free image-gen — Google removed that tier in December 2025. The free Gemini/Imagen route is the AI Studio web UI, not the API.
Good for: any asset type. Best for illustration, hero, text-heavy logos, when the user has a Midjourney / Ideogram subscription but no API key.
api: requires a provider key
Server calls the routed provider directly, mattes, vectorizes, exports, validates. Writes a content-addressed AssetBundle with variants[].path for every artifact.
Requires at least one of: OPENAI_API_KEY, IDEOGRAM_API_KEY, RECRAFT_API_KEY, BFL_API_KEY, GEMINI_API_KEY (or GOOGLE_API_KEY). The router falls back among configured providers automatically.
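The key check behind mode selection can be sketched as a pure function over the environment. This is a hypothetical helper (`availableModes` is not a name from the server), and it only tests for key presence: a present but unbilled `GEMINI_API_KEY` would still fail at call time, as noted above.

```typescript
// Hypothetical sketch: not the server's actual asset_capabilities() logic.
type Mode = "inline_svg" | "external_prompt_only" | "api";

// Provider keys this spec lists as enabling api mode.
const API_KEYS = [
  "OPENAI_API_KEY",
  "IDEOGRAM_API_KEY",
  "RECRAFT_API_KEY",
  "BFL_API_KEY",
  "GEMINI_API_KEY",
  "GOOGLE_API_KEY",
] as const;

function availableModes(env: Record<string, string | undefined>): Mode[] {
  // The two zero-key modes are always offered.
  const modes: Mode[] = ["inline_svg", "external_prompt_only"];
  // api mode needs at least one non-empty provider key.
  const hasKey = API_KEYS.some((name) => (env[name] ?? "").trim() !== "");
  if (hasKey) modes.push("api");
  return modes;
}
```

In a Node process this could be called as `availableModes(process.env)`; whether inline_svg actually suits the asset type is a separate question answered by the capability matrix.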
user brief
↓ asset_capabilities() → what's available RIGHT NOW
↓ asset_enhance_prompt({ brief }) → AssetSpec + modes_available
↓ ask the user which mode they want
↓ asset_generate_<type>({ brief, mode }) → InlineSvgPlan | ExternalPromptPlan | AssetBundle
↓ if inline_svg:
     1. emit <svg> in reply (user sees the shape)
     2. call asset_save_inline_svg({ svg, asset_type })
     3. show the returned variants[].path list (user sees files on disk)
↓ if external: show prompt + paste_targets; wait for asset_ingest_external
↓ if api: show variants[].path and validation warnings
| asset_type | Transparency default | Vector? | Text? | inline_svg? | external_prompt_only? | api? |
|---|---|---|---|---|---|---|
| logo | yes (RGBA PNG and SVG) | preferred | wordmark optional (composite) | ✅ | ✅ | ✅ |
| app_icon | no on iOS 1024 marketing | no | no | ✅ master only | ✅ master only | ✅ full fan-out |
| favicon | mixed | preferred | rare | ✅ | ✅ | ✅ |
| og_image | no | no | yes (headline) | ❌ (web-font layout beyond LLM reach) | ✅ bg only | ✅ |
| splash_screen | vector icon + solid bg | no | no | ❌ | ✅ | ✅ |
| illustration | often | SVG where geometry allows | avoid | ❌ path budget | ✅ | ✅ |
| icon_pack | yes | mandatory SVG | no | ✅ | ✅ | ✅ |
| hero | optional | no | sometimes | ❌ | ✅ | ✅ |
| sticker | yes | no | rare | ✅ | ✅ | ✅ |
| transparent_mark | yes | no | avoid | ✅ | ✅ | ✅ |
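The mode columns of the matrix reduce to a small lookup. The sketch below transcribes them by hand; the names `INLINE_SVG_TYPES` and `modesFor` are hypothetical, and the per-cell caveats (master only, bg only) are deliberately dropped.

```typescript
// Hypothetical lookup transcribed from the table above; not the server's own data structure.
type Mode = "inline_svg" | "external_prompt_only" | "api";
type AssetType =
  | "logo" | "app_icon" | "favicon" | "og_image" | "splash_screen"
  | "illustration" | "icon_pack" | "hero" | "sticker" | "transparent_mark";

// Asset types whose inline_svg column is marked as supported.
const INLINE_SVG_TYPES: ReadonlySet<AssetType> = new Set<AssetType>([
  "logo", "app_icon", "favicon", "icon_pack", "sticker", "transparent_mark",
]);

function modesFor(assetType: AssetType, hasProviderKey: boolean): Mode[] {
  const modes: Mode[] = [];
  if (INLINE_SVG_TYPES.has(assetType)) modes.push("inline_svg");
  modes.push("external_prompt_only"); // every row supports the paste flow
  if (hasProviderKey) modes.push("api"); // every row supports api when a key exists
  return modes;
}
```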
brief
↓ classify() → asset_type ∈ enum
↓ parse_brand_bundle()? → BrandBundle | null
↓ compute_params(asset_type, brand) → dimensions, transparency, safe_zone
↓ route(asset_type, ...) → primary_model, fallback_model, postprocess[]
↓ rewrite(brief, model, brand) → dialect-appropriate final prompt
── call provider with seed pinned ──
↓ validate_0(image) → reject checkerboard / bad alpha / moderation
↓ matte? (birefnet | rmbg via PROMPT_TO_BUNDLE_RMBG_URL | difference fallback | native)
↓ vectorize? (recraft | vtracer-on-PATH | potrace-on-PATH | llm-svg | posterize fallback)
↓ upscale? (dat2 | supir | img2img | never)
↓ export (sharp platform fan-out + @resvg/resvg-js + satori + png-to-ico + svgo)
↓ validate_final (tile-luma alpha check + bbox + palette ΔE2000 + OCR Levenshtein + WCAG contrast)
↓ content-address cache (prompt_hash, model, seed, params_hash)
→ AssetBundle
The routing table lives in data/routing-table.json. Summary:
| Need | Primary | Fallback | Never |
|---|---|---|---|
| Transparent PNG mark | gpt-image-1 (background:"transparent") | Ideogram 3 Turbo (/ideogram-v3/generate-transparent endpoint) → Recraft V3 | Imagen any, Gemini any, SD 1.5 |
| Logo with 1–3 word text | Ideogram 3 → gpt-image-1.5 → Recraft V4 | Composite real SVG type over mark | Imagen 4, SD 1.5, Flux Schnell |
| Logo with 4–10 word text | flux-2 / gpt-image-2 / Nano Banana Pro | gpt-image-1.5 | SD, Flux Schnell, Pollinations |
| Logo with paragraph-length text | Nano Banana Pro / gpt-image-2 (model can render); composite stays safer for pixel-exact UI | — | All weak-text models |
| Native SVG | Recraft V4 | LLM-author SVG (simple geometry) — this is inline_svg mode | Everyone else |
| Photoreal hero | Flux 2 / gpt-image-2 / Imagen 4 Ultra | SDXL + brand LoRA | DALL·E 3 |
| Empty-state illustration | Flux 2 + brand refs (up to 10) / IP-Adapter | SDXL + LoRA → Recraft brand style | One-off Ideogram/MJ (style drift) |
| App icon | Recraft V4 or Ideogram 3 for mark → packaging pipeline | gpt-image-1.5 mark → packaging | Full-bleed Imagen 4 as final |
| Favicon | SVG first (LLM-author or Recraft V3) | Raster 512 → vectorize | — |
| OG image | Satori + @resvg/resvg-js template (no diffusion) | Diffusion only for hero image inside OG template | — |
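The three text-length rows for logos can be read as a threshold function on word count. The sketch below is a hand transcription of the summary table, not a reader for data/routing-table.json; `routeLogoByText` and the tier names are hypothetical, and treating anything past 10 words as paragraph-length is an assumption.

```typescript
// Hypothetical helper: tiers and thresholds transcribed from the summary table.
type TextTier = "none" | "short" | "medium" | "paragraph";

interface TextRoute {
  tier: TextTier;
  primary: string[]; // candidate models in the order the table lists them
}

function routeLogoByText(wordmark: string): TextRoute {
  const words = wordmark.trim().split(/\s+/).filter((w) => w.length > 0).length;
  // No text to render: text-based routing does not apply.
  if (words === 0) return { tier: "none", primary: [] };
  if (words <= 3) {
    return { tier: "short", primary: ["Ideogram 3", "gpt-image-1.5", "Recraft V4"] };
  }
  if (words <= 10) {
    return { tier: "medium", primary: ["flux-2", "gpt-image-2", "Nano Banana Pro"] };
  }
  // Assumption: more than 10 words counts as paragraph-length text.
  return { tier: "paragraph", primary: ["Nano Banana Pro", "gpt-image-2"] };
}
```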
Prompts are never forwarded verbatim. Rewrite per target family. See the per-angle research under docs/research/ for derivations; implementation is in packages/mcp-server/src/rewriter.ts.
gpt-image-2 / gpt-image-1.5 / gpt-image-1 (OpenAI)
- gpt-image-1.5 and gpt-image-1 accept background: "transparent". gpt-image-2 does NOT: the param returns HTTP 400 (regression). Route transparent jobs to gpt-image-1.5.
- … "Acme" in double quotes. Ceilings: gpt-image-2 ~80 chars / paragraph (~99% per third-party); gpt-image-1.5 dense text (~60 chars, LM Arena #1); gpt-image-1 ~50 chars.
- No negative_prompt field: silently ignored across all three.

- … "solid pure white background" and matte externally.
- gemini-2.5-flash-image (original Nano Banana 1): ~80% accuracy, degrades past 1-3 words.
- gemini-3.1-flash-image-preview (Nano Banana 2): ~90% accuracy, ranked #1 on Artificial Analysis Image Arena at launch; strong-text, not weak.
- gemini-3-pro-image-preview (Nano Banana Pro): paragraph-length reliable, ~94-96% accuracy.
- negative_prompt: Imagen 4 accepts negativePrompt on the Vertex AI endpoint; the Gemini API surface ignores it. All Nano Banana variants ignore it.

- … "masterpiece, 8k, studio lighting".
- negative_prompt is a real CFG sampler feature; use it.

- … @-image refs.
- No negative_prompt. Per BFL's Flux 2 prompting guide: "FLUX.2 does not support negative prompts." On 1.x the fal schema rejects it; on 2.x fal silently no-ops. Use positive anchors instead.

- … -- flags. --sref, --cref, --mref for consistency.
- … external_prompt_only mode.
- … --ar 1:1, --style raw, --q 2.

- … /ideogram-v3/generate-transparent endpoint; set rendering_speed: "TURBO" for the Turbo tier (there is no style: "transparent" API parameter).

- style_id = brand lock. controls.colors = hex palette enforcement.

If a BrandBundle is provided, every generation injects it. Shape:
palette: [#hex, #hex, ...] # DTCG color tokens
style_refs: [path.png, ...] # IP-Adapter / --sref / Recraft style_id
lora: path.safetensors? # SDXL/Flux subject/style LoRA
sref_code: "--sref 123456"? # Midjourney
style_id: "uuid"? # Recraft
do_not: ["drop shadows", ...] # Rewritten as positive anchors
logo_mark: path.svg? # Canonical mark for composition
typography: { primary, secondary } # Fonts for composited type
Use asset_brand_bundle_parse to canonicalize a brand.md / brand.json / DTCG tokens.json / AdCP spec into this shape.
Three tiers. Tier-0 always. Tier-1 on first asset in a set. Tier-2 on any failure or on user request.
Tier 0 — deterministic: dimensions, alpha, checkerboard FFT, safe-zone bbox, file-size budget, DCT entropy.
Tier 1 — metric: VQAScore, palette ΔE2000, OCR Levenshtein, WCAG AA at 16×16.
Tier 2 — VLM-as-judge: Claude Sonnet / GPT vision against asset-type rubrics (not wired by default; set PROMPT_ENHANCER_VLM_URL to enable).
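The tier schedule above amounts to a few boolean conditions. A minimal sketch, assuming a hypothetical `ValidationContext` shape; `vlmConfigured` stands in for PROMPT_ENHANCER_VLM_URL being set, since Tier 2 is not wired by default.

```typescript
type Tier = 0 | 1 | 2;

// Hypothetical context object; field names are illustrative.
interface ValidationContext {
  firstInSet: boolean; // first asset generated in a set
  hadFailure: boolean; // an earlier tier reported a failure
  userRequested: boolean; // the user asked for a deeper review
  vlmConfigured: boolean; // stands in for PROMPT_ENHANCER_VLM_URL being set
}

function tiersToRun(ctx: ValidationContext): Tier[] {
  const tiers: Tier[] = [0]; // Tier 0 always runs
  if (ctx.firstInSet) tiers.push(1);
  // Tier 2 needs a trigger and a configured VLM endpoint.
  if (ctx.vlmConfigured && (ctx.hadFailure || ctx.userRequested)) tiers.push(2);
  return tiers;
}
```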
| Failure | Action |
|---|---|
| Checkerboard pattern | Regenerate with route change — architectural, not sampling |
| Alpha missing on transparency-required | Matte with BiRefNet / RMBG |
| Wordmark misspelled | Drop text from prompt, composite SVG type |
| Palette drift | Regenerate with controls.colors (Recraft) or recolor post |
| Safe zone violation | Regenerate with explicit center-framing + padding |
| Wrong aspect | Inpaint/outpaint via edit endpoint if available; else regenerate |
| Watermark / stock photo vibe | Regenerate with positive-anchor rewrite |
| Low contrast at 16² (favicon) | Regenerate with explicit contrast instruction |
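For the low-contrast row, the underlying measurement is the WCAG 2.x contrast ratio between two colors. The sketch below implements that published formula; the function names are illustrative, and the default 3:1 threshold is an assumption (this spec says only "WCAG AA" and does not state which ratio the validator enforces).

```typescript
// Relative luminance of a "#rrggbb" color, per the WCAG 2.x definition.
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) throw new Error(`expected #rrggbb, got ${hex}`);
  const channel = (offset: number): number => {
    const c = parseInt(hex.slice(offset, offset + 2), 16) / 255;
    // Linearize the sRGB channel value.
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(1) + 0.7152 * channel(3) + 0.0722 * channel(5);
}

// Contrast ratio between two colors, from 1 (identical) up to about 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Assumption: 3:1 as the default threshold; pass 4.5 for a stricter text-level check.
function passesContrast(foreground: string, background: string, minRatio = 3): boolean {
  return contrastRatio(foreground, background) >= minRatio;
}
```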
Key: (model, version, seed, prompt_hash, params_hash). Storage: content-addressed path assets/<hash[0:2]>/<hash>/<variant>.<ext>. The MCP server is synchronous — prompt_hash is emitted on every AssetBundle so a future hosted tier (BullMQ / SQS / Cloudflare Queues) can use it as a deduplication key.
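The storage layout can be illustrated with a short sketch. SHA-256 and the JSON serialization of the key are assumptions made for this example (the spec names the key fields and the path pattern, not the hash function), and `cacheHash` / `variantPath` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// The five key fields named in this spec.
interface CacheKey {
  model: string;
  version: string;
  seed: number;
  promptHash: string;
  paramsHash: string;
}

// Assumption: SHA-256 over a fixed-order JSON array of the key fields.
function cacheHash(key: CacheKey): string {
  // Fixed field order so identical keys always serialize identically.
  const canonical = JSON.stringify([
    key.model, key.version, key.seed, key.promptHash, key.paramsHash,
  ]);
  return createHash("sha256").update(canonical).digest("hex");
}

// Builds assets/<hash[0:2]>/<hash>/<variant>.<ext>.
function variantPath(key: CacheKey, variant: string, ext: string): string {
  const hash = cacheHash(key);
  return `assets/${hash.slice(0, 2)}/${hash}/${variant}.${ext}`;
}
```

Because the path is derived only from the key, regenerating with the same model, version, seed, and hashes resolves to the same directory, which is what makes the key usable for deduplication.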
InlineSvgPlan (mode: "inline_svg")
{
  "mode": "inline_svg",
  "asset_type": "logo",
  "brief": "…",
  "svg_brief": { "viewBox": "…", "palette": {...}, "path_budget": 40, "require": [...], "do_not": [...], "skeleton": "…" },
  "instructions_to_host_llm": "…"
}
Read svg_brief. Emit <svg>…</svg> as a ```svg code block in your reply and then call asset_save_inline_svg({ svg, asset_type }) to persist the file. That tool returns an AssetBundle with variants[].path — show those paths to the user so they know the files exist on disk.
ExternalPromptPlan (mode: "external_prompt_only")
{
  "mode": "external_prompt_only",
  "asset_type": "logo",
  "enhanced_prompt": "…",
  "target_model": "ideogram-3-turbo",
  "paste_targets": [{ "name": "Ideogram web", "url": "https://ideogram.ai", "notes": "…" }],
  "ingest_hint": {
    "tool": "asset_ingest_external",
    "args": { "image_path": "<path>", "asset_type": "logo" }
  }
}
Show the prompt + the top paste target(s). After the user saves the result, call asset_ingest_external.
AssetBundle (mode: "api")
{
  "mode": "api",
  "asset_type": "logo",
  "variants": [
    { "path": "…/master.png", "format": "png", "width": 1024, "height": 1024, "rgba": true },
    { "path": "…/mark.svg", "format": "svg", "paths": 18 }
  ],
  "provenance": { "model": "recraft-v3", "seed": 1234, "prompt_hash": "…", "params_hash": "…" },
  "validations": { "tier0": { "...all pass": true } },
  "warnings": []
}
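Since all three shapes carry a `mode` field, a host can treat them as a discriminated union. The interfaces below are trimmed, hypothetical mirrors of the JSON above (only the fields the example touches), and `nextStep` is an illustrative helper rather than part of the server.

```typescript
// Trimmed, hypothetical mirrors of the three return shapes.
interface InlineSvgPlan {
  mode: "inline_svg";
  asset_type: string;
  instructions_to_host_llm: string;
}
interface ExternalPromptPlan {
  mode: "external_prompt_only";
  asset_type: string;
  enhanced_prompt: string;
  paste_targets: { name: string; url: string }[];
}
interface AssetBundle {
  mode: "api";
  asset_type: string;
  variants: { path: string }[];
  warnings: string[];
}
type GenerateResult = InlineSvgPlan | ExternalPromptPlan | AssetBundle;

// Describes what the host should do next for each return shape.
function nextStep(result: GenerateResult): string {
  switch (result.mode) {
    case "inline_svg":
      return "emit <svg>, then call asset_save_inline_svg";
    case "external_prompt_only": {
      const target = result.paste_targets[0];
      const where = target ? target.url : "the paste targets";
      return `show the prompt and ${where}, then wait for asset_ingest_external`;
    }
    case "api":
      return `show ${result.variants.length} variant path(s) and ${result.warnings.length} warning(s)`;
    default: {
      // Exhaustiveness check: fails to compile if a new mode is added but not handled.
      const unreachable: never = result;
      throw new Error(`unknown mode: ${JSON.stringify(unreachable)}`);
    }
  }
}
```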
Every decision in this skill is traceable to a research angle. See
docs/RESEARCH_MAP.md for the full file-level mapping. The load-bearing
angles most relevant to day-to-day behavior:
- docs/research/04-gemini-imagen-prompting/4c-transparent-background-checker-problem.md: why Imagen/Gemini never gets transparency
- docs/research/07-midjourney-ideogram-recraft/7b-ideogram-text-rendering-for-logos.md: text ≤3 words rule
- docs/research/08-logo-generation/8e-svg-vector-logo-pipeline.md: three SVG paths (Recraft / LLM-author / raster-vectorize)
- docs/research/09-app-icon-generation/9a-ios-app-icon-hig-specs.md: 824² safe zone
- docs/research/19-agentic-mcp-skills-architectures/: why 13 small tools and three modes