skills/baoyu-imagine/SKILL.md
AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
npx skillsauth add guanyang/antigravity-skills baoyu-imagine
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Official API-based image generation. Supports OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.
When this skill prompts the user, follow this tool-selection rule (priority order):
AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
Concrete AskUserQuestion references below are examples; substitute the local equivalent in other runtimes.
{baseDir} = this SKILL.md's directory. Main script: {baseDir}/scripts/main.ts. Resolve ${BUN_X}: prefer bun; else npx -y bun; else suggest brew install oven-sh/bun/bun.
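The ${BUN_X} resolution rule above can be sketched as a small pure function. This is an illustration only: `resolveBunX` and the `hasCommand` probe are hypothetical names, and probing for `npx` before falling back to it is an assumption (the rule itself only says "else npx -y bun").

```typescript
// Hypothetical probe: stands in for whatever PATH lookup the runtime provides.
type CommandProbe = (command: string) => boolean;

// Returns the command prefix to use for ${BUN_X}, or null when neither
// runner is available and the user should be pointed at
// `brew install oven-sh/bun/bun`.
function resolveBunX(hasCommand: CommandProbe): string | null {
  if (hasCommand("bun")) return "bun"; // preferred
  if (hasCommand("npx")) return "npx -y bun"; // fallback (assumes npx is probed first)
  return null; // nothing usable: suggest installing bun
}
```

A caller would prepend the returned string to `{baseDir}/scripts/main.ts` and treat `null` as the cue to show the install suggestion.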
This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.
Check these paths in order; first hit wins:
| Path | Scope |
|------|-------|
| .baoyu-skills/baoyu-imagine/EXTEND.md | Project |
| ${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-imagine/EXTEND.md | XDG |
| $HOME/.baoyu-skills/baoyu-imagine/EXTEND.md | User home |
- Found, but default_model.[provider] is null → ask model only.
- Not found → run the first-time setup (references/config/first-time-setup.md) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.

Legacy compatibility: if .baoyu-skills/baoyu-image-gen/EXTEND.md exists and the new path doesn't, the runtime renames it to baoyu-imagine. If both exist, the runtime leaves them alone and uses the new path.
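The "first hit wins" lookup over the three paths in the table above can be sketched as follows. The function names, the `projectDir` argument, and the `exists` callback are hypothetical; the real script may resolve the project-scoped path differently.

```typescript
interface LookupEnv {
  HOME: string;
  XDG_CONFIG_HOME?: string;
}

// Candidate EXTEND.md paths in priority order: project, XDG, user home.
function extendCandidates(projectDir: string, env: LookupEnv): string[] {
  // Mirrors ${XDG_CONFIG_HOME:-$HOME/.config}: unset or empty falls back.
  const xdgBase = env.XDG_CONFIG_HOME || `${env.HOME}/.config`;
  return [
    `${projectDir}/.baoyu-skills/baoyu-imagine/EXTEND.md`,
    `${xdgBase}/baoyu-skills/baoyu-imagine/EXTEND.md`,
    `${env.HOME}/.baoyu-skills/baoyu-imagine/EXTEND.md`,
  ];
}

// Returns the first existing candidate, or null when first-time setup is needed.
function findExtend(
  projectDir: string,
  env: LookupEnv,
  exists: (path: string) => boolean,
): string | null {
  for (const candidate of extendCandidates(projectDir, env)) {
    if (exists(candidate)) return candidate;
  }
  return null;
}
```

A `null` result corresponds to the blocked state described above: generation should not start until setup has written EXTEND.md.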
EXTEND.md keys: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: references/config/preferences-schema.md.
Minimum working examples — see references/usage-examples.md for the full set including per-provider invocations and batch mode.
When the user wants a real person/character/object preserved from reference images, do not replace the reference with a long generic description. Prefer short, hard identity-preservation language:
Pitfall: long descriptions like "young East Asian woman, oval face, clear eyes..." can cause the model to synthesize a new person matching the description instead of preserving the referenced person.
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k
# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro
# OpenAI GPT Image 2
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-2
# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
| Option | Description |
|--------|-------------|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required in single-image mode) |
| --batchfile <path> | JSON batch file for multi-image generation |
| --jobs <count> | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| --provider google\|openai\|azure\|openrouter\|dashscope\|zai\|minimax\|jimeng\|seedream\|replicate | Force provider (default: auto-detect) |
| --model <id>, -m | Model ID — see provider references for defaults and allowed values |
| --ar <ratio> | Aspect ratio (16:9, 1:1, 4:3, …) |
| --size <WxH> | Explicit size (e.g., 1024x1024; for gpt-image-2, width/height must be multiples of 16, max edge 3840px, ratio no wider than 3:1) |
| --quality normal\|2k | Quality preset (default: 2k) |
| --imageSize 1K\|2K\|4K | Image size for Google/OpenRouter (default: from quality) |
| --imageApiDialect openai-native\|ratio-metadata | OpenAI-compatible endpoint dialect — use ratio-metadata for gateways that expect aspect-ratio size plus metadata.resolution |
| --ref <files...> | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0, DashScope wan2.7-image-pro/wan2.7-image. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0, or any DashScope model outside the wan2.7-image* family |
| --n <count> | Number of images. Replicate requires --n 1 (single-output save semantics) |
| --json | JSON output |
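The gpt-image-2 constraints listed for --size (multiples of 16, max edge 3840px, ratio no wider than 3:1) can be checked before a request is sent. The sketch below encodes only those three rules; `checkGptImage2Size` is a hypothetical helper, and applying the 3:1 limit to tall images as well as wide ones is an assumption.

```typescript
interface SizeCheck {
  ok: boolean;
  reason?: string;
}

// Validates a WxH string against the gpt-image-2 rules listed in the table.
function checkGptImage2Size(size: string): SizeCheck {
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (!match) return { ok: false, reason: "expected WxH, e.g. 1024x1024" };
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) {
    return { ok: false, reason: "width and height must be positive" };
  }
  if (width % 16 !== 0 || height % 16 !== 0) {
    return { ok: false, reason: "width and height must be multiples of 16" };
  }
  const longEdge = Math.max(width, height);
  const shortEdge = Math.min(width, height);
  if (longEdge > 3840) return { ok: false, reason: "max edge is 3840px" };
  // Assumption: the 3:1 limit is applied symmetrically to portrait sizes.
  if (longEdge > 3 * shortEdge) {
    return { ok: false, reason: "ratio must be no wider than 3:1" };
  }
  return { ok: true };
}
```

For example, `3840x2160` passes all three checks, while `1000x1000` fails because 1000 is not a multiple of 16.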
| Variable | Description |
|----------|-------------|
| OPENAI_API_KEY | OpenAI API key |
| AZURE_OPENAI_API_KEY | Azure OpenAI API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| GOOGLE_API_KEY | Google API key |
| DASHSCOPE_API_KEY | DashScope API key |
| ZAI_API_KEY (alias BIGMODEL_API_KEY) | Z.AI API key |
| MINIMAX_API_KEY | MiniMax API key |
| REPLICATE_API_TOKEN | Replicate API token |
| JIMENG_ACCESS_KEY_ID, JIMENG_SECRET_ACCESS_KEY | Jimeng (即梦) Volcengine credentials |
| ARK_API_KEY | Seedream (豆包) Volcengine ARK API key |
| <PROVIDER>_IMAGE_MODEL | Per-provider model override (OPENAI_IMAGE_MODEL, GOOGLE_IMAGE_MODEL, DASHSCOPE_IMAGE_MODEL, ZAI_IMAGE_MODEL/BIGMODEL_IMAGE_MODEL, MINIMAX_IMAGE_MODEL, OPENROUTER_IMAGE_MODEL, REPLICATE_IMAGE_MODEL, JIMENG_IMAGE_MODEL, SEEDREAM_IMAGE_MODEL) |
| AZURE_OPENAI_DEPLOYMENT (alias AZURE_OPENAI_IMAGE_MODEL) | Azure default deployment |
| <PROVIDER>_BASE_URL | Per-provider endpoint override |
| AZURE_API_VERSION | Azure image API version (default 2025-04-01-preview) |
| JIMENG_REGION | Jimeng region (default cn-north-1) |
| OPENAI_IMAGE_API_DIALECT | openai-native \| ratio-metadata |
| OPENROUTER_HTTP_REFERER, OPENROUTER_TITLE | Optional OpenRouter attribution |
| BAOYU_IMAGE_GEN_MAX_WORKERS | Override batch worker cap |
| BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY | Per-provider concurrency (e.g., BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY) |
| BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS | Per-provider start-gap |
Load priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
--provider openai --model gpt-image-2 uses the standard OpenAI Images API (/v1/images/generations or /v1/images/edits) and requires OPENAI_API_KEY. A Codex or ChatGPT desktop login is a different entitlement and is not a drop-in replacement for OPENAI_API_KEY; do not paste a Codex OAuth token into OPENAI_API_KEY or only set OPENAI_BASE_URL to a Codex backend.
If the user wants to use their Codex subscription / GPT Image 2 entitlement without an OpenAI API key, route through a Codex-native backend instead of this skill's openai provider:
- imagegen skill/tool.
- codex CLI installed and logged in: use the repo-level scripts/codex-imagegen.sh wrapper when the calling skill supports it (for example baoyu-cover-image). Resolve it from the plugin/repo root and pass absolute prompt/output/reference paths.
- image_generate tool: use that tool as a fallback, and state whether reference images were passed directly or reconstructed from extracted traits.

Do not modify the existing openai provider to silently consume Codex OAuth. If first-class Codex OAuth support is added to baoyu-imagine, implement it as a distinct provider (for example openai-codex) with its own auth, route, request shape, docs, and tests. See references/codex-oauth-vs-openai-api-key.md.
Priority (highest → lowest) applies to every provider:
1. CLI flag: --model <id>
2. EXTEND.md: default_model.[provider]
3. Env var: <PROVIDER>_IMAGE_MODEL

For OpenAI, the built-in default is gpt-image-2. gpt-image-1.5, gpt-image-1, and GPT Image snapshots remain selectable with --model or OPENAI_IMAGE_MODEL.
For Azure, --model / default_model.azure is the Azure deployment name. AZURE_OPENAI_DEPLOYMENT is the preferred env var; AZURE_OPENAI_IMAGE_MODEL is kept as a backward-compatible alias. If your Azure deployment is named after the underlying model, use gpt-image-2; otherwise use the exact custom deployment name.
EXTEND.md overrides env vars: if EXTEND.md sets default_model.google: "gemini-3-pro-image-preview" and the env var sets GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview, EXTEND.md wins.
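The model priority described above (CLI flag, then EXTEND.md, then env var, then built-in default) can be sketched as below. `resolveModel`, `envModelNames`, and the `ModelSources` shape are hypothetical; the env var names and aliases are the ones listed in the environment table. Treating empty strings and null as "not set" is an assumption.

```typescript
interface ModelSources {
  cliModel?: string; // --model <id>
  extendDefaultModel?: Record<string, string | null | undefined>; // default_model.[provider]
  env: Record<string, string | undefined>;
  builtInDefault?: string;
}

// Env var names to consult for a provider, preferred name first.
function envModelNames(provider: string): string[] {
  if (provider === "azure") return ["AZURE_OPENAI_DEPLOYMENT", "AZURE_OPENAI_IMAGE_MODEL"];
  if (provider === "zai") return ["ZAI_IMAGE_MODEL", "BIGMODEL_IMAGE_MODEL"];
  return [`${provider.toUpperCase()}_IMAGE_MODEL`];
}

// Walks the sources from highest to lowest priority.
function resolveModel(provider: string, sources: ModelSources): string | undefined {
  if (sources.cliModel) return sources.cliModel;
  const fromExtend = sources.extendDefaultModel?.[provider];
  if (fromExtend) return fromExtend;
  for (const name of envModelNames(provider)) {
    const value = sources.env[name];
    if (value) return value;
  }
  return sources.builtInDefault;
}
```

With the example above, an EXTEND.md value of `gemini-3-pro-image-preview` is returned even when `GOOGLE_IMAGE_MODEL` is set to `gemini-3.1-flash-image-preview`.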
Display model info before each generation:
Using [provider] / [model]
Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL
provider=openai means the auth and routing entrypoint is OpenAI-compatible. It does not guarantee the upstream image API uses OpenAI native semantics. When a gateway expects a different wire format, set default_image_api_dialect in EXTEND.md, OPENAI_IMAGE_API_DIALECT, or --imageApiDialect:
- openai-native: pixel size (1536x1024) and native OpenAI quality fields
- ratio-metadata: aspect-ratio size (16:9) plus metadata.resolution (1K|2K|4K) and metadata.orientation

Use openai-native for the OpenAI native API or strict clones; try ratio-metadata for compatibility gateways in front of Gemini or similar models. Current limitation: ratio-metadata applies only to text-to-image; reference-image edits still need openai-native or a provider with first-class edit support.
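To make the difference between the two dialects concrete, the sketch below builds one request body of each kind. Only `size`, `quality`, `metadata.resolution`, and `metadata.orientation` come from the description above; the `prompt` field, the function names, and the orientation values (`landscape`, `portrait`, `square`) are assumptions, and a real gateway may expect different values.

```typescript
type Resolution = "1K" | "2K" | "4K";

// Assumed orientation vocabulary; gateways may use other values.
function orientationOf(ratio: string): string {
  const parts = ratio.split(":").map(Number);
  const w = parts[0] ?? 0;
  const h = parts[1] ?? 0;
  if (w === h) return "square";
  return w > h ? "landscape" : "portrait";
}

// openai-native: pixel size plus the native quality field.
function nativeBody(prompt: string, pixelSize: string, quality: string) {
  return { prompt, size: pixelSize, quality };
}

// ratio-metadata: aspect-ratio size plus metadata.resolution / metadata.orientation.
function ratioMetadataBody(prompt: string, ratio: string, resolution: Resolution) {
  return {
    prompt,
    size: ratio,
    metadata: { resolution, orientation: orientationOf(ratio) },
  };
}
```

The same logical request ("a 16:9 image at roughly 2K") therefore serializes as `size: "1536x1024"`-style pixels in one dialect and as `size: "16:9"` with metadata in the other.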
Each provider has its own quirks (model families, size rules, ref support, limits). Read these when the user picks that provider or asks for non-default behavior:
| Provider | Reference |
|----------|-----------|
| DashScope (Qwen-Image families, custom sizes) | references/providers/dashscope.md |
| Z.AI (GLM-Image, cogview-4) | references/providers/zai.md |
| MiniMax (image-01, subject-reference) | references/providers/minimax.md |
| OpenRouter (multimodal models, /chat/completions flow) | references/providers/openrouter.md |
| Replicate (nano-banana, Seedream, Wan) | references/providers/replicate.md |
- --ref provided + no --provider → auto-select Google → OpenAI → Azure → OpenRouter → Replicate → Seedream → MiniMax (MiniMax's subject reference is more specialized toward character/portrait consistency)
- --provider specified → use it (if --ref, must be google/openai/azure/openrouter/replicate/seedream/minimax)

| Preset | Google imageSize | OpenAI size | OpenRouter size | Replicate resolution | Use case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| normal | 1K | 1024px target | 1K | 1K | Quick previews |
| 2k (default) | 2K | 2048px target | 2K | 2K | Covers, illustrations, infographics |
Google/OpenRouter imageSize can be overridden with --imageSize 1K|2K|4K.
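The provider rules for --ref given before the preset table (a fixed auto-selection order, and a restricted set when --provider is explicit) can be sketched as follows. `pickRefProvider` and the `isConfigured` callback are hypothetical; in particular, reading "auto-select" as "first provider in the order that has credentials" is an assumption.

```typescript
// Auto-selection order for --ref, as listed in the rule above.
const REF_PROVIDER_ORDER = [
  "google", "openai", "azure", "openrouter", "replicate", "seedream", "minimax",
] as const;
type RefProvider = (typeof REF_PROVIDER_ORDER)[number];

function pickRefProvider(
  explicit: string | undefined,
  isConfigured: (provider: RefProvider) => boolean,
): RefProvider {
  if (explicit !== undefined) {
    // An explicit provider must be one that supports reference images.
    for (const provider of REF_PROVIDER_ORDER) {
      if (provider === explicit) return provider;
    }
    throw new Error(`--ref is not supported by provider "${explicit}"`);
  }
  // No --provider: take the first configured provider in priority order.
  for (const provider of REF_PROVIDER_ORDER) {
    if (isConfigured(provider)) return provider;
  }
  throw new Error("no configured provider supports --ref");
}
```

With only OpenRouter and MiniMax configured, this returns `openrouter`; passing `jimeng` explicitly throws, matching the note that Jimeng does not support --ref.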
For OpenAI native gpt-image-2, normal maps to quality=medium and a low-latency valid size near the requested aspect ratio; 2k maps to quality=high and 2048px-class sizes such as 2048x2048, 2048x1152, or 1152x2048. Use explicit --size for valid custom or 4K outputs, e.g. 3840x2160.
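One way to arrive at the 2048px-class sizes mentioned above is to fix the long edge and snap the short edge to a multiple of 16. The sketch below is not the skill's actual algorithm; `sizeForRatio` and `GPT_IMAGE_2_PRESETS` are hypothetical names, and the 1024px long edge for `normal` is taken from the preset table's "1024px target".

```typescript
// Preset → native quality and target long edge, per the mapping described above.
const GPT_IMAGE_2_PRESETS = {
  normal: { quality: "medium", longEdge: 1024 },
  "2k": { quality: "high", longEdge: 2048 },
} as const;

// Computes a WxH size for a ratio such as "16:9", snapping both edges
// to multiples of 16 as gpt-image-2 requires.
function sizeForRatio(ratio: string, longEdge: number): string {
  const parts = ratio.split(":").map(Number);
  const rw = parts[0] ?? 1;
  const rh = parts[1] ?? 1;
  const snap = (value: number): number => Math.max(16, Math.round(value / 16) * 16);
  const long = snap(longEdge);
  const short = snap((long * Math.min(rw, rh)) / Math.max(rw, rh));
  return rw >= rh ? `${long}x${short}` : `${short}x${long}`;
}
```

Under these assumptions, `sizeForRatio("16:9", GPT_IMAGE_2_PRESETS["2k"].longEdge)` gives `2048x1152`, and a long edge of 3840 gives the `3840x2160` size cited for 4K output.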
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1.
- imageConfig.aspectRatio
- gpt-image-2 uses the closest valid custom size for the requested ratio; older GPT Image and DALL·E models use their closest supported fixed size
- imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, the ratio is inferred
- google/nano-banana* uses aspect_ratio, bytedance/seedream-* uses documented Replicate ratios, Wan 2.7 maps --ar to a concrete size
- aspect_ratio values; if --size <WxH> is given without --ar, sends width/height for image-01

Default: sequential. Batch parallel: enabled automatically when --batchfile contains 2+ pending tasks.
| Situation | Prefer | Why |
|-----------|--------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead, easier debugging |
| Multiple images with saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, predictable throughput |
| Each image still needs its own reasoning / prompt writing / style exploration | Subagents | Work is still exploratory, each needs independent analysis |
| Input is outline.md + prompts/ (e.g. from baoyu-article-illustrator) | Batch — use scripts/build-batch.ts to assemble the payload | The outline + prompt files already contain everything needed |
Rule of thumb: once prompt files are saved and the task is "generate all of these", prefer batch over subagents. Use subagents only when generation is coupled with per-image thinking or divergent creative exploration.
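The switch between sequential and batch-parallel execution, together with the --jobs cap, can be sketched as two small functions. The names are hypothetical, and the reading of "default: auto, max from config, built-in default 10" as "use up to the cap, never more workers than pending tasks" is an assumption about the script's behavior.

```typescript
type Mode = "sequential" | "batch";

// Batch parallelism kicks in once a batch file holds 2+ pending tasks.
function chooseMode(pendingTasks: number): Mode {
  return pendingTasks >= 2 ? "batch" : "sequential";
}

// Effective worker count for batch mode.
function workerCount(
  pendingTasks: number,
  jobs: number | undefined, // --jobs <count>, undefined means "auto"
  configuredCap: number | undefined, // batch worker cap from config, if any
): number {
  const cap = configuredCap ?? 10; // built-in default cap
  const requested = jobs ?? cap; // assumption: "auto" means "up to the cap"
  return Math.max(1, Math.min(requested, cap, pendingTasks));
}
```

For instance, `--jobs 4` on a 50-task batch yields 4 workers, while `--jobs 32` with a configured cap of 8 is clamped to 8.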
Parallel behavior:
- --jobs <count>

If --provider openai --model gpt-image-2 fails because OPENAI_API_KEY is missing but the current runtime has a native image-generation backend or the repo-level codex-imagegen wrapper is available, use that path rather than leaving the user waiting. Be explicit about whether the fallback is true reference-image generation or only a text-prompt reconstruction from extracted visual traits. See references/codex-image2-fallback.md.
| File | Content |
|------|---------|
| references/usage-examples.md | Extended CLI examples across providers and batch mode |
| references/codex-oauth-vs-openai-api-key.md | Why Codex/ChatGPT OAuth image2 entitlement is not usable through baoyu-imagine's standard OpenAI API-key provider |
| references/codex-image2-fallback.md | Practical fallback behavior when OpenAI API credentials are absent but Codex/native image generation is available |
| references/providers/dashscope.md | DashScope families, sizes, limits |
| references/providers/zai.md | Z.AI GLM-image / cogview-4 |
| references/providers/minimax.md | MiniMax image-01 + subject reference |
| references/providers/openrouter.md | OpenRouter multimodal flow |
| references/providers/replicate.md | Replicate supported families + guardrails |
| references/config/preferences-schema.md | EXTEND.md schema |
| references/config/first-time-setup.md | First-time setup flow |
Custom configurations via EXTEND.md. See Step 0 for paths and schema.