gpt-image-2/SKILL.md
Generate and edit images using OpenAI's GPT Image 2 API. Interactive skill that guides users through image creation with style presets, cost-aware draft/final workflow, thinking mode, carousels, and photo editing. This skill should be used when the user requests image generation via OpenAI/GPT Image 2, wants to create social media carousels, edit photos into artistic styles, or needs images with readable text (infographics, diagrams, posters).
npx skillsauth add glebis/claude-skills gpt-image-2
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate and edit images via OpenAI's GPT Image 2 API with an interactive, guided workflow.
When the user invokes this skill, guide them through these steps using AskUserQuestion. Do not skip steps — the interactive flow is the core experience.
Ask the user what they want to create. Offer these options:
If the user already provided a clear prompt (e.g. "generate an editorial image of a rocket"), skip to Step 3.
Show the user available presets grouped by category. Read presets.yaml and present them:
Visual styles (no text in image): editorial, blueprint, ink, risograph, wireframe, constellation, brutalist, grain
Text-heavy (leverages GPT Image 2 text rendering): infographic, slide, diagram, poster, menu, manga
Community favorites: trading-card, pixar, app-mockup, isometric, action-figure, cinematic, panorama
Reference-anchored:
vhs — 1980s late-night infomercial title card: scanline-striped gradient italic caps on pure black. It auto-attaches a bundled reference image (references/vhs-infomercial.png), so the look stays consistent batch-to-batch. Pass the ad copy as the subject; for multi-line copy separate lines with / (e.g. --preset vhs "THEY TRUSTED YOU / NOW / PROVE IT").
Custom — user describes their own style
Ask: "Which style? Or describe your own."
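The multi-line convention for the vhs preset can be sketched as a small helper. This is only an illustration: `vhs_subject` is a hypothetical name, not part of the skill's script, and it simply joins the ad-copy lines with the ` / ` separator described above.

```python
def vhs_subject(lines):
    """Join ad-copy lines into one subject string for --preset vhs.

    Hypothetical helper: blank lines are dropped and the rest are joined
    with " / ", the separator the vhs preset description asks for.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    return " / ".join(cleaned)


if __name__ == "__main__":
    print(vhs_subject(["THEY TRUSTED YOU", "NOW", "PROVE IT"]))
```

The resulting string is what would be passed as the subject argument, as in the `--preset vhs` example command.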
Ask where this will be used:
Before any generation spend, the script now composes the final prompt first
(preset + subject + style), then checks it for internal contradictions — most often
a preset that hard-codes something the subject overrides (e.g. the editorial preset
forces "on pure black background" while your subject asks for a warm off-white ground).
The check prefers a fast Haiku call via the llm CLI; if Haiku is unavailable (no
llm, no Anthropic credit) it falls back to the configured llm default model, then to a
built-in static heuristic. The resolved prompt and the verdict are printed. If a conflict
is found, generation is aborted before spending — fix the prompt or preset and re-run, or
override with --force (generate anyway) or --no-preflight (skip the check). This is what
prevents the "generated on the wrong background, now regenerate" waste.
When composing prompts that set a background/palette, don't combine a background-fixing
preset (editorial, blueprint, etc.) with a different requested background — either drop
the preset and specify the full style yourself, or accept the preset's background.
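To make the kind of contradiction concrete, here is a minimal sketch of a static background check. It is not the script's actual heuristic, which the text above does not show; the regular expression and the function names are assumptions chosen for illustration, and it only catches phrases shaped like "on ... background" or "on ... ground".

```python
import re
from typing import Optional, Set

# Assumed pattern: captures the words between "on" and "background"/"ground".
_BACKGROUND = re.compile(r"\bon (?:an? )?([a-z][a-z \-]*?) (?:background|ground)\b")


def background_phrases(text: str) -> Set[str]:
    """Return the background descriptions mentioned in a prompt fragment."""
    return {m.group(1).strip() for m in _BACKGROUND.finditer(text.lower())}


def find_background_conflict(preset_text: str, subject: str) -> Optional[str]:
    """Return a message if preset and subject name different backgrounds."""
    preset_bg = background_phrases(preset_text)
    subject_bg = background_phrases(subject)
    # A conflict needs both sides to state a background, with no overlap.
    if preset_bg and subject_bg and preset_bg.isdisjoint(subject_bg):
        return (
            f"preset fixes background {sorted(preset_bg)} "
            f"but subject asks for {sorted(subject_bg)}"
        )
    return None


if __name__ == "__main__":
    print(find_background_conflict(
        "editorial illustration on pure black background",
        "a rocket on a warm off-white ground",
    ))
```

A check like this is cheap enough to run before every generation, which is the point of a preflight: a false alarm costs one `--force`, while a missed conflict costs a regenerated image.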
Always generate a draft first unless the user says "skip draft" or uses --draft false.
--draft (quality=low, ~$0.006/image)
--quality high (~$0.21/image)
--seed from the draft to maintain composition when upgrading to final
This draft→final flow saves ~97% on iteration costs.
After generation, always:
open <path> for full-resolution preview
When the user wants a carousel (5-10 slides):
Ask: "What's the story? Give me the key message and I'll draft a 10-slide arc."
Then propose a slide-by-slide plan like:
Slide 1: [Cover] — hook headline + hero image
Slide 2: [Problem] — bold statement
Slide 3: [Context] — illustration + explanation
...
Slide 10: [CTA] — call to action with URL
Ask the user to approve or modify the plan.
Use the same preset + seed range across all slides. For carousels:
--seed to lock composition patterns
Generate all slides as drafts first ($0.006 × 10 = $0.06 total). Show them all to the user as a contact sheet or one by one. Ask which ones to regenerate or adjust.
Only generate finals for approved slides. Offer to generate all at once with the -y flag.
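The draft-then-final carousel pass can be sketched as a command builder. This is a hedged illustration: `carousel_commands` and the output file naming are hypothetical, and it assumes one shared seed for the whole set, which is one possible reading of "seed range". Only flags that appear in this document are used.

```python
def carousel_commands(slides, preset, seed, platform="square", draft=True):
    """Build one argv list per slide, sharing preset and seed across the set."""
    commands = []
    for index, subject in enumerate(slides, start=1):
        argv = ["scripts/gpt_image_2.py"]
        # Drafts use --draft; approved slides are rerun at high quality.
        argv += ["--draft"] if draft else ["--quality", "high"]
        argv += ["--preset", preset, "--platform", platform, "--seed", str(seed)]
        # Hypothetical naming scheme for the output files.
        suffix = "draft" if draft else "final"
        argv += [subject, f"slide-{index:02d}-{suffix}.png"]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    plan = ["Cover: hook headline", "Problem: bold statement", "CTA: call to action"]
    for argv in carousel_commands(plan, preset="editorial", seed=42):
        print(" ".join(argv))
```

Each list could be handed to `subprocess.run` as-is. Calling the function again with `draft=False` and only the approved slides gives the final pass with the same seed.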
When the user wants to transform a photo:
osascript to a temp file
Use --edit <path> for the API call.
Always communicate costs before generating:
| Quality | Per image | 10-slide carousel |
|---------|-----------|-------------------|
| --draft (low) | $0.006 | $0.06 |
| medium | $0.05 | $0.50 |
| high (default) | $0.21 | $2.10 |
| high + thinking | $0.25-0.42 | $2.50-4.20 |
Thinking mode adds 20-100% cost. Only suggest it for text-heavy or complex compositions.
The script auto-confirms when cost < $0.50. Above that, it prompts the user.
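The arithmetic behind the table and the confirmation threshold can be checked with a short sketch. The per-image prices and the $0.50 threshold are taken from the text above. The function and constant names are hypothetical, and the script's own estimation logic may differ.

```python
# Per-image prices in USD, copied from the cost table.
PRICE_PER_IMAGE = {"low": 0.006, "medium": 0.05, "high": 0.21}
# The script auto-confirms when the cost is below this amount.
AUTO_CONFIRM_BELOW = 0.50


def estimate_cost(n, quality="high", thinking_overhead=0.0):
    """Estimated cost in USD for n images.

    thinking_overhead is a fraction: 0.2 for +20%, 1.0 for +100%.
    """
    return round(n * PRICE_PER_IMAGE[quality] * (1.0 + thinking_overhead), 4)


def needs_confirmation(cost):
    """True when the cost is not below the auto-confirm threshold."""
    return cost >= AUTO_CONFIRM_BELOW


def draft_savings(final_quality="high"):
    """Fraction saved per iteration by drafting at low quality first."""
    return 1.0 - PRICE_PER_IMAGE["low"] / PRICE_PER_IMAGE[final_quality]


if __name__ == "__main__":
    print(estimate_cost(10, "low"), needs_confirmation(estimate_cost(10, "low")))
    print(estimate_cost(10, "high"), needs_confirmation(estimate_cost(10, "high")))
    print(f"{draft_savings():.0%}")
```

With these numbers a 10-slide draft pass stays under the threshold, while a 10-slide medium or high pass triggers a prompt. The roughly 97% saving is the ratio of the draft price to the high-quality price.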
When helping users write prompts, apply these patterns:
'with the headline "Hello World"'
editorial-magazine, studio-product to converge batches
--seed for iteration: lock composition, vary only the prompt details
# Basic generation
scripts/gpt_image_2.py "prompt" output.png
# With preset and platform
scripts/gpt_image_2.py --preset editorial --platform square "subject" out.png
# Draft mode (~$0.006/image)
scripts/gpt_image_2.py --draft "prompt" out.png
# With thinking for complex layouts
scripts/gpt_image_2.py --thinking medium --preset diagram "OAuth flow" out.png
# Seed for reproducibility
scripts/gpt_image_2.py --seed 42 "prompt" out.png
# Edit existing photo
scripts/gpt_image_2.py --edit photo.png "transform into constellation style" out.png
# Reference-anchored preset (auto-attaches its bundled reference image)
scripts/gpt_image_2.py --preset vhs --platform youtube "THEY TRUSTED YOU / NOW / PROVE IT" ad.png
# Variants with contact sheet
scripts/gpt_image_2.py --n 4 --preset ink "mountain" out.png
# Cost estimate
scripts/gpt_image_2.py --estimate --n 10 --quality high "batch test"
# Skip confirmation
scripts/gpt_image_2.py -y --n 10 "batch" out.png
# Dry run (show prompt without API call)
scripts/gpt_image_2.py --dry-run --preset editorial "test" out.png
# Preflight runs automatically before spend; override if needed
scripts/gpt_image_2.py --force "prompt with a known conflict" out.png # generate anyway
scripts/gpt_image_2.py --no-preflight "prompt" out.png # skip the check
scripts/gpt_image_2.py: main CLI (Python, requires PyYAML)
presets.yaml: style presets (visual + text-heavy + community + reference-anchored). A preset may declare a reference: path (relative to the skill dir); it auto-attaches as a style anchor unless the user passes their own --reference. See the vhs preset.
platforms.yaml: 8 platform sizing presets
references/api_reference.md: full API documentation
references/vhs-infomercial.png: bundled style anchor for the vhs preset
~/.config/gpt-image-2/config.yaml: user defaults
~/.config/gpt-image-2/history.jsonl: generation log
~/.config/gpt-image-2/last.json: last run (for again)
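The precedence rule for reference images can be sketched as follows. It assumes a preset has already been loaded into a plain dict with an optional `reference` key. `resolve_reference` is a hypothetical name, and the real script may resolve paths differently.

```python
from pathlib import Path
from typing import Optional


def resolve_reference(
    preset: dict,
    skill_dir: Path,
    user_reference: Optional[str] = None,
) -> Optional[Path]:
    """Pick the style-anchor image for a run.

    A reference passed by the user wins. Otherwise the preset's own
    reference, declared relative to the skill directory, is used.
    """
    if user_reference:
        return Path(user_reference)
    declared = preset.get("reference")
    if declared:
        return skill_dir / declared
    return None


if __name__ == "__main__":
    vhs = {"reference": "references/vhs-infomercial.png"}
    print(resolve_reference(vhs, Path("/path/to/skill")))
    print(resolve_reference(vhs, Path("/path/to/skill"), "my-anchor.png"))
```

Returning `None` when neither source provides a path keeps presets without a reference working as plain text-only presets.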