skills/gemini-nano-banana-2/SKILL.md
Generate and edit images using Google's Gemini Nano Banana 2 (gemini-3.1-flash-image-preview). Use when the user asks to generate, create, edit, modify, change, alter, or update images using Gemini. Also use when user references an existing image file and asks to modify it in any way. Supports text-to-image generation, image editing, multi-image compositing (up to 14 reference images), style transfer, Google Web Search and Image Search grounding for real-time data, high-resolution output up to 4K, controllable thinking levels, 14 aspect ratios, and advanced text rendering. DO NOT read the image file first - use this skill directly with the --input-image parameter.
npx skillsauth add tamtom/image-generation-skills gemini-nano-banana-2
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate new images or edit existing ones using Google's Gemini Nano Banana 2 model (gemini-3.1-flash-image-preview) via the Gemini API. This is the latest and most capable Gemini image model, featuring advanced reasoning ("Thinking") with controllable levels, high-fidelity text rendering, Google Web + Image Search grounding, output from 512 up to 4K resolution, and 14 aspect ratios.
Run the script using its absolute path (do NOT cd to the skill directory first):
Generate new image:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "your image description" --filename "output.png" [--aspect-ratio 1:1|1:4|1:8|2:3|3:2|3:4|4:1|4:3|4:5|5:4|8:1|9:16|16:9|21:9] [--resolution 512|1K|2K|4K] [--output-format png|webp|jpeg] [--response-modality text-and-image|image-only] [--thinking-level minimal|high] [--google-search] [--image-search] [--yes] [--cost-threshold 0.10] [--api-key KEY]
Edit existing image:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "editing instructions" --filename "output.png" --input-image "path/to/input.png" [--aspect-ratio 1:1] [--resolution 2K] [--api-key KEY]
Edit with multiple reference images (compositing/style transfer, up to 14 images):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "combine the subject from first image with the style of second image" --filename "output.png" --input-image "subject.png" --input-image "style-ref.png" [--resolution 2K] [--api-key KEY]
Generate with Google Web Search grounding (real-time data):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "visualize the current weather forecast for San Francisco" --filename "output.png" --google-search --aspect-ratio 16:9 [--api-key KEY]
Generate with Google Image Search grounding (visual reference from the web):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "a detailed painting of a resplendent quetzal bird resting on a flower" --filename "output.png" --image-search [--api-key KEY]
Generate with high thinking level (complex prompts):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A futuristic city built inside a giant glass bottle floating in space" --filename "output.png" --thinking-level high --resolution 2K [--api-key KEY]
Important: Always run from the user's current working directory so images are saved where the user is working, not in the skill directory.
Map user requests to --aspect-ratio values:
- 1:1
- 2:3 or 9:16
- 3:2 or 16:9
- 9:16
- 16:9
- 21:9
- 4:5 (portrait) or 5:4 (landscape)
- 4:1 or 1:4 depending on orientation
- 8:1

Map user requests to --resolution values:
- 1K
- 512
- 2K
- 4K
- 512 or 1K

Map user requests to --output-format values:
- png
- webp
- jpeg

Map user requests to --response-modality values:
- text-and-image
- image-only

Map user requests to --thinking-level values:
- minimal
- high

Map user requests to web search grounding:
- --google-search
- --google-search for both web and image grounding

Map user requests to image search grounding:
- --image-search

Approximate per-image costs by resolution:

| Resolution | Per Image |
|---|---|
| 512 | $0.045 |
| 1K | $0.067 |
| 2K | $0.101 |
| 4K | $0.151 |
Additional costs: thinking tokens (~$0.006–$0.030), search grounding (~$0.001), input images (~$0.0001 each). Requests at 2K or above will typically trigger the cost warning.
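The cost figures above can be combined into a rough estimate before running a command. The following is a minimal sketch, not the script's actual logic: the function names are hypothetical, and mapping the low and high ends of the thinking-token range to the `minimal` and `high` levels is an assumption.

```python
# Approximate per-image prices from the table above (USD).
BASE_COST = {"512": 0.045, "1K": 0.067, "2K": 0.101, "4K": 0.151}

# Assumption: the ends of the quoted thinking-token range (~$0.006 to $0.030)
# are mapped to the two thinking levels. The source does not state this mapping.
THINKING_COST = {"minimal": 0.006, "high": 0.030}
SEARCH_COST = 0.001        # per request with search grounding
INPUT_IMAGE_COST = 0.0001  # per input image


def estimate_cost(resolution, thinking_level="minimal", search=False, input_images=0):
    """Return a rough cost estimate in USD for one request."""
    cost = BASE_COST[resolution] + THINKING_COST[thinking_level]
    if search:
        cost += SEARCH_COST
    cost += INPUT_IMAGE_COST * input_images
    return round(cost, 4)


def exceeds_threshold(cost, threshold=0.10):
    """True when the estimate is above the --cost-threshold value."""
    return cost > threshold
```

An estimate above the threshold is a signal to mention the approximate cost to the user before running the command.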
Important: Always pass --yes when calling from Claude so the user is not blocked by an interactive prompt they cannot see. Instead, state the estimated cost in your message before running the command. If the estimate is high (2K or 4K resolution, or high thinking), tell the user the approximate cost first and ask whether they want to proceed.
Pass --input-image once for each reference image (repeatable up to 14 times).

The script checks for an API key in this order:
1. --api-key argument (use if the user provided a key in chat)
2. GEMINI_API_KEY environment variable
3. GOOGLE_API_KEY environment variable

If none is available, the script exits with an error message.
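The lookup order above can be sketched as follows. This is an illustration under stated assumptions, not the script's source: `resolve_api_key` and `require_api_key` are hypothetical names, and the error text is made up.

```python
import os
import sys


def resolve_api_key(cli_key=None, env=None):
    """Return the first available key: --api-key, GEMINI_API_KEY, GOOGLE_API_KEY."""
    env = os.environ if env is None else env
    candidates = (cli_key, env.get("GEMINI_API_KEY"), env.get("GOOGLE_API_KEY"))
    for candidate in candidates:
        if candidate:  # skip None and empty strings
            return candidate
    return None


def require_api_key(cli_key=None, env=None):
    """Exit with an error message when no key is found (message text is illustrative)."""
    key = resolve_api_key(cli_key, env)
    if key is None:
        sys.exit("Error: no API key. Pass --api-key or set GEMINI_API_KEY / GOOGLE_API_KEY.")
    return key
```

Passing `env` explicitly keeps the function easy to check without touching the real environment.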
Generate filenames with the pattern: yyyy-mm-dd-hh-mm-ss-name.{ext}
Format: {timestamp}-{descriptive-name}.{ext}
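One way to produce names in this pattern is sketched below. The helper `build_filename` and its slug rule (lowercase, non-alphanumeric runs replaced by hyphens) are assumptions for illustration; the `jpeg` to `.jpg` mapping follows the extensions this skill lists for --output-format.

```python
import re
from datetime import datetime

# --output-format value -> file extension
EXTENSIONS = {"png": "png", "webp": "webp", "jpeg": "jpg"}


def build_filename(description, output_format="png", now=None):
    """Build {timestamp}-{descriptive-name}.{ext} with a 24-hour timestamp."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    # Hypothetical slug rule: lowercase, collapse anything else into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{timestamp}-{slug}.{EXTENSIONS[output_format]}"
```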
- Timestamp: yyyy-mm-dd-hh-mm-ss (24-hour format)
- Extension: matches --output-format (.png, .webp, .jpg)

Examples:
- 2025-12-17-14-23-05-japanese-garden.png
- 2025-12-17-15-30-12-sunset-mountains.webp
- 2025-12-17-16-45-33-nyc-weather.jpg

All editing uses the same generate_content API by passing input images alongside the text prompt. The model natively understands editing context.
When the user wants to modify an existing image:
- Use the --input-image parameter with the path to the image

When the user wants to combine elements from multiple images:
- Pass --input-image multiple times (up to 14 images)

Common editing tasks: add/remove elements, change style, adjust colors, replace backgrounds, composite images, style transfer, character consistency across scenes, text overlay.
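When the command is assembled programmatically, repeating --input-image and enforcing the 14-image limit might look like the sketch below. `build_edit_command` is a hypothetical helper, not part of the skill; it only uses flags that appear in the usage lines of this document.

```python
import os

SCRIPT = "~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py"
MAX_INPUT_IMAGES = 14  # limit stated in this skill's description


def build_edit_command(prompt, filename, input_images, resolution=None):
    """Return an argv list that repeats --input-image once per reference image."""
    if not input_images:
        raise ValueError("editing needs at least one input image")
    if len(input_images) > MAX_INPUT_IMAGES:
        raise ValueError(f"at most {MAX_INPUT_IMAGES} input images are supported")
    argv = ["uv", "run", os.path.expanduser(SCRIPT),
            "--prompt", prompt, "--filename", filename]
    for path in input_images:
        argv += ["--input-image", path]
    if resolution:
        argv += ["--resolution", resolution]
    argv.append("--yes")  # avoid an interactive prompt the user cannot see
    return argv
```

The list form can be handed to `subprocess.run` without shell quoting, which keeps prompts containing quotes or spaces intact.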
For generation: Pass user's image description as-is to --prompt. Only rework if clearly insufficient. The model excels at understanding natural language descriptions.
For editing: Pass editing instructions in --prompt (e.g., "add a rainbow in the sky", "make it look like a watercolor painting")
For multi-image: Reference images by position (e.g., "combine the person from the first image with the background of the second image")
For search-grounded: Include what real-time data is needed (e.g., "visualize the current weather forecast for the next 5 days in San Francisco")
Advanced text rendering: This model excels at generating legible text in images. When text is needed, first describe the text content clearly in the prompt (e.g., "a magazine cover with the title 'GEMINI' in bold serif font").
Preserve user's creative intent in all cases.
The model uses a built-in "Thinking" process that reasons through complex prompts. It generates interim "thought images" to refine composition before producing the final output.
The thinking process:
--output-format
name-1.ext, name-2.ext, etc.
Generate new image:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep wrinkles and a warm smile, inspecting a freshly glazed tea bowl in his rustic workshop. Soft golden hour light, 85mm portrait lens, bokeh background." --filename "2025-12-17-14-23-05-japanese-ceramicist.png" --resolution 2K --aspect-ratio 2:3
Generate with text rendering (high thinking):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A glossy magazine cover with the large bold title 'NANO BANANA' in serif font. A person in a sleek minimal dress playfully holds the number 2. Issue number and 'Feb 2026' in the corner with a barcode. The magazine is on a shelf against an orange plastered wall." --filename "2025-12-17-14-25-30-magazine-cover.png" --resolution 4K --aspect-ratio 3:4 --thinking-level high
Generate with Google Web Search grounding:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "Visualize the current weather forecast for the next 5 days in San Francisco as a clean, modern weather chart with outfit suggestions for each day" --filename "2025-12-17-14-26-00-sf-weather.png" --google-search --aspect-ratio 16:9 --resolution 2K
Generate with Google Image Search grounding:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A detailed painting of a resplendent quetzal bird resting on a flower, with a natural gradient background" --filename "2025-12-17-14-26-30-quetzal-painting.png" --image-search --aspect-ratio 3:2 --resolution 2K
Edit existing image (style transfer):
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "Transform this photograph into the artistic style of Vincent van Gogh's Starry Night, with swirling impasto brushstrokes and a palette of deep blues and bright yellows" --filename "2025-12-17-14-27-00-starry-night-style.png" --input-image "city-photo.jpg" --resolution 2K
Composite from multiple images:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "Place the person from the first image into the beach scene from the second image, maintaining their appearance and clothing" --filename "2025-12-17-14-29-00-beach-composite.png" --input-image "person.png" --input-image "beach.png" --resolution 2K
Product mockup with logo:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "Put this logo on a high-end advertisement for a premium perfume. The logo is perfectly integrated into the bottle design." --filename "2025-12-17-14-30-00-perfume-ad.png" --input-image "logo.png" --resolution 4K --aspect-ratio 3:4
Character consistency across scenes:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "An office group photo of these people, they are making funny faces" --filename "2025-12-17-14-31-00-office-group.png" --input-image "person1.png" --input-image "person2.png" --input-image "person3.png" --aspect-ratio 5:4 --resolution 2K
Small thumbnail / icon:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat, munching on a green bamboo leaf. Bold clean outlines, cel-shading, vibrant colors. White background." --filename "2025-12-17-14-32-00-red-panda-sticker.png" --resolution 512 --response-modality image-only
Ultra-wide cinematic banner:
uv run ~/.claude/skills/gemini-nano-banana-2/scripts/generate_image.py --prompt "A cinematic sci-fi landscape with a massive ring-shaped space station orbiting a gas giant, dramatic volumetric lighting" --filename "2025-12-17-14-33-00-space-banner.png" --aspect-ratio 21:9 --resolution 4K --thinking-level high