skills/gemini-image-generator/SKILL.md
Generate images using Google's Gemini API. Use when creating images from text prompts, editing existing images, or combining reference images for AI-generated visual content.
npx skillsauth add ckorhonen/claude-skills gemini-image-generator
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Generate images using Google's Gemini API with support for text-to-image generation, image editing, and multi-image reference inputs. Supports the fast Gemini 2.5 Flash Image model and the high-quality Gemini 3.1 Flash Image model with up to 4K resolution.
Model naming note (2026): Google's image generation models follow a "Nano Banana" branding. gemini-2.5-flash-image-preview is "Nano Banana", and gemini-3.1-flash-image-preview is "Nano Banana 2". These are distinct from the conversational Gemini models. See models reference.
Requires the google-genai package and the GEMINI_API_KEY environment variable:
pip install google-genai
export GEMINI_API_KEY="your-api-key"
Add to your shell profile (~/.zshrc or ~/.bashrc) for persistence:
echo 'export GEMINI_API_KEY="your-api-key"' >> ~/.zshrc
Generate a simple image:
python scripts/generate_image.py -p "A fluffy orange cat sitting on a windowsill, warm sunlight, cozy atmosphere"
Generate with specific aspect ratio:
python scripts/generate_image.py -p "Modern tech startup banner" -a 16:9 -o banner.png
Edit an existing image:
python scripts/generate_image.py -p "Make the sky more dramatic with sunset colors" -i photo.jpg -o edited.png
python scripts/generate_image.py [options]
Required:
-p, --prompt TEXT Text prompt describing the image
Optional:
-o, --output PATH Output file path (default: auto-generated)
-m, --model MODEL Model to use (default: gemini-3.1-flash-image-preview)
-a, --aspect-ratio RATIO Aspect ratio (default: 1:1)
-s, --size SIZE Image size: 1K, 2K, 4K (default: 1K; 2K and 4K require gemini-3.1-flash-image-preview or gemini-3-pro-image-preview)
-i, --input-image PATH Input image for editing mode
-r, --reference-images Reference image(s), can be repeated (max 14)
-v, --verbose Show detailed progress
| Model | Resolution | Best For | Nickname |
|-------|------------|----------|----------|
| gemini-3.1-flash-image-preview | Up to 4K | Latest model, fast + high quality | Nano Banana 2 |
| gemini-2.5-flash-image-preview | Up to 1K | Quick iterations, prototyping, batch generation | Nano Banana |
| gemini-3-pro-image-preview | Up to 4K | Previous-gen high quality fallback | Nano Banana Pro |
The gemini-3.1-flash-image-preview model is recommended as the default. Use gemini-2.5-flash-image-preview for faster/cheaper generation:
python scripts/generate_image.py -p "Quick concept sketch" -m gemini-2.5-flash-image-preview
API usage: All models use the same google-genai SDK with response_modalities=['Image'] in the config. The models require GEMINI_API_KEY and use the generate_content endpoint (not a separate images endpoint).
| Ratio | Use Case |
|-------|----------|
| 1:1 | App icons, profile pictures, thumbnails |
| 2:3 | Portrait photos, book covers |
| 3:2 | Landscape photos, postcards |
| 3:4 | Portrait photos, social media posts |
| 4:3 | Traditional photos, presentations |
| 4:5 | Instagram posts, portrait social media |
| 5:4 | Large format prints |
| 9:16 | Stories, vertical videos, mobile wallpapers |
| 16:9 | Widescreen banners, video thumbnails, headers |
| 21:9 | Ultrawide banners, cinematic headers |
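When the target surface has fixed pixel dimensions, it can help to pick the closest ratio from the table above. The following is a minimal sketch, not part of the skill's script: `nearest_ratio` and `ratio_value` are hypothetical helper names, and "closest" here simply means the smallest absolute difference in width/height.

```python
# Ratios copied from the aspect ratio table above.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def ratio_value(ratio):
    """Convert a 'W:H' string into a float width/height value."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def nearest_ratio(width, height):
    """Return the supported ratio closest to width/height (hypothetical helper)."""
    target = width / height
    return min(SUPPORTED_RATIOS, key=lambda r: abs(ratio_value(r) - target))


print(nearest_ratio(1920, 1080))  # a 1920x1080 header
print(nearest_ratio(1080, 1350))  # a 1080x1350 portrait post
```

The returned string can be passed to the `-a` option of the CLI.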
Available for gemini-3.1-flash-image-preview and gemini-3-pro-image-preview only:
| Size | Resolution | Use Case |
|------|------------|----------|
| 1K | 1024px | Web graphics, thumbnails |
| 2K | 2048px | Print materials, detailed graphics |
| 4K | 4096px | High-resolution prints, large displays |
python scripts/generate_image.py -p "Detailed landscape" -s 4K -o landscape_4k.png
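A small pre-flight check can catch an unsupported size before a request is sent. This sketch assumes the support matrix given in this document's model table and troubleshooting notes, which list gemini-3.1-flash-image-preview and gemini-3-pro-image-preview as the models that go beyond 1K. `check_size` is a hypothetical helper, not a function in the skill's script.

```python
# Pixel sizes from the table above.
SIZE_PIXELS = {"1K": 1024, "2K": 2048, "4K": 4096}

# Assumption: taken from this document's model table, not queried from the API.
MODELS_WITH_SIZE_OPTION = {
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
}


def check_size(model, size):
    """Return the pixel size for a label, or raise if the model cannot produce it."""
    if size not in SIZE_PIXELS:
        raise ValueError(f"Unknown size {size!r}; expected one of {sorted(SIZE_PIXELS)}")
    if size != "1K" and model not in MODELS_WITH_SIZE_OPTION:
        raise ValueError(f"{model} is documented here as 1K only, got {size}")
    return SIZE_PIXELS[size]


print(check_size("gemini-3.1-flash-image-preview", "4K"))
```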
Use this formula for effective prompts:
[Subject] + [Style] + [Details] + [Quality modifiers]
1. Be Specific About the Subject
Bad: "a cat"
Good: "a fluffy orange tabby cat sitting on a windowsill"
2. Specify Art Style
3. Include Environment and Lighting
4. Add Quality Modifiers
5. Specify Composition
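The formula above can be sketched as a small helper that joins the four parts in order and skips any that are empty. `build_prompt` is a hypothetical name, and the style and quality values in the example are illustrative choices rather than recommendations from the skill.

```python
def build_prompt(subject, style="", details="", quality=""):
    """Join [Subject] + [Style] + [Details] + [Quality modifiers] with commas."""
    parts = [subject, style, details, quality]
    # Drop empty or whitespace-only parts so the prompt has no stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="a fluffy orange tabby cat sitting on a windowsill",
    style="watercolor illustration",          # illustrative value
    details="warm sunlight, cozy atmosphere",
    quality="highly detailed",                # illustrative value
)
print(prompt)
```

The resulting string can be passed to the `-p` option.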
App Icon
Minimalist app icon for a weather app, blue gradient background,
white cloud with golden sun rays, flat design, rounded corners,
iOS style, clean and modern
Marketing Banner
Professional tech startup banner, abstract geometric shapes
flowing from left to right, purple and blue gradient,
modern and clean aesthetic, corporate style
Game Sprite
Pixel art character sprite, fantasy warrior with glowing sword,
32x32 style, transparent background, retro 16-bit game aesthetic,
vibrant colors
Product Photo
Professional product photo of wireless earbuds on white background,
soft shadows, studio lighting, minimalist composition,
commercial photography style
Concept Art
Futuristic city skyline at sunset, flying vehicles between
towering skyscrapers, neon lights reflecting on wet streets,
cyberpunk atmosphere, cinematic composition, detailed
UI Mockup Asset
Abstract gradient background for mobile app, soft purple to pink
transition, subtle geometric patterns, modern and minimal,
suitable for dark text overlay
Generate images from text descriptions:
python scripts/generate_image.py -p "Your description here" -o output.png
Modify an existing image with a text prompt:
python scripts/generate_image.py \
-p "Change the background to a tropical beach at sunset" \
-i original.jpg \
-o edited.png
Use up to 14 reference images to guide style or content:
python scripts/generate_image.py \
-p "Create a new character in this art style" \
-r style_ref1.png \
-r style_ref2.png \
-o new_character.png
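When reference images are collected programmatically, the 14-image limit is easy to exceed. The sketch below builds the argument list for the CLI shown above and rejects oversized lists up front. `build_command` is a hypothetical helper; only the `-p`, `-r`, and `-o` flags come from this document.

```python
MAX_REFERENCE_IMAGES = 14  # limit stated in the options list


def build_command(prompt, references, output):
    """Build the argv list for scripts/generate_image.py with repeated -r flags."""
    if len(references) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"Got {len(references)} reference images; the limit is {MAX_REFERENCE_IMAGES}"
        )
    cmd = ["python", "scripts/generate_image.py", "-p", prompt]
    for ref in references:
        cmd += ["-r", ref]  # -r can be repeated, once per reference image
    cmd += ["-o", output]
    return cmd


cmd = build_command(
    "Create a new character in this art style",
    ["style_ref1.png", "style_ref2.png"],
    "new_character.png",
)
print(cmd)
```

The list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with long prompts.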
# iOS-style weather icon
python scripts/generate_image.py \
-p "Minimalist weather app icon, blue sky gradient, white fluffy cloud, sun peeking out, flat design, rounded square, iOS 17 style" \
-a 1:1 \
-o weather_icon.png
# Fitness app icon
python scripts/generate_image.py \
-p "Fitness app icon, running figure silhouette, orange to red gradient background, energetic and dynamic, modern flat design" \
-a 1:1 \
-o fitness_icon.png
# Website hero banner
python scripts/generate_image.py \
-p "Abstract tech hero banner, flowing data visualization, dark blue background with glowing cyan accents, futuristic and professional" \
-a 21:9 \
-s 2K \
-o hero_banner.png
# Social media post
python scripts/generate_image.py \
-p "Motivational quote background, soft sunrise gradient, minimalist mountain silhouette, peaceful and inspiring" \
-a 4:5 \
-o social_post_bg.png
# Character sprite
python scripts/generate_image.py \
-p "Pixel art hero character, knight with blue cape and silver armor, idle pose, transparent background, 16-bit retro style" \
-a 1:1 \
-o knight_sprite.png
# Environment tile
python scripts/generate_image.py \
-p "Grass tile for top-down RPG, seamless pattern, vibrant green with small flowers, pixel art style, 32x32 aesthetic" \
-a 1:1 \
-o grass_tile.png
# Change background
python scripts/generate_image.py \
-p "Replace background with a cozy coffee shop interior" \
-i portrait.jpg \
-o portrait_coffee_shop.png
# Style enhancement
python scripts/generate_image.py \
-p "Enhance with dramatic cinematic color grading, increase contrast, add film grain" \
-i landscape.jpg \
-o landscape_cinematic.png
Set your API key:
export GEMINI_API_KEY="your-api-key"
Wait a few minutes and try again. For batch operations, add delays between requests.
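For batch operations, the delay can be automated with a retry wrapper that waits longer after each failure. This is a minimal sketch under assumptions: `call_with_backoff` is a hypothetical helper, the delay values are arbitrary, and it retries on any exception, whereas real code would likely narrow this to the SDK's rate-limit error.

```python
import time


def call_with_backoff(fn, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(), retrying with exponentially growing delays when it raises."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A batch loop could wrap each request, for example `call_with_backoff(lambda: subprocess.run(cmd, check=True))`, where `cmd` is the CLI invocation for one image.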
Modify your prompt to avoid content that violates Google's usage policies. Try:
The model sometimes returns text instead of an image. Try:
Supported formats for input images: PNG, JPEG, WebP
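A quick check on the file extension can flag unsupported inputs before the script is called. This sketch only inspects the suffix, not the file contents, and the mapping of `.jpg` and `.jpeg` to JPEG is an assumption; `is_supported_input` is a hypothetical helper.

```python
from pathlib import Path

# Extensions assumed to correspond to PNG, JPEG, and WebP.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def is_supported_input(path):
    """Return True if the file extension matches a supported input format."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


print(is_supported_input("photo.JPG"))
print(is_supported_input("scan.tiff"))
```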
The size option (2K, 4K) is available only for gemini-3.1-flash-image-preview and gemini-3-pro-image-preview. The gemini-2.5-flash-image-preview model generates images up to 1024px.
If you prefer to call the API directly without the CLI:
import os
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

# Read the API key from the environment rather than hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="A minimalist app icon for a weather app, blue gradient, white cloud",
    config=types.GenerateContentConfig(
        response_modalities=["Image", "Text"],
        # aspect_ratio and image_size options depend on model support
    ),
)

# The response can mix image and text parts: save images, print any text.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save("output.png")
        print("Saved to output.png")
    elif part.text:
        print(part.text)
Important: The generate_content endpoint is used for image generation (not a separate images endpoint). Set response_modalities to include "Image" to enable image output.
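Because a response can contain text parts, image parts, or both, it can be useful to separate them so the caller can detect a text-only reply and retry. The sketch below relies only on the `inline_data` and `text` attributes used in the example above. `split_parts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real response parts so the snippet runs without an API key.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate response parts into raw image bytes and text strings."""
    images, texts = [], []
    for part in parts:
        if getattr(part, "inline_data", None) is not None:
            images.append(part.inline_data.data)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return images, texts


# Stand-in parts that mimic the attributes of real response parts.
fake_parts = [
    SimpleNamespace(inline_data=None, text="Here is your icon."),
    SimpleNamespace(inline_data=SimpleNamespace(data=b"\x89PNG..."), text=None),
]
images, texts = split_parts(fake_parts)
if not images:
    print("No image returned; model replied with text:", texts)
else:
    print(f"Got {len(images)} image(s) and {len(texts)} text part(s)")
```

With a real response, `response.candidates[0].content.parts` would be passed in place of `fake_parts`.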
Use gemini-2.5-flash-image-preview for speed and gemini-3.1-flash-image-preview for quality and 4K output.