skills/gemini-imagegen/SKILL.md
This skill should be used when generating and editing images using the Gemini API (Nano Banana Pro). It applies when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.
Install: `npx skillsauth add marcusrbrown/systematic gemini-imagegen` (installs the skill globally; works with Claude Code, Cursor, and Windsurf).
Generate and edit images using Google's Gemini API. The environment variable GEMINI_API_KEY must be set.
| Model | Resolution | Best For |
|-------|------------|----------|
| gemini-3-pro-image-preview | 1K-4K | All image generation (default) |
Note: Always use this Pro model. Only use a different model if explicitly requested.
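Supported options for `gemini-3-pro-image-preview`:

| Option | Values |
|--------|--------|
| Aspect ratio | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9 |
| Image size | 1K (default), 2K, 4K |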
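You can catch typos in `aspect_ratio` or `image_size` before making an API call. The sketch below does this with a hypothetical helper, `validate_image_options`, that checks values against the supported lists above and returns keyword arguments for `types.ImageConfig`. The helper name and the uppercase normalization of sizes are illustrative choices, not part of the SDK.

```python
# Supported values for gemini-3-pro-image-preview, copied from the options above.
SUPPORTED_ASPECT_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
)
SUPPORTED_IMAGE_SIZES = ("1K", "2K", "4K")


def validate_image_options(aspect_ratio: str = "1:1", image_size: str = "1K") -> dict:
    """Return keyword arguments for types.ImageConfig, or raise ValueError."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(
            f"Unsupported aspect_ratio {aspect_ratio!r}; "
            f"expected one of {', '.join(SUPPORTED_ASPECT_RATIOS)}"
        )
    # Normalize "2k" to "2K" so it matches the documented values.
    size = image_size.strip().upper()
    if size not in SUPPORTED_IMAGE_SIZES:
        raise ValueError(
            f"Unsupported image_size {image_size!r}; "
            f"expected one of {', '.join(SUPPORTED_IMAGE_SIZES)}"
        )
    return {"aspect_ratio": aspect_ratio, "image_size": size}
```

Usage: `image_config=types.ImageConfig(**validate_image_options("16:9", "2K"))`.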
```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Basic generation (1K, 1:1 - defaults)
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)

for part in response.parts:
    if part.text:
        print(part.text)
    elif part.inline_data:
        image = part.as_image()
        image.save("output.jpg")  # Gemini returns JPEG by default
```
Set the aspect ratio and resolution through `image_config`:

```python
from google.genai import types

prompt = "Your prompt here"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Wide format
            image_size="2K",      # Higher resolution
        ),
    ),
)
```
Resolution options:

```python
# 1K (default) - fast, good for previews
image_config=types.ImageConfig(image_size="1K")

# 2K - balanced quality/speed
image_config=types.ImageConfig(image_size="2K")

# 4K - maximum quality, slower
image_config=types.ImageConfig(image_size="4K")
```

Aspect ratio options:

```python
# Square (default)
image_config=types.ImageConfig(aspect_ratio="1:1")

# Landscape wide
image_config=types.ImageConfig(aspect_ratio="16:9")

# Ultra-wide panoramic
image_config=types.ImageConfig(aspect_ratio="21:9")

# Portrait
image_config=types.ImageConfig(aspect_ratio="9:16")

# Photo standard
image_config=types.ImageConfig(aspect_ratio="4:3")
```
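If you start from target pixel dimensions, such as a 1920x1080 banner, you can pick the nearest supported ratio automatically. The sketch below uses a hypothetical `closest_aspect_ratio` helper. It compares ratios on a log scale so that portrait and landscape mismatches are weighted the same. The helper and the log-distance choice are assumptions for illustration.

```python
import math

# Aspect ratios accepted by gemini-3-pro-image-preview (from the supported options).
ASPECT_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width/height, compared on a log scale."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = (int(x) for x in ratio.split(":"))
        return abs(math.log(w / h) - target)

    # Ties go to the earlier entry in ASPECT_RATIOS.
    return min(ASPECT_RATIOS, key=distance)
```

Usage: `types.ImageConfig(aspect_ratio=closest_aspect_ratio(2560, 1080))` selects `"21:9"`.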
Pass existing images with text prompts:
```python
from PIL import Image

img = Image.open("input.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Add a sunset to this scene", img],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
```
Use chat for iterative editing:
```python
from google.genai import types

chat = client.chats.create(
    model="gemini-3-pro-image-preview",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']),
)

response = chat.send_message("Create a logo for 'Acme Corp'")
# Save first image...

response = chat.send_message("Make the text bolder and add a blue gradient")
# Save refined image...
```
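The "Save ... image" steps above can be folded into a loop. In this sketch, the hypothetical `refine_with_chat` function sends each refinement prompt as a new turn and writes every returned image to a numbered `.jpg` file. It assumes each chat response exposes `.parts`, with `part.text` and `part.inline_data.data` as in the earlier examples. The function name and file-naming scheme are illustrative.

```python
import os


def refine_with_chat(chat, prompts, out_dir=".", stem="refine"):
    """Send each prompt as a new chat turn and save every returned image.

    `chat` only needs a send_message(str) method returning a response with a
    `.parts` list, like the object from client.chats.create(...).
    """
    saved = []
    for turn, prompt in enumerate(prompts, start=1):
        response = chat.send_message(prompt)
        images_this_turn = 0
        for part in response.parts or []:
            if part.text:
                print(f"[turn {turn}] {part.text}")
            elif part.inline_data is not None:
                images_this_turn += 1
                # Gemini returns JPEG by default, hence the .jpg extension.
                name = f"{stem}_{turn:02d}_{images_this_turn}.jpg"
                path = os.path.join(out_dir, name)
                with open(path, "wb") as f:
                    f.write(part.inline_data.data)
                saved.append(path)
    return saved
```

Usage: `refine_with_chat(chat, ["Create a logo for 'Acme Corp'", "Make the text bolder and add a blue gradient"], stem="logo")`.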
Prompting tips:

- **Photorealistic scenes:** include camera details such as lens type, lighting, angle, and mood, e.g. `"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"`
- **Stylized illustrations:** specify the style explicitly, e.g. `"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"`
- **Text in images:** be explicit about font style and placement, e.g. `"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"`
- **Product mockups:** describe the lighting setup and surface, e.g. `"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"`
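The example prompts above all follow one pattern: a subject, then comma-separated detail phrases. A small hypothetical `build_prompt` helper can assemble prompts in this shape. The keyword names (`style`, `camera`, `lighting`, `extras`) are illustrative; the model only ever sees the final string.

```python
def build_prompt(subject, *, style=None, camera=None, lighting=None, extras=()):
    """Join a subject and optional detail phrases into one comma-separated prompt."""
    details = [style, camera, lighting, *extras]
    # Skip details that were not provided (None or blank strings).
    phrases = [subject.strip()] + [d.strip() for d in details if d and d.strip()]
    return ", ".join(phrases)
```

For example, `build_prompt("A photorealistic close-up portrait", camera="85mm lens", lighting="soft golden hour light", extras=["shallow depth of field"])` reproduces the portrait prompt above.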
Ground generation in real-time data with the Google Search tool:

```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Visualize today's weather in Tokyo as an infographic"],
    config=types.GenerateContentConfig(
        # Keep 'TEXT' here: image-only output doesn't work with Search grounding
        response_modalities=['TEXT', 'IMAGE'],
        tools=[{"google_search": {}}],
    ),
)
```
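Image-only output (`["IMAGE"]` modalities) does not work with Google Search grounding, so it can help to build grounded configs in one place that enforces this. The sketch below returns keyword arguments for `types.GenerateContentConfig`. The `grounded_config_kwargs` name is hypothetical; the `tools` entry uses the same dict form as the example above.

```python
def grounded_config_kwargs(modalities=("TEXT", "IMAGE")):
    """Keyword arguments for types.GenerateContentConfig with Google Search grounding."""
    modalities = [m.upper() for m in modalities]
    # Image-only output does not work with Search grounding, so insist on TEXT.
    if "TEXT" not in modalities:
        raise ValueError("Google Search grounding needs 'TEXT' in response_modalities")
    return {
        "response_modalities": modalities,
        # Same dict form as the grounding example; types.Tool(google_search=types.GoogleSearch())
        # is the typed equivalent.
        "tools": [{"google_search": {}}],
    }
```

Usage: `config=types.GenerateContentConfig(**grounded_config_kwargs())`.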
Combine elements from multiple sources:
```python
from PIL import Image

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
    ),
)
```
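With many reference images, it helps to fail early if a file is missing and to keep the prompt-then-images ordering used above. This is a sketch with a hypothetical `build_multi_image_contents` helper. `open_image` defaults to Pillow's `Image.open` and can be swapped out, for example in tests.

```python
from pathlib import Path


def build_multi_image_contents(prompt, image_paths, open_image=None):
    """Return [prompt, image1, image2, ...] for generate_content(contents=...)."""
    paths = [Path(p) for p in image_paths]
    # Fail before opening anything if a reference image is missing.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Reference images not found: {missing}")
    if open_image is None:
        # Pillow is imported lazily, so the helper can be exercised without it.
        from PIL import Image

        open_image = Image.open
    # Prompt first, then images in the order given.
    return [prompt] + [open_image(p) for p in paths]
```

Usage: `contents=build_multi_image_contents("Create a group photo of these people in an office", ["person1.png", "person2.png", "person3.png"])`.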
CRITICAL: The Gemini API returns images in JPEG format by default. When saving, always use .jpg extension to avoid media type mismatches.
```python
# CORRECT - use a .jpg extension (Gemini returns JPEG)
image.save("output.jpg")

# WRONG - causes "Image does not match media type" errors
image.save("output.png")  # writes JPEG data with a PNG extension!
```
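Instead of hard-coding the extension, you can take it from the MIME type each image part reports in `part.inline_data.mime_type`, falling back to `.jpg`. The sketch below writes the raw bytes from `part.inline_data.data`. The `save_image_parts` name and the numbered file names are hypothetical.

```python
import os

# Extensions for MIME types an image part may report; unknown types fall back to .jpg.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png", "image/webp": ".webp"}


def save_image_parts(parts, stem="output", out_dir="."):
    """Write every inline image part to disk and return the paths written."""
    saved = []
    for part in parts or []:
        blob = getattr(part, "inline_data", None)
        if blob is None:
            continue  # text-only part
        # Gemini returns JPEG by default, so .jpg is the fallback.
        ext = EXTENSIONS.get(blob.mime_type, ".jpg")
        path = os.path.join(out_dir, f"{stem}_{len(saved) + 1}{ext}")
        with open(path, "wb") as f:
            f.write(blob.data)
        saved.append(path)
    return saved
```

Usage: `save_image_parts(response.parts)` writes `output_1.jpg` for a typical JPEG response.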
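If you specifically need PNG format, decode the returned bytes with Pillow and re-encode them as PNG:

```python
import io

from PIL import Image

# `response` comes from one of the generate_content calls above
for part in response.parts:
    if part.inline_data:
        # Decode the returned (JPEG) bytes, then re-encode them as a real PNG
        img = Image.open(io.BytesIO(part.inline_data.data))
        img.save("output.png", format="PNG")
```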
Check the actual format against the extension with the `file` command:

```bash
file image.png
# If the output shows "JPEG image data", rename the file to .jpg
```
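The same check can run from Python over files already on disk. This sketch reads the leading magic bytes: JPEG starts with `FF D8 FF`, PNG with `89 50 4E 47 0D 0A 1A 0A`, and WebP with `RIFF....WEBP`. When the extension is wrong, the file is renamed. `sniff_extension` and `fix_extension` are hypothetical helper names.

```python
from pathlib import Path

# Leading "magic" bytes for common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
]


def sniff_extension(path):
    """Return the extension matching the file's contents, or None if unknown."""
    with open(path, "rb") as f:
        head = f.read(12)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return ".webp"
    return None


def fix_extension(path):
    """Rename `path` so its extension matches its contents; return the resulting path."""
    path = Path(path)
    ext = sniff_extension(path)
    if ext is None:
        return path  # unknown format: leave it alone
    accepted = {".jpg", ".jpeg"} if ext == ".jpg" else {ext}
    if path.suffix.lower() in accepted:
        return path
    return path.rename(path.with_suffix(ext))
```

Usage: `fix_extension("image.png")` returns `image.jpg` after renaming if the file actually holds JPEG data.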
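Notes:

- Save generated images with a `.jpg` extension.
- Image-only output (`responseModalities: ["IMAGE"]`, or `response_modalities=['IMAGE']` in Python) won't work with Google Search grounding.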