plugins/sc-skills/skills/sc-gemini-imagegen/SKILL.md
Generate and edit images using the Gemini API (Nano Banana). This skill SHOULD be used when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.
Install this skill globally with one command: `npx skillsauth add kylesnowschwartz/simpleclaude sc-gemini-imagegen` (works with Claude Code, Cursor, and Windsurf).
Generate and edit images using Google's Gemini API. The SDK reads `GOOGLE_API_KEY` by default, with `GEMINI_API_KEY` as a fallback; alternatively, pass a key explicitly with `genai.Client(api_key=...)`.
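To fail fast when no key is configured, a preflight check can mirror the precedence described above. A minimal sketch; `resolve_api_key` is a hypothetical helper, not part of the SDK, and it treats empty values as unset (an assumption, not documented SDK behavior):

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key in the documented precedence order.

    Hypothetical helper: GOOGLE_API_KEY is checked first, then GEMINI_API_KEY.
    Empty strings are treated as unset (an assumption of this sketch).
    """
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        value = env.get(name)
        if value:
            return value
    return None


if resolve_api_key() is None:
    print("Set GOOGLE_API_KEY (or GEMINI_API_KEY) before calling genai.Client().")
```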
| Model | Codename | Best For |
|-------|----------|----------|
| gemini-2.5-flash-image | Nano Banana | Most use cases, fast, good quality (default) |
| gemini-3-pro-image-preview | Nano Banana Pro | High-res (2K/4K), Google Search grounding, precise text |
| gemini-3.1-flash-image-preview | Nano Banana 2 | High volume, extended aspect ratios, 512 size |
Start with gemini-2.5-flash-image. Upgrade to Pro for high-res output or search grounding.
Aspect ratios:
- All models: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- 3.1 Flash only: 1:4, 4:1, 1:8, 8:1

Image sizes:
- All models: 1K (default), 2K, 4K
- 3.1 Flash only: 512
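The lists above can be checked locally before spending an API call. A minimal sketch; `check_image_config` is a hypothetical helper whose allowed values are copied from the lists above, and the exact string the API expects for the 512 size is an assumption:

```python
# Values copied from the lists above; 3.1 Flash adds extra ratios and a 512 size.
COMMON_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
COMMON_SIZES = {"1K", "2K", "4K"}
FLASH_31 = "gemini-3.1-flash-image-preview"
FLASH_31_EXTRA_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
FLASH_31_EXTRA_SIZES = {"512"}  # assumed spelling of the 512 size value


def check_image_config(model: str, aspect_ratio: str, image_size: str = "1K") -> None:
    """Raise ValueError if the ratio or size is not listed for this model."""
    ratios = set(COMMON_RATIOS)
    sizes = set(COMMON_SIZES)
    if model == FLASH_31:
        ratios |= FLASH_31_EXTRA_RATIOS
        sizes |= FLASH_31_EXTRA_SIZES
    if aspect_ratio not in ratios:
        raise ValueError(f"{model} does not list aspect ratio {aspect_ratio}")
    if image_size not in sizes:
        raise ValueError(f"{model} does not list image size {image_size}")
```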
Generate an image from a text prompt:

```python
from google import genai
from google.genai import types

client = genai.Client()  # Reads GOOGLE_API_KEY (or GEMINI_API_KEY fallback)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Your prompt here",
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("output.jpg")  # save() takes path only, writes raw bytes
```
Note: response_modalities is optional. Omit it to let the model decide. Set ['IMAGE'] for image-only output, or ['TEXT', 'IMAGE'] for interleaved text and images.
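To control aspect ratio and resolution, pass an `image_config`:

```python
# Assumes client and types from the first example
prompt = "Your prompt here"

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # any ratio listed above for this model
            image_size="2K",      # 1K (default), 2K, or 4K
        ),
    ),
)
```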
Chat mode is recommended for editing. The SDK handles thought signatures automatically across turns.
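```python
from google import genai
from PIL import Image

client = genai.Client()
chat = client.chats.create(model="gemini-2.5-flash-image")


def save_images(response, prefix):
    """Print text parts and save each image part as <prefix>_<i>.jpg."""
    for i, part in enumerate(response.candidates[0].content.parts):
        if part.text is not None:
            print(part.text)
        elif part.inline_data is not None:
            part.as_image().save(f"{prefix}_{i}.jpg")


# First edit: send the instruction together with the source image
image = Image.open("input.png")
response = chat.send_message(["Add a sunset to this scene", image])
save_images(response, "edited")

# Continue refining; the chat session keeps earlier turns as context
response = chat.send_message("Make the colors warmer")
save_images(response, "refined")
```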
PIL Image objects, base64 bytes, and file URIs (via client.files.upload()) all work as image inputs.
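When an input is raw bytes rather than a PIL image, the SDK also needs its MIME type, for example via `types.Part.from_bytes(data=..., mime_type=...)`. A minimal sketch that reads a local file and guesses the type from its extension with the standard library; `load_image_bytes` is a hypothetical helper:

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def load_image_bytes(path: str) -> Tuple[bytes, str]:
    """Read a local image and return (data, mime_type) guessed from its extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    return Path(path).read_bytes(), mime_type
```

The returned pair can then go into `contents` as `types.Part.from_bytes(data=data, mime_type=mime_type)`.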
Use Google Search grounding to generate images informed by real-time data. This requires the Pro model.
```python
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize today's weather in Tokyo as an infographic",
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="1K",
        ),
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
```
Image search grounding (searching for reference images) is only available on gemini-3.1-flash-image-preview.
Combine elements from multiple sources. Pass PIL Image objects directly in the contents list.
```python
from PIL import Image

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)
```
Limits differ by model:
Include camera details: lens type, lighting, angle, mood.
"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"
Specify style explicitly:
"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"
Be explicit about font style and placement:
"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"
Describe lighting setup and surface:
"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"
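Each prompt above follows the same shape: a subject, then comma-separated specifics such as lens, lighting, style, or layout. A minimal sketch of a hypothetical `build_prompt` helper that keeps those details explicit and ordered, skipping any left unset:

```python
from typing import Optional


def build_prompt(subject: str, *details: Optional[str]) -> str:
    """Join a subject and its details into one comma-separated prompt.

    Hypothetical helper: empty or None details are skipped so optional
    fields (angle, mood, ...) can be left unset.
    """
    parts = [subject] + [d for d in details if d]
    return ", ".join(parts)


prompt = build_prompt(
    "A photorealistic close-up portrait",
    "85mm lens",               # lens type
    "soft golden hour light",  # lighting
    None,                      # angle left unspecified
    "shallow depth of field",
)
```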
The API returns JPEG in practice. `image.save(path)` writes the raw bytes from the API response unchanged and takes only a path string (there is no format kwarg), so saving to a `.png` path still produces JPEG data.
```python
# Save as-is (JPEG bytes from the API)
image.save("output.jpg")
```
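Because `save()` writes the bytes unchanged, the file extension should match the actual encoding. A minimal sketch that picks an extension from the leading signature bytes; `extension_for` is a hypothetical helper and only checks JPEG, PNG, and WebP signatures (reading `part.inline_data.mime_type` is an alternative):

```python
def extension_for(data: bytes) -> str:
    """Return a file extension matching the image signature at the start of data."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"  # JPEG start-of-image marker
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"  # 8-byte PNG signature
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"  # RIFF container with WEBP form type
    raise ValueError("unrecognized image signature")
```

Usage: `image.save("output" + extension_for(part.inline_data.data))`.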
To convert formats, use PIL on the raw bytes:
```python
from PIL import Image
import io

for part in response.parts:
    if part.inline_data is not None:
        pil_img = Image.open(io.BytesIO(part.inline_data.data))
        pil_img.save("output.png")  # PIL handles the conversion
```
- `save(path)` writes raw bytes; no format kwarg exists. Use PIL for format conversion.
- `response_modalities` is optional; omit it to let the model decide the output format.
- `image_config` (only modality config)
- A `person_generation` parameter exists on `ImageConfig` for controlling person depiction in outputs.