kylesnowschwartz/sc-gemini-imagegen

Name: sc-gemini-imagegen
Author: kylesnowschwartz

plugins/sc-skills/skills/sc-gemini-imagegen/SKILL.md

npx skillsauth add kylesnowschwartz/simpleclaude sc-gemini-imagegen

Gemini Image Generation

Generate and edit images using Google's Gemini API. The SDK reads GOOGLE_API_KEY by default and falls back to GEMINI_API_KEY. Alternatively, pass a key explicitly with genai.Client(api_key=...).
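The lookup order described above can be mirrored in a few lines when a script needs to check for a key before creating the client. This is a minimal sketch, not SDK code: resolve_api_key is a hypothetical helper, and it only reproduces the precedence stated in this document.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return GOOGLE_API_KEY if set, else GEMINI_API_KEY, else None.

    Hypothetical helper: mirrors the precedence described in the text,
    it does not call into the SDK.
    """
    return env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")


if __name__ == "__main__":
    key = resolve_api_key()
    print("API key found" if key else "No API key set; pass api_key= explicitly")
```

The result can be handed to genai.Client(api_key=...) or used to fail early with a clear message.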

Models

| Model | Codename | Best For |
|-------|----------|----------|
| gemini-2.5-flash-image | Nano Banana | Most use cases, fast, good quality (default) |
| gemini-3-pro-image-preview | Nano Banana Pro | High-res (2K/4K), Google Search grounding, precise text |
| gemini-3.1-flash-image-preview | Nano Banana 2 | High volume, extended aspect ratios, 512 size |

Start with gemini-2.5-flash-image. Upgrade to Pro for high-res output or search grounding.
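The selection guidance above can be written down as a small function. This is a sketch under the assumption that the "Best For" column is the only selection criterion; pick_model and its parameter names are hypothetical.

```python
FLASH_25 = "gemini-2.5-flash-image"
PRO_3 = "gemini-3-pro-image-preview"
FLASH_31 = "gemini-3.1-flash-image-preview"


def pick_model(high_res: bool = False,
               search_grounding: bool = False,
               extended_ratio_or_512: bool = False) -> str:
    """Pick a model ID from the needs listed in the Models table."""
    needs_pro = high_res or search_grounding
    if needs_pro and extended_ratio_or_512:
        # The table lists no single model as best for both groups of features.
        raise ValueError("No single model is listed for this combination")
    if needs_pro:
        return PRO_3
    if extended_ratio_or_512:
        return FLASH_31
    return FLASH_25  # the documented default
```

For example, pick_model(search_grounding=True) selects the Pro model, while a call with no arguments stays on the default.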

Quick Reference

Default Settings

Model: gemini-2.5-flash-image
Resolution: 1K (default)
Aspect Ratio: 1:1 (default)

Available Aspect Ratios

All models: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9

3.1 Flash only: 1:4, 4:1, 1:8, 8:1

Available Resolutions

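All models: 1K (default)

2K, 4K: listed in the Models table as a gemini-3-pro-image-preview strength; confirm support before requesting them from another model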

3.1 Flash only: 512
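A request with an unsupported ratio or size is cheaper to catch before it is sent. The sketch below encodes only the lists in this section; check_image_options is a hypothetical helper, and it does not encode any per-model limit on 2K or 4K.

```python
from typing import List

COMMON_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
FLASH_31_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
KNOWN_SIZES = {"512", "1K", "2K", "4K"}
FLASH_31 = "gemini-3.1-flash-image-preview"


def check_image_options(model: str, aspect_ratio: str = "1:1",
                        image_size: str = "1K") -> List[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if aspect_ratio in FLASH_31_RATIOS:
        if model != FLASH_31:
            problems.append(f"aspect ratio {aspect_ratio} is listed for 3.1 Flash only")
    elif aspect_ratio not in COMMON_RATIOS:
        problems.append(f"unknown aspect ratio {aspect_ratio}")
    if image_size not in KNOWN_SIZES:
        problems.append(f"unknown image size {image_size}")
    elif image_size == "512" and model != FLASH_31:
        problems.append("image size 512 is listed for 3.1 Flash only")
    return problems
```

Calling check_image_options("gemini-2.5-flash-image", "8:1", "512") reports two problems, while the same options on gemini-3.1-flash-image-preview report none.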

Core API Pattern

from google import genai
from google.genai import types

client = genai.Client()  # Reads GOOGLE_API_KEY (or GEMINI_API_KEY fallback)

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="Your prompt here",
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("output.jpg")  # save() takes path only, writes raw bytes

Note: response_modalities is optional. Omit it to let the model decide. Set ['IMAGE'] for image-only output, or ['TEXT', 'IMAGE'] for interleaved text and images.

Custom Resolution & Aspect Ratio

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=prompt,
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="2K",
        ),
    ),
)
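The optional settings shown so far (response_modalities and image_config) can be assembled in one place. This sketch builds a plain dict, assuming the SDK's documented behavior of accepting dictionaries wherever it accepts its typed config objects; build_image_config is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def build_image_config(image_only: bool = False,
                       aspect_ratio: Optional[str] = None,
                       image_size: Optional[str] = None) -> Dict[str, Any]:
    """Build a config dict, leaving out every setting that was not requested."""
    config: Dict[str, Any] = {}
    if image_only:
        # Omitting this key lets the model decide, as noted above.
        config["response_modalities"] = ["IMAGE"]
    image_config: Dict[str, str] = {}
    if aspect_ratio is not None:
        image_config["aspect_ratio"] = aspect_ratio
    if image_size is not None:
        image_config["image_size"] = image_size
    if image_config:
        config["image_config"] = image_config
    return config
```

The result is intended to be passed as config=build_image_config(image_only=True, aspect_ratio="16:9", image_size="2K") in a generate_content call.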

Editing Images (Chat Mode)

Chat mode is recommended for editing. The SDK handles thought signatures automatically across turns.

from google import genai
from PIL import Image

client = genai.Client()
source = Image.open("input.png")

chat = client.chats.create(model="gemini-2.5-flash-image")

# First edit: send the instruction together with the source image
response = chat.send_message(["Add a sunset to this scene", source])

for i, part in enumerate(response.candidates[0].content.parts):
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        edited = part.as_image()  # separate name so the PIL source image is not overwritten
        edited.save(f"edited_{i}.jpg")

# Continue refining; the chat keeps the earlier turns as context
response = chat.send_message("Make the colors warmer")

PIL Image objects, base64 bytes, and file URIs (via client.files.upload()) all work as image inputs.
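Across several refinement turns, the save loop repeats. A minimal sketch that factors it out follows. save_turn_images is a hypothetical helper that relies only on the part.text and part.inline_data.data attributes used in this document. It writes the raw bytes itself, so the .jpg suffix is an assumption based on the JPEG note under File Format & Saving.

```python
from pathlib import Path
from typing import Iterable, List


def save_turn_images(parts: Iterable, turn: int, out_dir: str = ".") -> List[Path]:
    """Print text parts, write image parts to disk, and return the written paths."""
    saved: List[Path] = []
    for i, part in enumerate(parts):
        if getattr(part, "text", None) is not None:
            print(part.text)
        elif getattr(part, "inline_data", None) is not None:
            # The turn number keeps later edits from overwriting earlier ones.
            path = Path(out_dir) / f"turn{turn}_part{i}.jpg"
            path.write_bytes(part.inline_data.data)
            saved.append(path)
    return saved
```

After each chat.send_message call, the helper would be invoked as save_turn_images(response.candidates[0].content.parts, turn=n).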

Google Search Grounding

Generate images informed by real-time data. Requires the Pro model (gemini-3-pro-image-preview).

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize today's weather in Tokyo as an infographic",
    config=types.GenerateContentConfig(
        image_config=types.ImageConfig(
            aspect_ratio="16:9",
            image_size="1K",
        ),
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

Image search grounding (searching for reference images) is only available on gemini-3.1-flash-image-preview.

Multiple Reference Images

Combine elements from multiple sources. Pass PIL Image objects directly in the contents list.

from PIL import Image

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents=[
        "Create a group photo of these people in an office",
        Image.open("person1.png"),
        Image.open("person2.png"),
        Image.open("person3.png"),
    ],
)

Limits differ by model:

3.1 Flash: up to 10 object images + 4 character images (14 total)
3 Pro: up to 6 object images + 5 character images (11 total)
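The limits above are easy to check before a request is assembled. This sketch encodes only the two rows listed here; within_reference_limits is a hypothetical helper, and mapping "3.1 Flash" and "3 Pro" to full model IDs follows the Models table.

```python
REFERENCE_LIMITS = {
    "gemini-3.1-flash-image-preview": {"object": 10, "character": 4},
    "gemini-3-pro-image-preview": {"object": 6, "character": 5},
}


def within_reference_limits(model: str, objects: int, characters: int) -> bool:
    """Return True if the reference image counts fit the listed limits.

    Raises KeyError for models whose limits are not listed in this document.
    """
    limits = REFERENCE_LIMITS[model]
    return objects <= limits["object"] and characters <= limits["character"]
```

For instance, seven object images pass on gemini-3.1-flash-image-preview but fail on gemini-3-pro-image-preview.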

Prompting Best Practices

Photorealistic Scenes

Include camera details: lens type, lighting, angle, mood.

"A photorealistic close-up portrait, 85mm lens, soft golden hour light, shallow depth of field"

Stylized Art

Specify style explicitly:

"A kawaii-style sticker of a happy red panda, bold outlines, cel-shading, white background"

Text in Images

Be explicit about font style and placement:

"Create a logo with text 'Daily Grind' in clean sans-serif, black and white, coffee bean motif"

Product Mockups

Describe lighting setup and surface:

"Studio-lit product photo on polished concrete, three-point softbox setup, 45-degree angle"

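The photorealistic pattern above (subject, then camera details) can be templated when many prompts share the same structure. A minimal sketch; build_photo_prompt and its parameters are hypothetical and only concatenate the details the guidance asks for.

```python
from typing import Sequence


def build_photo_prompt(subject: str, lens: str, lighting: str,
                       extras: Sequence[str] = ()) -> str:
    """Join subject and camera details into one comma-separated prompt."""
    parts = [f"A photorealistic {subject}", lens, lighting, *extras]
    # Skip empty strings so optional details can be left blank.
    return ", ".join(p for p in parts if p)


prompt = build_photo_prompt(
    "close-up portrait",
    lens="85mm lens",
    lighting="soft golden hour light",
    extras=["shallow depth of field"],
)
print(prompt)
```

This call reproduces the example prompt from the Photorealistic Scenes section.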
File Format & Saving

The API returns JPEG in practice. image.save(path) writes raw bytes from the API response. It takes only a path string (no format kwarg).

# Save as-is (JPEG bytes from the API)
image.save("output.jpg")

To convert formats, use PIL on the raw bytes:

from PIL import Image
import io

for part in response.parts:
    if part.inline_data is not None:
        pil_img = Image.open(io.BytesIO(part.inline_data.data))
        pil_img.save("output.png")  # PIL handles the conversion
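Because the output format is described as JPEG "in practice" rather than guaranteed, it can be worth checking the bytes before choosing a file extension. This sketch inspects the standard magic numbers for JPEG, PNG, and WebP; sniff_image_extension is a hypothetical helper and uses no SDK calls.

```python
def sniff_image_extension(data: bytes) -> str:
    """Guess a file extension from the leading bytes of an image."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return ".webp"
    return ".bin"  # unknown format: keep the bytes, avoid a misleading suffix
```

Applied to part.inline_data.data, the result can be appended to the output stem so the saved file's suffix matches its contents.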

Notes

All generated images include SynthID watermarks (not configurable for Gemini models)
save(path) writes raw bytes; no format kwarg exists. Use PIL for format conversion
response_modalities is optional; omit to let the model decide output format
Multi-turn chat handles thought signatures automatically via the SDK
Editing via chat mode doesn't support image_config (only modality config)
For editing, describe changes conversationally; the model understands semantic masking
Default to 1K for speed; use 2K/4K when quality matters
person_generation parameter exists on ImageConfig for controlling person depiction in outputs

Generate and edit images using the Gemini API (Nano Banana). This skill SHOULD be used when creating images from text prompts, editing existing images, applying style transfers, generating logos with text, creating stickers, product mockups, or any image generation/manipulation task. Supports text-to-image, image editing, multi-turn refinement, and composition from multiple reference images.

Stars: 97
Category: development
Updated: Apr 16, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add kylesnowschwartz/simpleclaude sc-gemini-imagegen

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (container and dependency vulnerability scanner): Clean, 95%
Semgrep (static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (open source security scanning): Skipped, 50%
Socket.dev (supply chain security analysis): Skipped, 50%
VirusTotal (multi-engine malware detection): Skipped, 50%
CrowdStrike (advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 8:52 PM (1.8s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/kylesnowschwartz/simpleclaude.git

# Copy into Claude Code skills folder (global)
cp -r simpleclaude/plugins/sc-skills/skills/sc-gemini-imagegen ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: kylesnowschwartz/simpleclaude (97 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT