Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

einverne/gemini-image-gen

Name: gemini-image-gen
Author: einverne

claude/skills/gemini-image-gen/SKILL.md

npx skillsauth add einverne/dotfiles gemini-image-gen

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Gemini Image Generation Skill

Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.

When to Use This Skill

Use this skill when you need to:

Generate images from text descriptions
Edit existing images by adding/removing elements or changing styles
Combine multiple source images into new compositions
Iteratively refine images through conversational editing
Create visual content for documentation, design, or creative projects

Prerequisites

API Key Setup

The skill automatically detects your GEMINI_API_KEY in this order:

Process environment: export GEMINI_API_KEY="your-key"
Skill directory: .claude/skills/gemini-image-gen/.env
Project directory: ./.env (project root)

Get your API key: Visit Google AI Studio

Create .env file with:

GEMINI_API_KEY=your_api_key_here

Python Setup

Install required package:

pip install google-genai

Quick Start

Basic Text-to-Image Generation

from google import genai
from google.genai import types
import os

# API key detection handled automatically by helper script
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

# Save to ./docs/assets/
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Using the Helper Script

For convenience, use the provided helper script that handles API key detection and file saving:

# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1

Key Features

Aspect Ratios

| Ratio | Resolution | Use Case | Token Cost | |-------|-----------|----------|------------| | 1:1 | 1024×1024 | Social media, avatars | 1290 | | 16:9 | 1344×768 | Landscapes, banners | 1290 | | 9:16 | 768×1344 | Mobile, portraits | 1290 | | 4:3 | 1152×896 | Traditional media | 1290 | | 3:4 | 896×1152 | Vertical posters | 1290 |

Response Modalities

['image']: Generate only images
['text']: Generate only text descriptions
['image', 'text']: Generate both images and descriptions

Image Editing

Provide existing image + text instructions to modify:

import PIL.Image

img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)

Multi-Image Composition

Combine up to 3 source images (recommended):

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)

Prompt Engineering Tips

Structure effective prompts with three elements:

Subject: What to generate ("a robot")
Context: Environmental setting ("in a futuristic city")
Style: Artistic treatment ("cyberpunk style, neon lighting")

Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"

Quality modifiers:

Add terms like "4K", "HDR", "high-quality", "professional photography"
Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

Text in images:

Limit to 25 characters maximum
Use up to 3 distinct phrases
Specify font styles: "bold sans-serif title" or "handwritten script"

See references/prompting-guide.md for comprehensive prompt engineering strategies.

Safety Settings

The model includes adjustable safety filters. Configure per-request:

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

See references/safety-settings.md for detailed configuration options.

Output Management

All generated images should be saved to ./docs/assets/ directory:

# Create directory if needed
mkdir -p ./docs/assets

The helper script automatically saves to this location with timestamped filenames.

Model Specifications

Model: gemini-2.5-flash-image

Input tokens: Up to 65,536
Output tokens: Up to 32,768
Supported inputs: Text and images
Supported outputs: Text and images
Knowledge cutoff: June 2025
Features: Image generation, structured outputs, batch API, caching

Limitations

Maximum 3 input images recommended for best results
Text rendering works best when generated separately first
Does not support audio/video inputs
Regional restrictions on child image uploads (EEA, CH, UK)
Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi

Error Handling

Common issues and solutions:

API key not found:

# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env

Safety filter blocking:

Review response.prompt_feedback.block_reason
Adjust safety settings if appropriate for your use case
Modify prompt to avoid triggering filters

Token limit exceeded:

Reduce prompt length
Use fewer input images
Simplify image editing instructions

Reference Documentation

For detailed information, see:

references/api-reference.md - Complete API specifications
references/prompting-guide.md - Advanced prompt engineering
references/safety-settings.md - Safety configuration details
references/code-examples.md - Additional implementation examples

Resources

Official Documentation
API Reference
Get API Key
Google AI Studio - Interactive testing

einverne/gemini-image-gen

claude/skills/gemini-image-gen/SKILL.md

Guide for implementing Google Gemini API image generation - create high-quality images from text prompts using gemini-2.5-flash-image model. Use when generating images, creating visual content, or implementing text-to-image features. Supports text-to-image, image editing, multi-image composition, and iterative refinement.

117 stars

development

Updated May 28, 2026

$ install --global

skillsauth

npx skillsauth add einverne/dotfiles gemini-image-gen

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: May 28, 2026, 3:31 AM115.7s9 files scanned

SKILL.md

name:: gemini-image-gen
description:: Guide for implementing Google Gemini API image generation - create high-quality images from text prompts using gemini-2.5-flash-image model. Use when generating images, creating visual content, or implementing text-to-image features. Supports text-to-image, image editing, multi-image composition, and iterative refinement.
license:: MIT
version:: 1.0.0

Gemini Image Generation Skill

Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.

When to Use This Skill

Use this skill when you need to:

Generate images from text descriptions
Edit existing images by adding/removing elements or changing styles
Combine multiple source images into new compositions
Iteratively refine images through conversational editing
Create visual content for documentation, design, or creative projects

Prerequisites

API Key Setup

The skill automatically detects your GEMINI_API_KEY in this order:

Process environment: export GEMINI_API_KEY="your-key"
Skill directory: .claude/skills/gemini-image-gen/.env
Project directory: ./.env (project root)

Get your API key: Visit Google AI Studio

Create .env file with:

GEMINI_API_KEY=your_api_key_here

Python Setup

Install required package:

pip install google-genai

Quick Start

Basic Text-to-Image Generation

from google import genai
from google.genai import types
import os

# API key detection handled automatically by helper script
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        aspect_ratio='16:9'
    )
)

# Save to ./docs/assets/
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)

Using the Helper Script

For convenience, use the provided helper script that handles API key detection and file saving:

# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "A futuristic city with flying cars" \
  --aspect-ratio 16:9 \
  --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
  "Modern architecture design" \
  --response-modalities image text \
  --aspect-ratio 1:1

Key Features

Aspect Ratios

Response Modalities

['image']: Generate only images
['text']: Generate only text descriptions
['image', 'text']: Generate both images and descriptions

Image Editing

Provide existing image + text instructions to modify:

import PIL.Image

img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)

Multi-Image Composition

Combine up to 3 source images (recommended):

img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)

Prompt Engineering Tips

Structure effective prompts with three elements:

Subject: What to generate ("a robot")
Context: Environmental setting ("in a futuristic city")
Style: Artistic treatment ("cyberpunk style, neon lighting")

Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"

Quality modifiers:

Add terms like "4K", "HDR", "high-quality", "professional photography"
Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"

Text in images:

Limit to 25 characters maximum
Use up to 3 distinct phrases
Specify font styles: "bold sans-serif title" or "handwritten script"

See references/prompting-guide.md for comprehensive prompt engineering strategies.

Safety Settings

The model includes adjustable safety filters. Configure per-request:

config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)

See references/safety-settings.md for detailed configuration options.

Output Management

All generated images should be saved to ./docs/assets/ directory:

# Create directory if needed
mkdir -p ./docs/assets

The helper script automatically saves to this location with timestamped filenames.

Model Specifications

Model: gemini-2.5-flash-image

Input tokens: Up to 65,536
Output tokens: Up to 32,768
Supported inputs: Text and images
Supported outputs: Text and images
Knowledge cutoff: June 2025
Features: Image generation, structured outputs, batch API, caching

Limitations

Maximum 3 input images recommended for best results
Text rendering works best when generated separately first
Does not support audio/video inputs
Regional restrictions on child image uploads (EEA, CH, UK)
Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi

Error Handling

Common issues and solutions:

API key not found:

# Check environment variables
echo $GEMINI_API_KEY

# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env

Safety filter blocking:

Review response.prompt_feedback.block_reason
Adjust safety settings if appropriate for your use case
Modify prompt to avoid triggering filters

Token limit exceeded:

Reduce prompt length
Use fewer input images
Simplify image editing instructions

Reference Documentation

For detailed information, see:

references/api-reference.md - Complete API specifications
references/prompting-guide.md - Advanced prompt engineering
references/safety-settings.md - Safety configuration details
references/code-examples.md - Additional implementation examples

Resources

Official Documentation
API Reference
Get API Key
Google AI Studio - Interactive testing

Related Skills

einverne/react-component-generator

development

VerifiedTrustedCommunity

生成符合项目规范的 React 组件。当用户要求创建组件、新建 React 组件或生成组件文件时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/react-component-generator

einverne/git-commit-formatter

development

VerifiedTrustedCommunity

生成符合 Conventional Commits 规范的 Git 提交信息。当用户要求生成提交、创建 commit 或写提交信息时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/git-commit-formatter

einverne/deploy-staging

devops

VerifiedTrustedCommunity

将当前分支部署到测试环境。当用户要求部署、发布到测试或在 staging 环境测试时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/deploy-staging

einverne/code-reviewer

development

VerifiedTrustedCommunity

进行系统化的代码审查，检查代码质量、安全性和性能。当用户要求审查代码、review 或检查代码时使用

117SKILL.mdUpdated Apr 4, 2026

einverne/code-reviewer

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/einverne/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/claude/skills/gemini-image-gen ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

einverne/dotfiles

117 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT