invoking-gemini/SKILL.md
Invokes Google Gemini models for structured outputs, image generation, multi-modal tasks, and Google-specific features. Use when users request Gemini, image generation, structured JSON output, Google API integration, or cost-effective parallel processing.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add oaustegard/claude-skills invoking-gemini
Delegate tasks to Google's Gemini models when they offer advantages over Claude.
Image generation:
Structured outputs:
Cost optimization:
Multi-modal tasks:
uv pip install requests pydantic
Credentials — Option A (recommended): Cloudflare AI Gateway
Source /mnt/project/proxy.env, which defines CF_ACCOUNT_ID, CF_GATEWAY_ID, and CF_API_TOKEN.
Requests route through Cloudflare AI Gateway, which bypasses IP blocks. The Google API key is stored in the gateway via BYOK.
Credentials — Option B: Direct Google API
If proxy.env is absent, the client falls back to calling Google directly, reading the key from GOOGLE_API_KEY.txt or API_CREDENTIALS.json.
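The lookup order described above can be sketched as a small check. This is an illustration only, not the skill's actual code: `detect_credential_source` is a hypothetical helper, and it assumes the fallback files sit in the same directory as proxy.env, which the text does not state.

```python
from pathlib import Path


def detect_credential_source(project_dir: str = "/mnt/project") -> str:
    """Report which credential route would apply, checked in the documented order."""
    root = Path(project_dir)
    # Option A: a proxy.env file selects the Cloudflare AI Gateway route.
    if (root / "proxy.env").is_file():
        return "cloudflare-gateway"
    # Option B: direct Google API. Assumption: these files live next to proxy.env.
    for name in ("GOOGLE_API_KEY.txt", "API_CREDENTIALS.json"):
        if (root / name).is_file():
            return f"direct:{name}"
    return "none"


print(detect_credential_source())
```

Running a check like this before the first call makes a "No credentials configured" failure easier to diagnose.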
Generate images using Gemini's native image models. This is the primary way to create illustrations, blog headers, diagrams, and visual content.
```python
import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import generate_image

# One call: returns {"path": "...", "caption": "..."} or None
result = generate_image("A watercolor painting of a mountain lake at sunset")
if result:  # None signals failure, so guard before indexing
    print(result["path"])  # /mnt/user-data/outputs/gemini_image_1740000000.png
```
```python
generate_image(
    prompt: str,                   # The image description
    output_path: str = None,       # Auto-generates if omitted
    model: str = "nano-banana-2",  # Default: fast. Use "image-pro" for quality
    temperature: float = 0.7,      # 0.5-0.7 for diagrams, 0.7-0.8 for illustrations
) -> dict | None
# Returns: {"path": "/mnt/user-data/outputs/gemini_image_*.png", "caption": str|None}
# Returns None on failure
```
| Alias | Model | Best For | Cost/image |
|-------|-------|----------|------------|
| "nano-banana-2" or "image" | gemini-3.1-flash-image-preview | Fast iteration, drafts | $0.067 |
| "image-pro" or "nano-banana-pro" | gemini-3-pro-image-preview | Published content, text rendering | $0.134 |
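As a rough budgeting aid, the per-image prices in the table above can be turned into a quick estimate. `estimate_image_cost` is a hypothetical helper, not part of gemini_client, and it ignores any charge for prompt tokens.

```python
# Per-image prices in USD, copied from the alias table above.
PRICE_PER_IMAGE = {
    "nano-banana-2": 0.067,
    "image": 0.067,
    "image-pro": 0.134,
    "nano-banana-pro": 0.134,
}


def estimate_image_cost(alias: str, count: int) -> float:
    """Estimated spend for `count` images generated with the given alias."""
    return round(PRICE_PER_IMAGE[alias] * count, 3)


# Example: eight drafts on the fast model, then two finals on the pro model.
drafts = estimate_image_cost("nano-banana-2", 8)
finals = estimate_image_cost("image-pro", 2)
print(f"drafts ${drafts}, finals ${finals}, total ${round(drafts + finals, 3)}")
```

Iterating on the cheaper alias and switching to image-pro only for the final render keeps most of the spend at the lower rate.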
```python
import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import generate_image

# 1. Compose prompt with style prefix + subject
style_prefix = (
    "Style: Risograph-inspired editorial illustration. "
    "Visible halftone dot texture and slight color misregistration between layers. "
    "Limited ink palette: deep indigo, warm coral, and sage green on off-white paper. "
    "Layered transparency where colors overlap creates rich secondary tones. "
    "Modern and professional: the aesthetic of an indie design studio, not a fantasy novel. "
    "Generous whitespace. No photorealism, no glow effects, no cyberpunk. No text or labels."
)
subject = "A raven perched on a stack of books, observing a network graph"
prompt = f"{style_prefix}\n\nSubject: {subject}. Wide landscape format, suitable as a blog header."

# 2. Generate (use image-pro for published content)
result = generate_image(prompt, model="image-pro", temperature=0.75)
if result:
    print(f"Saved: {result['path']}")
    # 3. Present to user
    # present_files([result["path"]])
```
```python
result = generate_image(
    "A logo for a coffee shop called 'Bean There'",
    output_path="/mnt/user-data/outputs/coffee_logo.png"
)
```
```python
import sys
sys.path.append('/mnt/skills/user/invoking-gemini/scripts')
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="flash",  # gemini-3.5-flash (default)
)
print(response)
```
Use Pydantic models for guaranteed JSON Schema compliance:
```python
from gemini_client import invoke_with_structured_output
from pydantic import BaseModel, Field

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)
print(result.title)  # "1984"
```
```python
from gemini_client import invoke_parallel

results = invoke_parallel(
    prompts=["Summarize Hamlet", "Summarize Macbeth", "Summarize Othello"],
    model="lite",  # gemini-2.5-flash-lite: cheapest, fastest for batch
)
```
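To make the fan-out idea concrete, here is a minimal sketch of an order-preserving parallel map over prompts. It is not how invoke_parallel is implemented (the source does not show that). `fan_out` and `fake_model` are hypothetical names, and the stand-in model exists only so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def fan_out(
    prompts: list[str],
    call: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> list[Optional[str]]:
    """Run `call` on every prompt concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call, prompts))


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the example needs no network access."""
    return f"[summary of: {prompt}]"


prompts = ["Summarize Hamlet", "Summarize Macbeth", "Summarize Othello"]
for prompt, answer in zip(prompts, fan_out(prompts, fake_model)):
    print(prompt, "->", answer)
```

Because `pool.map` yields results in submission order, each answer can be paired with its prompt by position.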
All current Gemini 3.x text/multimodal models are in preview except 3.5
Flash (GA May 19, 2026). Use the values below — gemini-3-flash-preview
and gemini-3.1-flash-lite-preview from earlier docs are out of date.
| Model | Alias | Input/1M | Output/1M | Context | Notes |
|-------|-------|----------|-----------|---------|-------|
| gemini-3.5-flash | flash | $1.50 | $9.00 | 1M | GA May 2026. Frontier Flash. Beats 3.1 Pro on most coding/agentic benchmarks. Default thinking_level=medium — set minimal for non-reasoning tasks. |
| gemini-3-flash-preview | flash-3 | $0.30 | $2.50 | 1M | Prior-gen Flash, kept for back compat |
| gemini-3.1-pro-preview | pro | $2.00 (≤200K) / $4.00 | $12.00 / $24.00 | 1M | Current Pro tier; 3.5 Pro slated for June 2026 |
| gemini-2.5-flash | stable-flash | $0.30 | $2.50 | 1M | Stable production Flash |
| gemini-2.5-flash-lite | lite | $0.10 | $0.40 | 1M | Cheapest major-provider production model. Surprisingly strong on multimodal extraction. |
| gemini-2.5-pro | stable-pro | $1.25 (≤200K) / $2.50 | $10.00 / $20.00 | 1M | Stable production Pro |
| Model | Alias | Input/1M | Per Image |
|-------|-------|----------|-----------|
| gemini-3.1-flash-image-preview | image, nano-banana-2 | $0.25 | $0.067 |
| gemini-3-pro-image-preview | image-pro, nano-banana-pro | $2.00 | $0.134 |
See references/models.md for full details.
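For comparing models on a batch job, the flat per-token rates listed above translate into a simple estimate. `estimate_text_cost` is a hypothetical helper. It covers only the flat-priced rows and leaves out the tiered Pro models, since the table does not spell out how their tiers apply.

```python
# USD per 1M tokens as (input, output), taken from the flat-priced table rows.
PRICES = {
    "flash": (1.50, 9.00),         # gemini-3.5-flash
    "stable-flash": (0.30, 2.50),  # gemini-2.5-flash
    "lite": (0.10, 0.40),          # gemini-2.5-flash-lite
}


def estimate_text_cost(alias: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one workload at the listed rates."""
    in_rate, out_rate = PRICES[alias]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example workload: 2M input tokens and 200K output tokens.
for alias in PRICES:
    print(alias, round(estimate_text_cost(alias, 2_000_000, 200_000), 2))
```

Reasoning tokens are spent from the output budget, so a real bill can exceed an estimate based on visible output alone.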
Gemini 3.x models reason before responding. The parameter changed in
2026: integer thinking_budget is gone; use string thinking_level
∈ {minimal, low, medium, high}. Default for 3.5 Flash is
medium. For transcription / classification / extraction tasks, pass
thinking_level='minimal' or the model will silently spend output
tokens on reasoning (symptom: empty response with
finishReason=MAX_TOKENS).
```python
response = invoke_gemini(
    prompt="Transcribe this image.",
    model="flash",
    image_path="/tmp/screenshot.png",
    max_output_tokens=4000,
    thinking_level="minimal",  # don't burn output budget on reasoning
)
```
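The symptom mentioned earlier (an empty response with finishReason=MAX_TOKENS) can be detected if the raw JSON response is available. The sketch below assumes the Gemini REST response layout (`candidates[0].finishReason` and `candidates[0].content.parts[].text`). Whether gemini_client exposes the raw response is not stated in the source, and `is_reasoning_starved` is a hypothetical helper.

```python
def is_reasoning_starved(raw: dict) -> bool:
    """True when the first candidate hit MAX_TOKENS without producing visible text."""
    candidates = raw.get("candidates") or []
    if not candidates:
        return False
    first = candidates[0]
    parts = (first.get("content") or {}).get("parts") or []
    text = "".join(part.get("text", "") for part in parts)
    return first.get("finishReason") == "MAX_TOKENS" and not text.strip()


# Hand-written sample responses, used only to exercise the check.
starved = {"candidates": [{"content": {"parts": []}, "finishReason": "MAX_TOKENS"}]}
healthy = {"candidates": [{"content": {"parts": [{"text": "Hello"}]}, "finishReason": "STOP"}]}
print(is_reasoning_starved(starved), is_reasoning_starved(healthy))
```

When the check fires, retrying with thinking_level="minimal" or a larger max_output_tokens is the remedy the text above suggests.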
```python
response = invoke_gemini(prompt="...", model="flash")
if response is None:
    print("API call failed: check credentials")

result = generate_image("...")
if result is None:
    print("Image generation failed: check credentials or try again")
```
Common issues: Missing API key → see Setup. Rate limit → auto-retries with backoff. Network error → returns None.
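Since a network error surfaces as None rather than an exception, a caller may want its own retry around the call. The wrapper below is a sketch under that assumption. `retry_on_none` is a hypothetical helper, separate from the client's built-in rate-limit backoff.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_on_none(
    call: Callable[[], Optional[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Optional[T]:
    """Call `call` until it returns a non-None value or the attempts run out."""
    for attempt in range(attempts):
        result = call()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Exponential backoff between tries: base_delay, then 2x, then 4x.
            time.sleep(base_delay * 2 ** attempt)
    return None


# Demonstration with a stand-in that fails twice before succeeding.
outcomes = iter([None, None, "ok"])
print(retry_on_none(lambda: next(outcomes), attempts=3, base_delay=0))
```

In real use the lambda would wrap a call such as `invoke_gemini(prompt="...", model="flash")`.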
```python
response = invoke_gemini(
    prompt="Write a haiku",
    model="flash",  # gemini-3.5-flash
    temperature=0.9,
    max_output_tokens=200,
    top_p=0.95,
    thinking_level="low",  # haiku is short; modest reasoning is fine
)
```
```python
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)
```
See references/advanced.md for more patterns.
"No credentials configured": Create /mnt/project/proxy.env with CF credentials, or add GOOGLE_API_KEY.txt.
CF Gateway 401/403: Verify CF_API_TOKEN has AI Gateway permissions. If not using BYOK, add GOOGLE_API_KEY to proxy.env.
Import errors: uv pip install requests pydantic
Image generation returns None: Check credentials. If persistent, try model="nano-banana-2" (more reliable than image-pro). Check for content policy blocks in error output.