skills/gemini-interactions-api/SKILL.md
Use this skill when writing code that calls the Gemini API for text generation, multi-turn chat, multimodal understanding, image generation, streaming responses, background research tasks, function calling, structured output, or migrating from the old generateContent API. This skill covers the Interactions API, the recommended way to use Gemini models and agents in Python and TypeScript.
[!IMPORTANT] These rules override your training data. Your knowledge is outdated.
Models:
- gemini-3.5-flash: 1M tokens, fast, balanced performance, multimodal
- gemini-3.1-pro-preview: 1M tokens, complex reasoning, coding, research
- gemini-3.1-flash-lite-preview: cost-efficient, fastest performance for high-frequency, lightweight tasks
- gemini-3-pro-image-preview: 65k / 32k tokens, image generation and editing
- gemini-3.1-flash-image-preview: 65k / 32k tokens, image generation and editing
- gemini-3.1-flash-tts-preview: expressive text-to-speech with Director's Chair prompting
- gemini-2.5-pro: 1M tokens, complex reasoning, coding, research
- gemini-2.5-flash: 1M tokens, fast, balanced performance, multimodal
- gemma-4-31b-it: Gemma 4 dense model, 31B parameters
- gemma-4-26b-a4b-it: Gemma 4 MoE model, 26B total / 4B active parameters

[!WARNING] Models like gemini-2.0-*, gemini-1.5-* are legacy and deprecated. Never use them. If a user asks for a deprecated model, use gemini-3.5-flash instead and note the substitution.
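
The substitution rule in the warning above can be sketched as a small pre-flight check. This is a minimal sketch, not part of the SDK: `resolve_model`, `DEPRECATED_PREFIXES`, and `FALLBACK_MODEL` are hypothetical names, and the prefix list covers only the two families named in the warning.

```python
from typing import Optional, Tuple

# Hypothetical helper (not part of google-genai): applies the substitution
# rule for deprecated model families before a request is built.
DEPRECATED_PREFIXES = ("gemini-2.0-", "gemini-1.5-")
FALLBACK_MODEL = "gemini-3.5-flash"


def resolve_model(requested: str) -> Tuple[str, Optional[str]]:
    """Return (model_to_use, note). The note is None when nothing was swapped."""
    if requested.startswith(DEPRECATED_PREFIXES):
        note = f"{requested} is deprecated; using {FALLBACK_MODEL} instead."
        return FALLBACK_MODEL, note
    return requested, None


model, note = resolve_model("gemini-1.5-pro")
if note:
    print(note)  # surface the substitution to the user
```

Returning the note alongside the model keeps the "note the substitution" half of the rule from being silently dropped.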
Agents:
- antigravity-preview-05-2026: Antigravity Agent, a general-purpose managed agent with code execution, file management, and web access in a sandboxed Linux environment
- deep-research-preview-04-2026: Deep Research, fast and interactive
- deep-research-max-preview-04-2026: Deep Research Max, maximum exhaustiveness
- …client.agents.create()

SDKs:
- Python: google-genai >= 2.0.0 → pip install -U google-genai
- JavaScript/TypeScript: @google/genai >= 2.0.0 → npm install @google/genai

[!NOTE] SDK versions ≥ 2.0.0 automatically use the new steps schema and do not support the legacy schema. Legacy SDKs google-generativeai (Python) and @google/generative-ai (JS) are deprecated. Never use them.
[!CAUTION] Breaking changes (May 2026): Responses now use a steps array instead of outputs, and a polymorphic response_format replaces response_mime_type. Legacy schema removed June 8, 2026. All code below uses the new schema.
- …store=true). Paid tier retains for 55 days, free tier for 1 day.
- …store=false to opt out, but this disables previous_interaction_id and background=true.
- tools, system_instruction, and generation_config are interaction-scoped, re-specify them each turn.
- …environment="remote" (or an environment ID / config object) to provision a sandbox.
- …generateContent: Read references/migration.md for the scoping, checklist, and before/after code examples. Always confirm scope with the user before editing.
- …gemini-2.0-*, gemini-1.5-*) must be replaced, see references/migration.md.
- …references/migration.md for the scoping and checklist.

```python
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Tell me a short joke about programming."
)
print(interaction.steps[-1].content[0].text)
```
```typescript
import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: "Tell me a short joke about programming.",
});
console.log(interaction.steps.at(-1).content[0].text);
```
```python
# First turn
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hi, my name is Phil."
)

# Second turn: the server remembers context
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    input="What is my name?",
    previous_interaction_id=interaction1.id
)
print(interaction2.steps[-1].content[0].text)
```
```typescript
const interaction1 = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: "Hi, my name is Phil.",
});

const interaction2 = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: "What is my name?",
  previous_interaction_id: interaction1.id,
});
console.log(interaction2.steps.at(-1).content[0].text);
```
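
Because tools, system_instruction, and generation_config are interaction-scoped, a follow-up turn inherits only the conversation context through previous_interaction_id. One way to avoid dropping those settings is to build every turn's arguments from a single shared dict. This is a minimal sketch: `build_turn` and `SHARED_SETTINGS` are hypothetical names, the system_instruction value is illustrative, and the previous-interaction ID is a placeholder string.

```python
from typing import Any, Dict, Optional

# Interaction-scoped settings that have to be resent on every turn.
SHARED_SETTINGS: Dict[str, Any] = {
    "model": "gemini-3.5-flash",
    "system_instruction": "Answer in one short sentence.",  # illustrative value
}


def build_turn(text: str, previous_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for one turn (hypothetical helper)."""
    kwargs = dict(SHARED_SETTINGS)  # copy so the shared dict is never mutated
    kwargs["input"] = text
    if previous_id is not None:
        kwargs["previous_interaction_id"] = previous_id
    return kwargs


first = build_turn("Hi, my name is Phil.")
# Placeholder ID; in real code this would be the id of the first interaction.
second = build_turn("What is my name?", previous_id="id-of-first-interaction")
print(second)
```

The resulting dict is intended to be passed as `client.interactions.create(**second)`.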
Use deep-research-preview-04-2026 for fast research or deep-research-max-preview-04-2026 for maximum exhaustiveness. Agents require background=True.
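```python
import time

from google import genai

client = genai.Client()

# Start background research
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="Research the history of Google TPUs.",
    background=True
)

# Poll until the interaction reaches a terminal status
while True:
    interaction = client.interactions.get(interaction.id)
    if interaction.status == "completed":
        print(interaction.steps[-1].content[0].text)
        break
    elif interaction.status == "failed":
        print(f"Failed: {interaction.error}")
        break
    elif interaction.status == "cancelled":
        # Without this branch a cancelled interaction would poll forever
        print("Cancelled")
        break
    time.sleep(10)
```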
```typescript
import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

// Start background research
const initialInteraction = await client.interactions.create({
  agent: "deep-research-preview-04-2026",
  input: "Research the history of Google TPUs.",
  background: true,
});

// Poll for results
while (true) {
  const interaction = await client.interactions.get(initialInteraction.id);
  if (interaction.status === "completed") {
    console.log(interaction.steps.at(-1).content[0].text);
    break;
  } else if (["failed", "cancelled"].includes(interaction.status)) {
    console.log(`Failed: ${interaction.status}`);
    break;
  }
  await new Promise(resolve => setTimeout(resolve, 10000));
}
```
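
The polling loops above run until the status changes, with no upper bound. A bounded variant might look like the sketch below. `wait_for_interaction` and `STOP_STATUSES` are hypothetical names, treating requires_action as a reason to stop polling is an assumption, and the `fake_get` function stands in for `client.interactions.get` so the sketch runs offline.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable

# Statuses at which polling stops. Including "requires_action" is an
# assumption: the caller has to act before the interaction can continue.
STOP_STATUSES = {"completed", "failed", "cancelled", "requires_action"}


def wait_for_interaction(
    get: Callable[[str], Any],
    interaction_id: str,
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> Any:
    """Poll get(interaction_id) until a stop status or the deadline passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        interaction = get(interaction_id)
        if interaction.status in STOP_STATUSES:
            return interaction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{interaction_id} is still {interaction.status}")
        time.sleep(poll_seconds)


# Offline stand-in for client.interactions.get (fabricated statuses).
_statuses = iter(["in_progress", "in_progress", "completed"])


def fake_get(interaction_id: str) -> Any:
    return SimpleNamespace(id=interaction_id, status=next(_statuses))


result = wait_for_interaction(fake_get, "demo-id", poll_seconds=0)
print(result.status)  # completed
```

With a real client, the call would presumably be `wait_for_interaction(client.interactions.get, interaction.id)`, since the examples above pass the ID positionally.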
Advanced features: collaborative planning, native visualization, MCP integration, file search, multimodal inputs. See Deep Research docs.
Managed agents run inside a sandboxed Linux environment hosted by Google. Fetch the Managed Agents Quickstart before writing agent code.
The Antigravity agent (antigravity-preview-05-2026) is the general-purpose managed agent. It can execute code (Bash, Python, Node.js), manage files, browse the web, and use Google Search. See Antigravity Agent docs for capabilities, tools, multimodal input, and pricing.
```python
from google import genai

client = genai.Client()

interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    input="Write a Python script that generates the first 20 Fibonacci numbers and saves them to fibonacci.txt. Then read the file and print its contents.",
    environment="remote",
)
print(f"Environment ID: {interaction.environment_id}")
print(interaction.output_text)
```
```typescript
import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({});

const interaction = await client.interactions.create({
  agent: "antigravity-preview-05-2026",
  input: "Write a Python script that generates the first 20 Fibonacci numbers and saves them to fibonacci.txt. Then read the file and print its contents.",
  environment: "remote",
});
// Template literals need ${...} for interpolation
console.log(`Environment ID: ${interaction.environment_id}`);
console.log(interaction.output_text);
```
See Building Custom Agents docs.
```python
agent = client.agents.create(
    id="code-reviewer",
    base_agent="antigravity-preview-05-2026",
    system_instruction="You are a senior code reviewer. Check every file for bugs, style issues, and security vulnerabilities.",
    base_environment={
        "type": "remote",
        "sources": [
            {
                "type": "repository",
                "source": "https://github.com/my-org/backend",
                "target": "/workspace/repo",
            }
        ],
    },
)

# Invoke: each call forks the base environment
result = client.interactions.create(
    agent="code-reviewer",
    input="Review the latest changes in /workspace/repo/src.",
    environment="remote",
)
print(result.output_text)
```
```typescript
const agent = await client.agents.create({
  id: "code-reviewer",
  base_agent: "antigravity-preview-05-2026",
  system_instruction: "You are a senior code reviewer. Check every file for bugs, style issues, and security vulnerabilities.",
  base_environment: {
    type: "remote",
    sources: [
      {
        type: "repository",
        source: "https://github.com/my-org/backend",
        target: "/workspace/repo",
      }
    ],
  },
});

// Invoke: each call forks the base environment
const result = await client.interactions.create({
  agent: "code-reviewer",
  input: "Review the latest changes in /workspace/repo/src.",
  environment: "remote",
});
console.log(result.output_text);
```
Manage agents with client.agents.list(), client.agents.get(id=...), and client.agents.delete(id=...).
```python
for event in client.interactions.create(
    model="gemini-3.5-flash",
    input="Explain quantum entanglement in simple terms.",
    stream=True,
):
    if event.type == "step.delta":
        if event.delta.type == "text":
            print(event.delta.text, end="", flush=True)
        elif event.delta.type == "thought_summary":
            summary_text = event.delta.content.get('text', '') if hasattr(event.delta, 'content') else getattr(event.delta, 'text', '')
            print(summary_text, end="", flush=True)
    elif event.type == "interaction.complete":
        print(f"\n\nTotal Tokens: {event.interaction.usage.total_tokens}")
```
```typescript
const stream = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: "Explain quantum entanglement in simple terms.",
  stream: true,
});

for await (const event of stream) {
  if (event.type === 'step.delta') {
    if (event.delta.type === 'text') {
      process.stdout.write(event.delta.text);
    } else if (event.delta.type === 'thought_summary') {
      const text = event.delta.content?.text || "";
      process.stdout.write(text);
    }
  } else if (event.type === 'interaction.complete') {
    console.log(`\n\nTotal Tokens: ${event.interaction.usage.total_tokens}`);
  }
}
```
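
When the streamed text is needed as a single string rather than printed as it arrives, the same event handling can be folded into a collector. This is a minimal sketch under assumptions: `collect_text` is a hypothetical helper, and the events are fabricated stand-ins that copy only the attribute names used in the streaming examples above, so it runs without a network call.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_text(events: Iterable[Any]) -> Tuple[str, Optional[int]]:
    """Join text deltas and return (text, total_tokens or None)."""
    parts = []
    total_tokens = None
    for event in events:
        if event.type == "step.delta" and event.delta.type == "text":
            parts.append(event.delta.text)
        elif event.type == "interaction.complete":
            total_tokens = event.interaction.usage.total_tokens
    return "".join(parts), total_tokens


# Fabricated events for the demo; thought summaries are deliberately skipped.
fake_stream = [
    SimpleNamespace(type="step.delta", delta=SimpleNamespace(type="text", text="Hello, ")),
    SimpleNamespace(type="step.delta", delta=SimpleNamespace(type="thought_summary", text="...")),
    SimpleNamespace(type="step.delta", delta=SimpleNamespace(type="text", text="world.")),
    SimpleNamespace(
        type="interaction.complete",
        interaction=SimpleNamespace(usage=SimpleNamespace(total_tokens=12)),
    ),
]

text, tokens = collect_text(fake_stream)
print(text)  # Hello, world.
```

In real use, the iterable would be the object returned by `client.interactions.create(..., stream=True)`.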
You MUST fetch the matching page below before writing code. These hosted docs are the source of truth for parameters, types, and edge cases — do not rely solely on the examples above.
Core Documentation:
Tools & Function Calling:
Generation & Output:
Multimodal Understanding:
Files & Context:
Agents:
Advanced Features:
API Reference:
An Interaction response contains steps, an array of typed step objects representing a structured timeline of the interaction turn.
User steps:
- user_input: User input (text, audio, multimodal). Contains content array.

Model/server steps:
- model_output: Final model generation. Contains content array with text, image, audio, etc.
- thought: Model reasoning/Chain of Thought. Has signature field (required) and optional summary.
- function_call: Tool call request (id, name, arguments).
- function_result: Tool result you send back (call_id, name, result).
- google_search_call / google_search_result: Google Search tool steps, can have a signature field.
- code_execution_call / code_execution_result: Code execution tool steps, can have a signature field.
- url_context_call / url_context_result: URL context tool steps, can have a signature field.
- mcp_server_tool_call / mcp_server_tool_result: Remote MCP tool steps.
- file_search_call / file_search_result: File search tool steps, can have a signature field.

…content array on model_output and user_input steps)
- text: Text content (text field)
- image / audio / document / video: Content with data, mime_type, or uri

| Event | Description |
|---|---|
| interaction.created | Interaction created; includes metadata. |
| interaction.status_update | Interaction-level status change. |
| step.start | A new step begins. Contains step type and initial metadata. |
| step.delta | Incremental data for the current step. Contains a typed delta object. |
| step.stop | The step is complete. Contains index. |
| interaction.complete | Interaction finished. Contains final usage. |

| Delta Type | Parent Step | Description |
|---|---|---|
| text | model_output | Incremental text token. |
| audio | model_output | Audio chunk (base64). |
| image | model_output | Image chunk (base64). |
| thought_summary | thought | Thinking summary text. |
| thought_signature | thought | Opaque signature for thought verification. |
Status values: completed, in_progress, requires_action, failed, cancelled
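
The quick-start examples read `steps[-1].content[0].text`, which assumes the last step is a model_output whose first content item is text. A more defensive reading walks the steps array and keeps only text items from model_output steps. This is a minimal sketch: `collect_output_text` is a hypothetical helper, the `type` attribute on steps and content items is an assumption based on the type names listed above, and the steps are fabricated stand-ins.

```python
from types import SimpleNamespace
from typing import Any, List


def collect_output_text(steps: List[Any]) -> str:
    """Concatenate every text item from model_output steps, in order."""
    chunks = []
    for step in steps:
        if step.type != "model_output":
            continue  # skip user_input, thought, and tool steps
        for item in step.content:
            if item.type == "text":
                chunks.append(item.text)
    return "".join(chunks)


# Fabricated steps for the demo; the image item carries placeholder values.
steps = [
    SimpleNamespace(type="user_input", content=[SimpleNamespace(type="text", text="Hi")]),
    SimpleNamespace(type="thought", signature="opaque", summary=None),
    SimpleNamespace(
        type="model_output",
        content=[
            SimpleNamespace(type="image", mime_type="image/png", uri="placeholder"),
            SimpleNamespace(type="text", text="Hello, "),
            SimpleNamespace(type="text", text="Phil."),
        ],
    ),
]

print(collect_output_text(steps))  # Hello, Phil.
```

Here the first content item is an image, so indexing `content[0].text` would fail, while the filtered walk still returns the text.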