.claude/skills/gemini-3-pro-api/SKILL.md
Gemini 3 Pro API/SDK integration for text generation, reasoning, and chat. Covers setup, authentication, thinking levels, streaming, and production deployment. Use when working with Gemini 3 Pro API, Python SDK, Node.js SDK, text generation, chat applications, or advanced reasoning tasks.
Comprehensive guide for integrating Google's Gemini 3 Pro API/SDK into your applications. Covers setup, authentication, text generation, advanced reasoning with dynamic thinking, chat applications, streaming responses, and production deployment patterns.
Gemini 3 Pro (gemini-3-pro-preview) is Google's most intelligent model designed for complex tasks requiring advanced reasoning and broad world knowledge. This skill provides complete workflows for API integration using Python or Node.js SDKs.
# Install SDK
pip install google-generativeai
# Basic usage
import google.generativeai as genai
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3-pro-preview")
response = model.generate_content("Explain quantum computing")
print(response.text)
// Install SDK
npm install @google/generative-ai
// Basic usage
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI("YOUR_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-3-pro-preview" });
const result = await model.generateContent("Explain quantum computing");
console.log(result.response.text());
Goal: Get from zero to first successful API call in < 5 minutes.
Steps:
Get API Key
Install SDK
# Python
pip install google-generativeai
# Node.js
npm install @google/generative-ai
Configure Authentication
# Python - using environment variable (recommended)
import os
import google.generativeai as genai
genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
// Node.js - using environment variable (recommended)
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
Make First API Call
# Python
model = genai.GenerativeModel("gemini-3-pro-preview")
response = model.generate_content("Write a haiku about coding")
print(response.text)
Verify Success
Expected Outcome: Working API integration in under 5 minutes.
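The authentication step above reads the key with os.getenv, which silently returns None when the variable is unset. A minimal sketch of a fail-fast loader follows; the helper name load_api_key is hypothetical and not part of the SDK.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing."""
    # Hypothetical helper: strips whitespace and rejects empty values.
    key = os.getenv(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before configuring the SDK"
        )
    return key
```

The result could then be passed to the configure call shown earlier, for example genai.configure(api_key=load_api_key()), so a missing key surfaces at startup rather than on the first request.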
Goal: Build a production-ready chat application with conversation history and streaming.
Steps:
Initialize Chat Model
# Python
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high",  # Dynamic reasoning
        "temperature": 1.0,  # Keep at 1.0 for best results
        "max_output_tokens": 8192
    }
)
Start Chat Session
chat = model.start_chat(history=[])
Send Message with Streaming
response = chat.send_message(
    "Explain how neural networks learn",
    stream=True
)
# Stream tokens in real-time
for chunk in response:
    print(chunk.text, end="", flush=True)
Manage Conversation History
# History is automatically maintained
# Access it anytime
print(f"Conversation turns: {len(chat.history)}")
# Continue conversation
response = chat.send_message("Can you give an example?")
Handle Thought Signatures
See references/thought-signatures.md for advanced cases
Implement Error Handling
import time
from google.api_core import retry, exceptions
@retry.Retry(predicate=retry.if_exception_type(
    exceptions.ResourceExhausted,
    exceptions.ServiceUnavailable
))
def send_with_retry(chat, message):
    return chat.send_message(message)

try:
    response = send_with_retry(chat, user_input)
except exceptions.GoogleAPIError as e:
    print(f"API error: {e}")
Expected Outcome: Production-ready chat application with streaming, history, and error handling.
Goal: Deploy Gemini 3 Pro integration with monitoring, cost control, and reliability.
Steps:
Setup Authentication (Production)
# Use environment variables (never hardcode keys)
import os
from pathlib import Path
# Option 1: Environment variable
api_key = os.getenv("GEMINI_API_KEY")
# Option 2: Secrets manager (recommended for production)
# Use Google Secret Manager, AWS Secrets Manager, etc.
Configure Production Settings
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high",  # or "low" for simple tasks
        "temperature": 1.0,  # CRITICAL: Keep at 1.0
        "max_output_tokens": 4096,
        "top_p": 0.95,
        "top_k": 40
    },
    safety_settings={
        # Configure content filtering as needed
    }
)
Implement Comprehensive Error Handling
from google.api_core import exceptions, retry
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def generate_with_fallback(prompt, max_retries=3):
    @retry.Retry(
        predicate=retry.if_exception_type(
            exceptions.ResourceExhausted,
            exceptions.ServiceUnavailable,
            exceptions.DeadlineExceeded
        ),
        initial=1.0,
        maximum=10.0,
        multiplier=2.0,
        deadline=60.0
    )
    def _generate():
        return model.generate_content(prompt)

    try:
        return _generate()
    except exceptions.InvalidArgument as e:
        logger.error(f"Invalid argument: {e}")
        raise
    except exceptions.PermissionDenied as e:
        logger.error(f"Permission denied: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        # Fallback to simpler model or cached response
        return None
Monitor Usage and Costs
def log_usage(response):
    usage = response.usage_metadata
    logger.info(f"Tokens - Input: {usage.prompt_token_count}, "
                f"Output: {usage.candidates_token_count}, "
                f"Total: {usage.total_token_count}")
    # Estimate cost (for prompts ≤200k tokens)
    input_cost = (usage.prompt_token_count / 1_000_000) * 2.00
    output_cost = (usage.candidates_token_count / 1_000_000) * 12.00
    total_cost = input_cost + output_cost
    logger.info(f"Estimated cost: ${total_cost:.6f}")
response = model.generate_content(prompt)
log_usage(response)
Implement Rate Limiting
import time
from collections import deque
class RateLimiter:
    def __init__(self, max_requests_per_minute=60):
        self.max_rpm = max_requests_per_minute
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove requests older than 1 minute
        while self.requests and self.requests[0] < now - 60:
            self.requests.popleft()
        # Check if at limit
        if len(self.requests) >= self.max_rpm:
            sleep_time = 60 - (now - self.requests[0])
            if sleep_time > 0:
                time.sleep(sleep_time)
        # Record the time the request is actually sent (after any sleep)
        self.requests.append(time.time())

limiter = RateLimiter(max_requests_per_minute=60)

def generate_with_rate_limit(prompt):
    limiter.wait_if_needed()
    return model.generate_content(prompt)
Setup Logging and Monitoring
import logging
from datetime import datetime
# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('gemini_api.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

def monitored_generate(prompt):
    start_time = datetime.now()
    try:
        response = model.generate_content(prompt)
        duration = (datetime.now() - start_time).total_seconds()
        logger.info(f"Success - Duration: {duration}s, "
                    f"Tokens: {response.usage_metadata.total_token_count}")
        return response
    except Exception as e:
        duration = (datetime.now() - start_time).total_seconds()
        logger.error(f"Failed - Duration: {duration}s, Error: {e}")
        raise
Expected Outcome: Production-ready deployment with monitoring, cost control, error handling, and rate limiting.
Gemini 3 Pro introduces thinking_level to control reasoning depth:
thinking_level: "high" (default)
thinking_level: "low"
# Python
model = genai.GenerativeModel(
    "gemini-3-pro-preview",
    generation_config={
        "thinking_level": "high"  # or "low"
    }
)
// Node.js
const model = genAI.getGenerativeModel({
  model: "gemini-3-pro-preview",
  generationConfig: {
    thinking_level: "high" // or "low"
  }
});
⚠️ Temperature MUST stay at 1.0 - Changing temperature can cause looping or degraded performance on complex reasoning tasks.
⚠️ Cannot combine thinking_level with legacy thinking_budget parameter.
See references/thinking-levels.md for detailed guide.
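The two warnings above can be checked before a config is handed to the SDK. The following is a sketch only: check_generation_config is a hypothetical helper, it works on a plain dict, and it assumes "low" and "high" are the only accepted levels, as listed in this section.

```python
import warnings

# Assumption: only the two levels named in this section are valid.
ALLOWED_THINKING_LEVELS = {"low", "high"}


def check_generation_config(config: dict) -> dict:
    """Validate a generation config dict against the constraints listed above."""
    # thinking_level and the legacy thinking_budget are mutually exclusive.
    if "thinking_level" in config and "thinking_budget" in config:
        raise ValueError("thinking_level cannot be combined with thinking_budget")
    level = config.get("thinking_level", "high")
    if level not in ALLOWED_THINKING_LEVELS:
        raise ValueError(
            f"thinking_level must be one of {sorted(ALLOWED_THINKING_LEVELS)}"
        )
    # A non-default temperature is allowed through, but flagged.
    if config.get("temperature", 1.0) != 1.0:
        warnings.warn("temperature should stay at 1.0 for Gemini 3 Pro", stacklevel=2)
    return config
```

Returning the dict unchanged lets the check wrap an existing config inline, for example generation_config=check_generation_config({...}).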
response = model.generate_content(
    "Write a long article about AI",
    stream=True
)
for chunk in response:
    print(chunk.text, end="", flush=True)
const result = await model.generateContentStream("Write a long article about AI");
for await (const chunk of result.stream) {
  process.stdout.write(chunk.text());
}
See references/streaming.md for advanced patterns.
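The streaming loops above print chunks as they arrive but discard them afterwards. If the full text is also needed (for logging or storage), one possible approach is to accumulate chunks while echoing them. In this sketch collect_stream is a hypothetical helper, and the SimpleNamespace objects are stand-ins for response chunks so it runs without an API call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Optionally print each chunk as it arrives; return the concatenated text."""
    parts = []
    for chunk in chunks:
        # Treat a missing or empty text attribute as an empty string.
        text = getattr(chunk, "text", "") or ""
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks; a real call would pass the streaming response instead.
fake_stream = [SimpleNamespace(text="Hello, "), SimpleNamespace(text="world")]
full_text = collect_stream(fake_stream, echo=False)
```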
| Context Size | Input | Output |
|-------------|-------|--------|
| ≤ 200k tokens | $2/1M | $12/1M |
| > 200k tokens | $4/1M | $18/1M |
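The two pricing tiers can be folded into a small estimator. This is a sketch under one stated assumption: the tier is selected by the prompt token count alone, and both input and output are billed at that tier's rates. The function name estimate_cost is hypothetical.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; tier boundary from the pricing table

# (input $ per 1M tokens, output $ per 1M tokens), copied from the table
STANDARD_RATES = (2.00, 12.00)
LONG_CONTEXT_RATES = (4.00, 18.00)


def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    # Assumption: the prompt size alone decides which tier applies.
    if prompt_tokens <= LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = STANDARD_RATES
    else:
        input_rate, output_rate = LONG_CONTEXT_RATES
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, a 100,000-token prompt with 1,000 output tokens comes to about $0.212, while a 250,000-token prompt with 2,000 output tokens comes to about $1.036.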
thinking_level: "low" for simple tasks (faster, lower cost)
gemini-3-advanced skill)
See references/best-practices.md for comprehensive cost optimization.
| Model | Context | Output | Input Price | Best For |
|-------|---------|--------|-------------|----------|
| gemini-3-pro-preview | 1M | 64k | $2-4/1M | Complex reasoning, coding |
| gemini-1.5-pro | 1M | 8k | $7-14/1M | General use, multimodal |
| gemini-1.5-flash | 1M | 8k | $0.35-0.70/1M | Simple tasks, cost-sensitive |
✅ Complex reasoning tasks
✅ Advanced coding problems
✅ Long-context analysis (up to 1M tokens)
✅ Large output requirements (up to 64k tokens)
✅ Tasks requiring dynamic thinking
| Error | Cause | Solution |
|-------|-------|----------|
| ResourceExhausted | Rate limit exceeded | Implement retry with backoff |
| InvalidArgument | Invalid parameters | Validate input, check docs |
| PermissionDenied | Invalid API key | Check authentication |
| DeadlineExceeded | Request timeout | Reduce context, retry |
from google.api_core import exceptions, retry
@retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.ResourceExhausted,
        exceptions.ServiceUnavailable
    ),
    initial=1.0,
    maximum=60.0,
    multiplier=2.0
)
def safe_generate(prompt):
    try:
        return model.generate_content(prompt)
    except exceptions.InvalidArgument as e:
        logger.error(f"Invalid argument: {e}")
        raise
    except exceptions.PermissionDenied as e:
        logger.error(f"Permission denied - check API key: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}")
        raise
See references/error-handling.md for comprehensive patterns.
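To make the initial, multiplier, and maximum settings used in the retry decorator above concrete, here is a standalone sketch of the nominal delay schedule they describe. It does not call google.api_core, and the library may also randomize each delay, so treat these values as the un-jittered schedule only. The function name backoff_delays is hypothetical.

```python
def backoff_delays(initial: float, multiplier: float, maximum: float, attempts: int) -> list:
    """Return the nominal (un-jittered) wait before each retry attempt."""
    delays = []
    delay = initial
    for _ in range(attempts):
        # Each wait grows geometrically but is capped at `maximum`.
        delays.append(min(delay, maximum))
        delay *= multiplier
    return delays


schedule = backoff_delays(initial=1.0, multiplier=2.0, maximum=60.0, attempts=8)
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

With these settings the cap is reached on the seventh retry, so later retries wait a nominal 60 seconds each.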
Setup & Configuration
Features
Production
Official Resources
gemini-3-multimodal skill
gemini-3-image-generation skill
gemini-3-advanced skill (caching, tools, batch)
gemini-3-multimodal
gemini-3-image-generation
Solution: Verify API key in Google AI Studio, check environment variable
Solution: Implement rate limiting, upgrade to paid tier, reduce request frequency
Solution: Use thinking_level: "low" for simple tasks, enable streaming, reduce context size
Solution: Keep prompts under 200k tokens, use appropriate thinking level, consider Gemini 1.5 Flash for simple tasks
Solution: Keep temperature at 1.0 (default) - do not modify for complex reasoning tasks
This skill provides everything needed to integrate Gemini 3 Pro API into your applications:
✅ Quick setup (< 5 minutes)
✅ Production-ready chat applications
✅ Dynamic thinking configuration
✅ Streaming responses
✅ Error handling and retry logic
✅ Cost optimization strategies
✅ Monitoring and logging patterns
For multimodal, image generation, and advanced features, see the companion skills.
Ready to build? Start with Workflow 1: Quick Start Setup above!