claude/skills/gemini-vision/SKILL.md
Guide for implementing Google Gemini API image understanding - analyze images with captioning, classification, visual QA, object detection, segmentation, and multi-image comparison. Use when analyzing images, answering visual questions, detecting objects, or processing documents with vision.
npx skillsauth add einverne/dotfiles gemini-vision
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to use Google's Gemini API for advanced image understanding tasks including captioning, classification, visual question answering, object detection, segmentation, and multi-image analysis.
pip install google-genai (Python 3.9+)
The skill checks for GEMINI_API_KEY in this order:
1. Process environment variable (recommended):
   export GEMINI_API_KEY="your-api-key"
2. Skill directory, in .claude/skills/gemini-vision/.env:
   GEMINI_API_KEY=your-api-key
3. Project directory: .env or .gemini_api_key in the project root
Security: Never commit API keys to version control. Add .env to .gitignore.
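The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the function names `read_env_file` and `find_api_key` are hypothetical, the `.env` parsing is deliberately simple (plain `KEY=value` lines only), and it assumes `.gemini_api_key` holds the bare key as its only content.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: Path, name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a KEY=value file, or None if absent."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("\"'") or None
    return None


def find_api_key(skill_dir: Path, project_dir: Path) -> Optional[str]:
    """Hypothetical helper: resolve the key using the documented order."""
    # 1. Process environment variable (recommended).
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    # 2. .env inside the skill directory.
    key = read_env_file(skill_dir / ".env")
    if key:
        return key
    # 3. Project root: .env first, then .gemini_api_key.
    key = read_env_file(project_dir / ".env")
    if key:
        return key
    key_file = project_dir / ".gemini_api_key"
    if key_file.is_file():
        # Assumption: the file contains only the key itself.
        return key_file.read_text(encoding="utf-8").strip() or None
    return None
```

Calling `find_api_key(Path(".claude/skills/gemini-vision"), Path("."))` would return the first key found, or `None` if no source provides one.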
# Analyze a local image
python scripts/analyze-image.py path/to/image.jpg "What's in this image?"
# Analyze from URL
python scripts/analyze-image.py https://example.com/image.jpg "Describe this"
# Specify model
python scripts/analyze-image.py image.jpg "Caption this" --model gemini-2.5-pro
python scripts/analyze-image.py image.jpg "Detect all objects" --model gemini-2.0-flash
# Compare multiple images
python scripts/analyze-image.py img1.jpg img2.jpg "What's different between these?"
# Upload file
python scripts/upload-file.py path/to/large-image.jpg
# Use uploaded file
python scripts/analyze-image.py file://file-id "Caption this"
# List uploaded files
python scripts/manage-files.py list
# Get file info
python scripts/manage-files.py get file-id
# Delete file
python scripts/manage-files.py delete file-id
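The analyze-image.py commands above accept three kinds of image source (a local path, an http(s) URL, and a file:// reference to an uploaded file), with the prompt as the last positional argument. A script might separate these cases roughly as sketched here. The function names `classify_source` and `split_arguments` are hypothetical, and the real script may parse its arguments differently.

```python
from typing import List, Tuple


def classify_source(source: str) -> str:
    """Classify one image argument as 'uploaded', 'url', or 'local'."""
    if source.startswith("file://"):
        # Reference to a previously uploaded file, e.g. file://file-id
        return "uploaded"
    if source.startswith(("http://", "https://")):
        return "url"
    # Anything else is treated as a path on the local filesystem.
    return "local"


def split_arguments(args: List[str]) -> Tuple[List[str], str]:
    """Split positional arguments into image sources and the trailing prompt."""
    if len(args) < 2:
        raise ValueError("expected at least one image and a prompt")
    return args[:-1], args[-1]


sources, prompt = split_arguments(
    ["img1.jpg", "https://example.com/image.jpg", "file://file-id", "Compare these"]
)
kinds = [classify_source(s) for s in sources]
```

Treating the last argument as the prompt is what allows the multi-image form to take any number of images without an extra flag.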
Images consume tokens based on size:
Token Formula:
crop_unit = floor(min(width, height) / 1.5)
tiles = ceil(width / crop_unit) × ceil(height / crop_unit)
total_tokens = tiles × 258
Example: a 960×540 image has crop_unit = 360, so tiles = 3 × 2 = 6 and total_tokens = 6 × 258 = 1,548.
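As a quick check of the formula, here is a small Python sketch. The function name `estimate_image_tokens` is hypothetical, and rounding each quotient up to a whole tile is an assumption inferred from the 960×540 example, which only yields 6 tiles when partial tiles count as full ones. It applies the formula exactly as written and ignores any special cases the official documentation may define for particular image sizes or models.

```python
import math

TOKENS_PER_TILE = 258  # per-tile cost taken from the formula above


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate image token usage with the tile formula (a sketch)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # crop_unit = floor(min(width, height) / 1.5); clamp to 1 to avoid
    # dividing by zero on degenerate one-pixel inputs.
    crop_unit = max(1, math.floor(min(width, height) / 1.5))
    # Assumption: partial tiles count as whole tiles, hence ceil.
    tiles = math.ceil(width / crop_unit) * math.ceil(height / crop_unit)
    return tiles * TOKENS_PER_TILE


print(estimate_image_tokens(960, 540))  # 1548, matching the example above
```

Because the crop unit scales with the shorter side, the estimate depends mostly on aspect ratio rather than raw resolution under this formula.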
Limits vary by tier (Free, Tier 1, 2, 3):
Common errors:
See the references/ directory for:
When implementing Gemini vision features:
All scripts support the 3-step API key lookup:
Run any script with --help for detailed usage instructions.
Official Documentation: https://ai.google.dev/gemini-api/docs/image-understanding