.claude/skills/multimodal-router/SKILL.md
# Multimodal Router

## When to Load This Skill

Load when working with: PDF files, Word documents, Excel spreadsheets, images, audio, video files, or any document exceeding 400k tokens that cannot fit in Claude's standard context.

## Model

- **Model**: `google/gemini-3-flash-preview`
- **Provider**: OpenRouter API
- **Context window**: 1M tokens
- **Capabilities**: text, images, audio, video, and PDF, all natively
- **Thinking levels**: minimal / low / medium / high (configurable per task)
Gemini 3 Flash Preview is a thinking model with near-Pro reasoning at Flash latency. Use `thinking_level: "low"` for document extraction and `"medium"` or `"high"` for complex analysis.
Use Gemini 3 Flash via this skill when:

- `.docx`, `.pdf`, `.xlsx`, `.mp4`, `.wav` for initial project analysis

Do NOT use for: writing code, architecture decisions, or tests. Those stay with Claude Code.
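The routing rule can be expressed as a single predicate. This is a sketch, not part of the skill: `should_route_to_gemini` is hypothetical, the 400k-token threshold comes from "When to Load This Skill", and the exact image, audio, and video extension sets are illustrative assumptions.

```python
from pathlib import Path

# Threshold from "When to Load": larger documents do not fit Claude's standard context.
CLAUDE_CONTEXT_LIMIT_TOKENS = 400_000

# Document types named in this skill; the media sets are illustrative assumptions.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".xlsx"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}
MULTIMODAL_EXTENSIONS = (
    DOCUMENT_EXTENSIONS | IMAGE_EXTENSIONS | AUDIO_EXTENSIONS | VIDEO_EXTENSIONS
)


def should_route_to_gemini(path: Path, estimated_tokens: int = 0) -> bool:
    """Return True for multimodal files or text too large for Claude's context.

    Writing code, architecture decisions, and tests stay with Claude regardless.
    """
    if path.suffix.lower() in MULTIMODAL_EXTENSIONS:
        return True
    return estimated_tokens > CLAUDE_CONTEXT_LIMIT_TOKENS
```

Usage: `should_route_to_gemini(Path("client_brief.PDF"))` is `True` because the suffix check is case-insensitive, while a 10k-token Markdown file stays with Claude.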
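```python
import httpx

from src.project_name.core.config import settings

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
MULTIMODAL_MODEL = "google/gemini-3-flash-preview"


async def call_gemini_flash(
    prompt: str,
    base64_content: str | None = None,
    media_type: str | None = None,
    thinking_level: str = "low",
) -> str:
    """Send a prompt, optionally with one base64-encoded attachment, to Gemini 3 Flash."""
    messages: list[dict] = []
    if base64_content and media_type:
        data_url = f"data:{media_type};base64,{base64_content}"
        if media_type.startswith("image/"):
            # Images use the OpenAI-compatible "image_url" content part.
            attachment = {"type": "image_url", "image_url": {"url": data_url}}
        else:
            # PDFs use a "file" part whose payload lives under "file", not "image_url".
            # Audio and video may need their own part types; check OpenRouter's docs.
            attachment = {
                "type": "file",
                "file": {
                    "filename": f"attachment.{media_type.rsplit('/', 1)[-1]}",
                    "file_data": data_url,
                },
            }
        messages.append({
            "role": "user",
            "content": [attachment, {"type": "text", "text": prompt}],
        })
    else:
        messages.append({"role": "user", "content": prompt})

    payload = {
        "model": MULTIMODAL_MODEL,
        "messages": messages,
        # OpenRouter's unified reasoning control, mapped to the model's thinking setting.
        "reasoning": {"effort": thinking_level},
        "max_tokens": 4096,
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(
            f"{OPENROUTER_BASE_URL}/chat/completions",
            headers={
                "Authorization": f"Bearer {settings.openrouter_api_key}",
                # Optional app-attribution headers used by OpenRouter.
                "HTTP-Referer": "https://github.com/your-org/project",
                "X-Title": "ML Engineering Platform",
            },
            json=payload,
        )
        response.raise_for_status()
        data = response.json()
    return data["choices"][0]["message"]["content"]
```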
```python
import base64
from pathlib import Path


async def analyze_pdf(pdf_path: Path, analysis_prompt: str) -> str:
    # Send the raw PDF bytes; Gemini reads PDFs natively, so no text extraction is needed.
    pdf_bytes = pdf_path.read_bytes()
    b64 = base64.b64encode(pdf_bytes).decode("utf-8")
    return await call_gemini_flash(
        prompt=analysis_prompt,
        base64_content=b64,
        media_type="application/pdf",
        thinking_level="medium",
    )
```
For the intake phase (analyzing client documents):

```python
INTAKE_SYSTEM_PROMPT = """
You are analyzing a client document to extract structured requirements.
Return a JSON object with these fields:
- business_goal: str
- key_stakeholders: list[str]
- data_sources: list[dict with name, format, volume]
- use_cases: list[str]
- constraints: list[str]
- open_questions: list[str]
Be thorough. Every ambiguity should appear in open_questions.
Return ONLY valid JSON, no markdown fences.
"""
```
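Because the intake prompt asks for a fixed JSON shape, the model's reply can be checked before it is used. This is a minimal sketch with a hypothetical `parse_intake_response` helper: it validates only the field names and top-level types named in `INTAKE_SYSTEM_PROMPT`, and defensively strips a Markdown fence in case the model ignores the "no markdown fences" instruction.

```python
import json

# Fields requested by INTAKE_SYSTEM_PROMPT and their expected JSON types.
INTAKE_FIELDS = {
    "business_goal": str,
    "key_stakeholders": list,
    "data_sources": list,
    "use_cases": list,
    "constraints": list,
    "open_questions": list,
}
DATA_SOURCE_KEYS = ("name", "format", "volume")


def parse_intake_response(raw: str) -> dict:
    """Parse the intake JSON and raise ValueError if the shape is wrong."""
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (e.g. ```json) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit("```", 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("intake response must be a JSON object")
    for field, expected in INTAKE_FIELDS.items():
        if not isinstance(data.get(field), expected):
            raise ValueError(f"missing or invalid field: {field}")
    for source in data["data_sources"]:
        if not isinstance(source, dict) or not all(k in source for k in DATA_SOURCE_KEYS):
            raise ValueError("each data source needs name, format, and volume")
    return data
```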
Required environment variable:

```bash
OPENROUTER_API_KEY=sk-or-...
```
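Gemini 3 Flash Preview pricing on OpenRouter is about $0.0005 per 1k input tokens and $0.003 per 1k output tokens. A 300-page PDF (≈150k tokens) costs about $0.075 in input tokens to analyze, and even a full 4,096-token response adds only about $0.012, so cost is rarely a constraint.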
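The pricing arithmetic can be reproduced with a small estimator. This is a sketch: `estimate_cost_usd` is a hypothetical helper, and the per-1k rates are the approximate figures quoted here, which will drift over time.

```python
# Approximate Gemini 3 Flash Preview rates on OpenRouter (USD per 1k tokens).
# Assumed values; check the OpenRouter model page for current pricing.
INPUT_USD_PER_1K = 0.0005
OUTPUT_USD_PER_1K = 0.003


def estimate_cost_usd(input_tokens: int, output_tokens: int = 0) -> float:
    """Approximate request cost in USD from input and output token counts."""
    return (
        input_tokens / 1000 * INPUT_USD_PER_1K
        + output_tokens / 1000 * OUTPUT_USD_PER_1K
    )


# 300-page PDF (~150k input tokens) plus a full 4,096-token response.
print(f"${estimate_cost_usd(150_000, 4_096):.4f}")  # prints $0.0873
```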
For documents that need Pro-level reasoning (very complex technical analysis), first try `google/gemini-3-flash-preview` with `thinking_level: "high"` before escalating to Pro.