skills/model-routing/SKILL.md
# Model Routing System ## How Routing Decisions Are Made Every user prompt goes through a 9-tier classification pipeline before any AI model processes it. The system answers three questions: 1. **Which model should handle this?** — 9-tier cost/complexity classification 2. **Is the classifier itself working?** — Cascading fallback (qwen3 → kimi → deepseek → cache) 3. **Can we verify the result?** — Tool-level fallback + auto-evaluation ### The Pipeline ``` User types prompt ↓ UserPromptS
npx skillsauth add alinaqi/claude-bootstrap skills/model-routingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every user prompt goes through a 9-tier classification pipeline before any AI model processes it. The system answers three questions:
User types prompt
↓
UserPromptSubmit hook fires (~/.claude/hooks/route-task-hook)
↓
Classifier: qwen3 (local, free) classifies into tier
↓ (fails?)
Classifier: kimi (local, free) retries
↓ (fails?)
Classifier: deepseek-flash (~$0.0001) retries
↓ (fails?)
Classifier: cached tier from last success
↓
Hook injects routing decision into Claude's context
↓
Claude delegates to the right model or handles directly
| Tier | Model | Input (per M) | Output (per M) | Handles | |------|-------|---------------|----------------|---------| | 0 | Qwen3 (local) | $0 | $0 | grep, find, shell, syntax, log reading | | 1 | Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Bulk extraction, classification, CIG pipelines | | 2 | DeepSeek V4 Flash | $0.14 | $0.28 | Simple code, CRUD, test writing, small fixes | | 3 | DeepSeek V4 Pro | $0.44 | $0.87 | Multi-file features, refactors, debugging (~80% of work) | | 4 | Gemini 2.5 Flash | $0.15 | $0.60 | Multimodal (images, video, audio), brand analysis | | 5 | Kimi K2.6 | $0.60 | $2.50 | Code review, commit messages, diff summaries | | 6 | Gemini 3.1 Pro + Search | $1.25 | $10.00 | Deep research, Google grounding, 2M context | | 7 | Codex | varies | varies | Bulk generation, code review | | 8 | Claude Sonnet/Opus | $3-5 | $15-25 | Architecture, security, quality-critical |
When the hook says "delegate to X", run the matching command and return its output:
# Tier 0 — Qwen3
~/bin/qwen3 "prompt"
# Tier 1 — Gemini Flash-Lite
~/bin/gemini --flash-lite "prompt"
# Tier 2 — DeepSeek Flash
~/bin/deepseek --flash "prompt"
# Tier 3 — DeepSeek Pro
~/bin/deepseek --pro "prompt"
# Tier 4 — Gemini Flash
~/bin/gemini --flash "prompt"
# Tier 5 — Kimi
~/bin/kimi --quiet -p "prompt"
# Tier 6 — Gemini Pro Search
~/bin/gemini --pro-search "prompt"
# Tier 7 — Codex
codex exec "prompt"
# Tier 8 — Claude
# Handle directly (no delegation)
Every ~/bin/ script follows the same pattern:
script "what is 2+2"--flash, --pro, --flash-lite, --pro-search--quiet (where applicable)~/bin/
├── qwen3 # Shell: curl to local Ollama API
├── kimi # Shell: execs Kimi CLI binary
├── deepseek # Python: httpx to DeepSeek Anthropic-compat API
├── gemini # Python: httpx to Gemini OpenAI-compat API
├── research # Python: multi-backend research with auto-evaluation
└── route-task # Shell: qwen3-powered task classification
The classifier itself can fail. When it does, cascading fallback kicks in:
| Level | Classifier | Cost | Threshold |
|-------|-----------|------|-----------|
| 1 | qwen3 (Ollama) | $0 | 2s connect, 8s classify |
| 2 | kimi CLI | $0 | Local process |
| 3 | deepseek-flash | ~$0.0001 | API call |
| 4 | Cached tier | $0 | From ~/.claude/routing-cache.json |
The cache (~/.claude/routing-cache.json) saves the last successful tier and timestamp. After compaction, when Ollama may be briefly unreachable, the cache ensures routing continues without dropping to CLAUDE by default.
When Claude's built-in tools fail, external backends take over:
| Failed Tool | Fallback 1 | Fallback 2 |
|-------------|------------|------------|
| WebSearch / WebFetch | ~/bin/research "query" | ~/bin/deepseek --pro "query" |
| Read / file access | cat via Bash | — |
| Grep | grep -r via Bash | — |
~/bin/research)Multi-backend research with auto-evaluation:
~/bin/research --eval~/.claude/research-eval.jsonlMaggy's model_router.py mirrors the same 9-tier structure in DEFAULT_TIERS. The PiAdapter uses the same delegation scripts for execution. Task type overrides in routing_rules_defaults.py ensure:
research, competitor → Gemini Pro Search (Google grounding)bulk → Gemini Flash-Lite (cheapest)security, architecture, planning → Claude (quality-critical)docs, tests → DeepSeek Pro (cost-efficient)review → Claude (security + architecture depth)# Required for delegation scripts (in ~/.zshrc)
export DEEPSEEK_API_KEY="sk-..."
export GEMINI_API_KEY="..." # For gemini delegator
export OPENAI_API_KEY="sk-..." # For codex CLI
# Ollama must be running locally for qwen3
ollama serve # or launch at startup
~/.claude/routing-log.jsonl — every classification with tier, classifier used, tokens saved~/.claude/routing-cache.json — last tier for post-compact recovery~/.claude/research-eval.jsonl — per-query backend scoringtesting
Multi-model validation council — auto-validate plans, architecture changes, and PRs via validate-plan/review before executing
development
Task-scoped memory lifecycle — typed MnemoGraph prevents lossy context compaction by treating facts/decisions/code-refs/handoffs as distinct node types with per-type eviction policies
development
Mandatory code reviews via /code-review before commits and deploys
development
# Visual Validation — Autonomous Screenshot Verification ## Philosophy Every UI change should be visually verified before it ships. Peekaboo captures pixel-accurate screenshots. The system compares before/after and flags visual regressions. No manual "looks good to me" — the machine verifies what the machine built. ## Autonomous Flow ``` static/* files modified (detected by auto-review-hook or E2E testkit) ↓ peekaboo image --mode screen → ~/.maggy/visual-verify/after-{ts}.png ↓ Compa