# External Model Delegation Pattern
A `UserPromptSubmit` hook classifies every user prompt into one of six cost/performance tiers. The hook injects `additionalContext` instructing Claude to run a specific delegation script and return the output.
## Tier routing table

| Tier | Delegation command | Cost |
|------|-------------------|------|
| QWEN | `qwen3 "prompt"` | $0 (local Ollama) |
| DEEPSEEK_FLASH | `deepseek --flash "prompt"` | $0.14 / $0.28 per M tokens |
| DEEPSEEK_PRO | `deepseek --pro "prompt"` | $0.44 / $0.87 per M tokens |
| KIMI | `kimi --quiet -p "prompt"` | $0.60 / $2.50 per M tokens |
| CODEX | `codex exec` | varies |
| CLAUDE | handle natively | $3-5 / $15-25 per M tokens |
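
The routing table can be read as a lookup from tier name to command line. The following is a minimal sketch of that lookup, not the skill's actual code: `TIER_COMMANDS` and `delegation_command` are hypothetical names, and appending the prompt to `codex exec` is an assumption, since the table lists that command without an argument.

```python
import shlex
from typing import Optional

# Command prefixes copied from the routing table. CLAUDE maps to None
# because Claude handles those prompts natively.
TIER_COMMANDS = {
    "QWEN": ["qwen3"],
    "DEEPSEEK_FLASH": ["deepseek", "--flash"],
    "DEEPSEEK_PRO": ["deepseek", "--pro"],
    "KIMI": ["kimi", "--quiet", "-p"],
    "CODEX": ["codex", "exec"],  # assumption: the prompt is appended last
    "CLAUDE": None,
}


def delegation_command(tier: str, prompt: str) -> Optional[str]:
    """Return a shell-safe command line for the tier, or None for CLAUDE."""
    argv = TIER_COMMANDS[tier]  # raises KeyError on an unknown tier
    if argv is None:
        return None
    # shlex.quote protects prompts containing spaces, quotes, or metacharacters.
    return " ".join(shlex.quote(part) for part in [*argv, prompt])


print(delegation_command("DEEPSEEK_FLASH", "write a unit test for parse_date"))
```

Quoting the prompt with `shlex.quote` matters here because the command string ends up inside text that Claude is asked to execute in a shell.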
Each script is a self-contained executable in `~/bin/` that accepts a prompt and writes the response to stdout:

```
~/bin/
├── qwen3        # Shell: curl to local Ollama API
├── kimi         # Shell: execs Kimi CLI binary
├── deepseek     # Python: httpx to DeepSeek Anthropic-compat API
└── route-task   # Shell + qwen3: classifies prompt into tier
```
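- Prompt as a positional argument: `qwen3 "what is 2+2"`
- `--flash` / `--pro` model flags (deepseek)
- `--quiet` mode flag (kimi)

```bash
#!/bin/bash
# Minimal delegator template
set -euo pipefail

# Fail early with a message if no prompt was passed
PROMPT="${1:?prompt argument required}"
API_KEY="${EXTERNAL_API_KEY:-}"

# Call external API, write result to stdout
curl -sS --fail https://api.example.com/chat \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg p "$PROMPT" '{prompt: $p}')" \
  | jq -r '.response'
```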
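
For comparison, the same delegator contract (prompt in, response text out) can be sketched in Python against the local Ollama API. This is an illustration, not the skill's `qwen3` script, which is a shell wrapper around `curl`. The function names are hypothetical, and the `qwen3` model tag is an assumption about what is installed locally. The endpoint and the `response` field follow Ollama's `/api/generate` interface.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "qwen3") -> bytes:
    """Encode a non-streaming generate request; the model tag is an assumption."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")


def extract_response(body: bytes) -> str:
    """Pull the generated text out of Ollama's JSON reply."""
    return json.loads(body)["response"].strip()


def delegate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local model and return its text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_response(reply.read())
```

With Ollama running, `print(delegate("what is 2+2"))` would write the model's answer to stdout, which is all the delegation flow expects from a delegator.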
```
User types prompt
    ↓
UserPromptSubmit hook fires
    ↓
qwen3 classifies into tier (QWEN|DEEPSEEK_FLASH|DEEPSEEK_PRO|KIMI|CODEX|CLAUDE)
    ↓
Hook injects additionalContext: "Run: <delegation-command>"
    ↓
Claude reads context, spawns delegation script, returns output
    ↓
User sees response from the delegated model
```
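
The flow above could be wired up roughly as follows. This is a sketch under stated assumptions rather than the skill's actual hook: it assumes `route-task` takes the prompt as its first argument and prints a tier name, that the hook payload on stdin carries the text in a `prompt` field, and that context is injected through a `hookSpecificOutput` JSON object on stdout. `COMMAND_PREFIX`, `classify_with_route_task`, and `build_hook_output` are hypothetical names.

```python
import json
import shlex
import subprocess
import sys
from typing import Callable

# Command prefixes from the tier routing table; CLAUDE is absent on purpose.
COMMAND_PREFIX = {
    "QWEN": "qwen3",
    "DEEPSEEK_FLASH": "deepseek --flash",
    "DEEPSEEK_PRO": "deepseek --pro",
    "KIMI": "kimi --quiet -p",
    "CODEX": "codex exec",
}


def classify_with_route_task(prompt: str) -> str:
    """Assumption: route-task takes the prompt as $1 and prints a tier name."""
    result = subprocess.run(
        ["route-task", prompt], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


def build_hook_output(prompt: str, classify: Callable[[str], str]) -> dict:
    """Build the JSON a UserPromptSubmit hook prints, or {} to inject nothing."""
    prefix = COMMAND_PREFIX.get(classify(prompt))
    if prefix is None:
        return {}  # CLAUDE or an unrecognized tier: Claude answers natively
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": f"Run: {prefix} {shlex.quote(prompt)}",
        }
    }


def main() -> None:
    event = json.load(sys.stdin)  # hook payload; assumed to include "prompt"
    output = build_hook_output(event.get("prompt", ""), classify_with_route_task)
    if output:
        print(json.dumps(output))
```

Passing the classifier in as a function keeps `build_hook_output` testable without Ollama or `route-task` installed. A real hook script would call `main()` at the end of the file.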
| Tier | Task types |
|------|-----------|
| QWEN | grep, find, regex, shell, syntax lookups, log reading, short summaries |
| DEEPSEEK_FLASH | Simple code, boilerplate, CRUD, test writing, small fixes, config |
| DEEPSEEK_PRO | Multi-file features, refactors, debugging, medium coding, docs |
| KIMI | Single-file review, medium reasoning, commit messages, diff summaries |
| CODEX | Bulk generation, mechanical changes across many files |
| CLAUDE | Architecture, security, complex debugging, system design, quality-critical |
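
Because the classifier is itself a small language model, its reply may not be a bare tier name. One way to handle that is to accept only the six known names and fall back to a default otherwise. The sketch below is illustrative: `parse_tier` is a hypothetical helper, and defaulting to `CLAUDE` (the most capable tier) is an assumption, not documented behavior of `route-task`.

```python
import re

TIERS = ("QWEN", "DEEPSEEK_FLASH", "DEEPSEEK_PRO", "KIMI", "CODEX", "CLAUDE")


def parse_tier(raw: str, default: str = "CLAUDE") -> str:
    """Return the first known tier name in the classifier reply, else the default."""
    # Tokenize on runs of letters, digits, and underscores so that
    # "Tier: deepseek_pro." yields the token DEEPSEEK_PRO.
    for token in re.findall(r"[A-Z0-9_]+", raw.upper()):
        if token in TIERS:
            return token
    return default  # assumption: unknown replies go to the safest tier


print(parse_tier("Tier: deepseek_pro."))
```

Falling back to the native tier trades cost for safety: a misclassified prompt costs more but is never sent to a model that cannot handle it.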
```bash
# Required env vars (set in ~/.zshrc)
export DEEPSEEK_API_KEY="sk-..."   # For deepseek delegator
export OPENAI_API_KEY="sk-..."     # For codex CLI

# Ollama must be running locally for qwen3 classification + delegation
```
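
A quick preflight check can catch a missing key or a stopped Ollama server before the hook starts routing prompts. The sketch below is a suggestion rather than part of the skill: `missing_env` and `ollama_is_up` are hypothetical helpers, and the URL assumes Ollama is listening on its default port, 11434.

```python
import os
import urllib.request
from typing import Mapping

# Variable names from the setup block above, with the tool that needs each.
REQUIRED_ENV = {
    "DEEPSEEK_API_KEY": "deepseek delegator",
    "OPENAI_API_KEY": "codex CLI",
}


def missing_env(environ: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV if not environ.get(name)]


def ollama_is_up(url: str = "http://localhost:11434/", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP 200 at the assumed Ollama address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as reply:
            return reply.status == 200
    except OSError:  # covers URLError, refused connections, and timeouts
        return False


for name in missing_env(os.environ):
    print(f"missing {name} (needed by the {REQUIRED_ENV[name]})")
```

Calling `ollama_is_up()` alongside the environment check covers the third requirement, since both classification and the QWEN tier depend on the local server.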