plugins/consider/skills/pick-model/SKILL.md
Select the right AI model for a task based on external benchmarks, pricing, and effort tradeoffs. Use when the user asks "which model should I use?", "what's the best model for X?", "help me pick a model", "model comparison", "should I use Opus or Sonnet?", or needs to decide between AI models for a specific workload. Also use when someone mentions cost optimization for LLM usage or wants to know if a cheaper model would work.
npx skillsauth add bcbeidel/wos pick-model
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Assess stakes. Ask: "What happens if the model gets this wrong?" Map to one of three levels:
Check cost sensitivity. Ask if there's a budget constraint:
Apply the crossover check. Before recommending a cheaper model, check whether it scores more than 70% of the frontier model's score on the relevant benchmark. If it does, recommend it with high effort. If it scores below 50%, skip to the frontier model. Between 50% and 70%, note the tradeoff explicitly.
Set the effort parameter. Based on stakes and provider, recommend
the specific effort setting. Read the effort controls table in
references/model-landscape.md.
Present the recommendation. Use the output format below. Include the primary pick, an alternative at a different price point, the effort setting, and a one-line rationale citing the benchmark evidence.
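The crossover check in the steps above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the function name `crossover_check` and the returned labels are hypothetical, and the scores are assumed to come from the same benchmark for both models.

```python
def crossover_check(candidate_score: float, frontier_score: float) -> str:
    """Classify a cheaper model by its share of the frontier benchmark score.

    Both scores are assumed to be from the same benchmark (hypothetical inputs).
    Thresholds follow the crossover rule: >70%, <50%, or in between.
    """
    if frontier_score <= 0:
        raise ValueError("frontier_score must be positive")

    ratio = candidate_score / frontier_score
    if ratio > 0.70:
        # Cheaper model is close enough: recommend it with high effort.
        return "recommend_cheaper_high_effort"
    if ratio < 0.50:
        # Gap is too large: go straight to the frontier model.
        return "use_frontier"
    # 50% to 70% inclusive: surface the tradeoff explicitly.
    return "note_tradeoff"


if __name__ == "__main__":
    # Illustrative scores only, not real benchmark data.
    for candidate in (90.0, 60.0, 35.0):
        print(candidate, crossover_check(candidate, 100.0))
```

Returning a label rather than a model name keeps the rule separate from the benchmark and pricing data, which change more often than the thresholds.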
</process><output_format>
[Model Name] ([Provider]) — [one-line rationale with benchmark cite]
[parameter_name] = [value]
[Model Name] — [why this is the tradeoff option]
[2-3 sentences explaining the key differentiator — benchmark evidence, cost math, or capability gap that drives the recommendation]
Claude Sonnet 4.6 (Anthropic) — SWE-bench Verified 79.6%, Chatbot Arena Coding ELO 1523. Within 1.2 points of Opus on coding benchmarks at 60% of the cost.
effort = high
DeepSeek V3.2 — 73% SWE-bench at $0.26/M input (17x cheaper). For 200 PRs/day, the cost difference is significant. Quality gap is real but may be acceptable for initial triage before human review.
At 200 PRs/day, the Sonnet-to-Opus upgrade buys 1.2 SWE-bench points for a 67% price increase — not justified when developer trust depends on consistency, not peak capability. Sonnet's coding ELO (1523) is third overall, behind only Opus variants. The DeepSeek alternative is viable for a two-tier setup: DeepSeek for initial scan, Sonnet for flagged PRs.
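The two cost figures in this example ("60% of the cost" and "a 67% price increase") describe the same price ratio from opposite directions. A short sketch of the conversion, using a hypothetical helper name `price_increase_pct`:

```python
def price_increase_pct(cheaper_cost_fraction: float) -> float:
    """Percent price increase when upgrading from the cheaper model.

    cheaper_cost_fraction is the cheaper model's cost as a fraction of the
    more expensive model's cost (0.60 means "60% of the cost").
    """
    if not 0 < cheaper_cost_fraction <= 1:
        raise ValueError("cheaper_cost_fraction must be in (0, 1]")
    return (1 / cheaper_cost_fraction - 1) * 100


if __name__ == "__main__":
    # 60% of the cost corresponds to roughly a 67% increase when upgrading.
    print(round(price_increase_pct(0.60)))  # prints 67
```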
references/model-landscape.md.
Chainable to: opportunity-cost (to analyze what switching models gives up), pareto (to find the 20% of workloads driving 80% of model costs)