zscole/model-hierarchy

Name: model-hierarchy
Author: zscole

/SKILL.md

npx skillsauth add zscole/model-hierarchy-skill model-hierarchy

Model Hierarchy

Route tasks to the cheapest model that can handle them. Most agent work is routine.

Core Principle

Roughly 80% of agent tasks are janitorial: file reads, status checks, formatting, simple Q&A. These don't need expensive models. Reserve premium models for problems that actually require deep reasoning.

Model Tiers

Tier 1: Cheap ($0.10-0.50/M tokens)

| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| DeepSeek V3 | $0.14 | $0.28 | General routine work |
| GPT-4o-mini | $0.15 | $0.60 | Quick responses |
| Claude Haiku | $0.25 | $1.25 | Fast tool use |
| Gemini Flash | $0.075 | $0.30 | High volume |
| GLM 5 (Zhipu) | (OpenRouter Z.AI) | (OpenRouter Z.AI) | Routine + moderate text; 200K context; text-only; do not use for image/vision |
| Kimi K2.5 (Moonshot) | $0.45 | $2.25 | Routine + moderate; 262K context; multimodal (text + image + video) |

Text-only models (e.g. GLM 5): Do not use for any task that requires image input or vision — no photo analysis, screenshots, image-generation tools, or document/chart vision. Route to a vision-capable model (e.g. Kimi K2.5, GPT-4o, Gemini, Claude with vision, GLM-4.5V/4.6V).

Vision-capable Tier 1/2 (e.g. Kimi K2.5): Use for routine or moderate tasks that may involve images — screenshots, photo analysis, docs, image-generation orchestration — without moving to premium vision models.

Tier 2: Mid ($1-5/M tokens)

| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| Claude Sonnet | $3.00 | $15.00 | Balanced performance |
| GPT-4o | $2.50 | $10.00 | Multimodal tasks |
| Gemini Pro | $1.25 | $5.00 | Long context |

Tier 3: Premium ($10-75/M tokens)

| Model | Input ($/M tokens) | Output ($/M tokens) | Best For |
|-------|-------|--------|----------|
| Claude Opus | $15.00 | $75.00 | Complex reasoning |
| GPT-4.5 | $75.00 | $150.00 | Frontier tasks |
| o1 | $15.00 | $60.00 | Multi-step reasoning |
| o3-mini | $1.10 | $4.40 | Reasoning on budget |

Prices as of Feb 2026. Check provider docs for current rates.
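The tier tables can also be kept as data so that an agent looks up the cheapest viable model instead of hard-coding a name. The following is a minimal sketch, not part of the skill itself: `ModelInfo`, `MODELS`, and `cheapest` are hypothetical names, only a subset of the models is included, and the `vision=False` flag for DeepSeek V3 is an assumption (the tables do not state it). Ranking by output price is likewise just one possible choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    tier: int                # 1 = cheap, 2 = mid, 3 = premium
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens
    vision: bool             # True if the model accepts image input


# A subset of the tables above (prices as of Feb 2026).
# GLM 5 is left out because no price is listed for it.
MODELS = [
    ModelInfo("DeepSeek V3", 1, 0.14, 0.28, vision=False),  # assumed text-only
    ModelInfo("Kimi K2.5", 1, 0.45, 2.25, vision=True),
    ModelInfo("Claude Sonnet", 2, 3.00, 15.00, vision=True),
    ModelInfo("GPT-4o", 2, 2.50, 10.00, vision=True),
    ModelInfo("Claude Opus", 3, 15.00, 75.00, vision=True),
]


def cheapest(tier: int, needs_vision: bool = False) -> Optional[ModelInfo]:
    """Return the lowest-output-price model in a tier, or None if none fits."""
    candidates = [
        m for m in MODELS
        if m.tier == tier and (m.vision or not needs_vision)
    ]
    return min(candidates, key=lambda m: m.output_usd_per_m, default=None)


print(cheapest(1).name)                     # DeepSeek V3
print(cheapest(1, needs_vision=True).name)  # Kimi K2.5
```

Keeping the vision flag next to the price makes the text-only restriction a filter rather than a rule the agent has to remember.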

Task Classification

Before executing any task, classify it. Check the vision requirement first: if the task requires image input or vision, do not assign it to a text-only model (GLM 5, etc.), whatever its tier. Use a vision-capable model from Tier 1, 2, or 3 (e.g. Kimi K2.5, GPT-4o, Gemini, Claude, GLM-4.5V).

ROUTINE → Use Tier 1

Characteristics:

Single-step operations
Clear, unambiguous instructions
No judgment required
Deterministic output expected

Examples:

File read/write operations
Status checks and health monitoring
Simple lookups (time, weather, definitions)
Formatting and restructuring text
List operations (filter, sort, transform)
API calls with known parameters
Heartbeat and cron tasks
URL fetching and basic parsing

MODERATE → Use Tier 2

Characteristics:

Multi-step but well-defined
Some synthesis required
Standard patterns apply
Quality matters but isn't critical

Examples:

Code generation (standard patterns)
Summarization and synthesis
Draft writing (emails, docs, messages)
Data analysis and transformation
Multi-file operations
Tool orchestration
Code review (non-security)
Search and research tasks

COMPLEX → Use Tier 3

Characteristics:

Novel problem solving required
Multiple valid approaches
Nuanced judgment calls
High stakes or irreversible
Previous attempts failed

Examples:

Multi-step debugging
Architecture and design decisions
Security-sensitive code review
Tasks where cheaper model already failed
Ambiguous requirements needing interpretation
Long-context reasoning (>50K tokens)
Creative work requiring originality
Adversarial or edge-case handling
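The characteristics listed for each class can be turned into a small rule-based classifier. This is only a sketch under stated assumptions: `TaskTraits` and its fields are hypothetical, and something upstream (the agent itself or a cheap model) would have to fill them in. The 50K-token threshold is the one given in the COMPLEX examples.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    ROUTINE = 1
    MODERATE = 2
    COMPLEX = 3


@dataclass
class TaskTraits:
    # Hypothetical fields, loosely one per characteristic listed above.
    steps: int = 1                        # single-step vs multi-step
    needs_synthesis: bool = False         # "some synthesis required"
    needs_judgment: bool = False          # "nuanced judgment calls"
    high_stakes: bool = False             # "high stakes or irreversible"
    previous_attempt_failed: bool = False
    context_tokens: int = 0               # long-context reasoning if > 50K


def classify(traits: TaskTraits) -> Complexity:
    """Map task traits to a complexity class, checking COMPLEX signals first."""
    if (traits.needs_judgment or traits.high_stakes
            or traits.previous_attempt_failed
            or traits.context_tokens > 50_000):
        return Complexity.COMPLEX
    if traits.steps > 1 or traits.needs_synthesis:
        return Complexity.MODERATE
    return Complexity.ROUTINE


print(classify(TaskTraits()))                       # Complexity.ROUTINE
print(classify(TaskTraits(steps=3)))                # Complexity.MODERATE
print(classify(TaskTraits(context_tokens=60_000)))  # Complexity.COMPLEX
```

Checking the COMPLEX signals first means a multi-step task that is also high stakes is never under-classified as MODERATE.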

Decision Algorithm

function selectModel(task):
    # Rule 1: Vision override (Tier 1/2 includes text-only models)
    if task.requiresImageInput or task.requiresVision:
        return VISION_CAPABLE_MODEL  # e.g. Kimi K2.5, GPT-4o, Gemini, Claude; do not use GLM 5 or other text-only
    
    # Rule 2: Escalation override
    if task.previousAttemptFailed:
        return nextTierUp(task.previousModel)
    
    # Rule 3: Explicit complexity signals
    if task.hasSignal("debug", "architect", "design", "security"):
        return TIER_3
    
    if task.hasSignal("write", "code", "summarize", "analyze"):
        return TIER_2
    
    # Rule 4: Default classification
    complexity = classifyTask(task)
    
    if complexity == ROUTINE:
        return TIER_1
    elif complexity == MODERATE:
        return TIER_2
    else:
        return TIER_3
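The pseudocode above can be made runnable in Python with a few assumptions. In this sketch the `Task` fields, the string tier labels, and the substring-based `has_signal` are all hypothetical choices; the `classify` parameter stands in for `classifyTask`, which the pseudocode does not define, and defaults to treating everything as routine.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

TIERS = ["TIER_1", "TIER_2", "TIER_3"]
VISION_CAPABLE = "VISION_CAPABLE_MODEL"  # placeholder, as in the pseudocode

TIER_3_SIGNALS = ("debug", "architect", "design", "security")
TIER_2_SIGNALS = ("write", "code", "summarize", "analyze")


@dataclass
class Task:
    description: str
    requires_vision: bool = False
    failed_tier: Optional[str] = None  # tier of a previous failed attempt


def next_tier_up(tier: str) -> str:
    """Move one tier up, staying at the top tier once it is reached."""
    return TIERS[min(TIERS.index(tier) + 1, len(TIERS) - 1)]


def has_signal(task: Task, signals: Sequence[str]) -> bool:
    text = task.description.lower()
    return any(signal in text for signal in signals)


def select_model(
    task: Task,
    classify: Callable[[Task], str] = lambda task: "TIER_1",
) -> str:
    if task.requires_vision:                # Rule 1: vision override
        return VISION_CAPABLE
    if task.failed_tier is not None:        # Rule 2: escalation override
        return next_tier_up(task.failed_tier)
    if has_signal(task, TIER_3_SIGNALS):    # Rule 3: explicit signals
        return "TIER_3"
    if has_signal(task, TIER_2_SIGNALS):
        return "TIER_2"
    return classify(task)                   # Rule 4: default classification


print(select_model(Task("fetch the status page")))            # TIER_1
print(select_model(Task("debug the flaky login test")))       # TIER_3
print(select_model(Task("read the log", failed_tier="TIER_1")))  # TIER_2
```

The rule order matters: a failed attempt escalates even when the description contains only routine-looking words.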

Behavioral Rules

For Main Session

Default to Tier 2 for interactive work
Suggest downgrade when doing routine work: "This is routine - I can handle this on a cheaper model or spawn a sub-agent."
Request upgrade when stuck: "This needs more reasoning power. Switching to [premium model]."

For Sub-Agents

Default to Tier 1 unless task is clearly moderate+
Batch similar tasks to amortize overhead
Report failures back to parent for escalation

For Automated Tasks

Heartbeats/monitoring → Always Tier 1
Scheduled reports → Tier 1 or 2 based on complexity
Alert responses → Start Tier 2, escalate if needed
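The "start low, escalate on failure" behavior in these rules can be sketched as a small loop. Everything named here is an assumption for illustration: `run_with_escalation`, the `START_TIER` mapping, and the `attempt` callback (which would wrap a real model call and return `None` on failure) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

TIERS = ["TIER_1", "TIER_2", "TIER_3"]

# Starting tiers from the rules above; scheduled reports are left out
# because their tier depends on the report's complexity.
START_TIER = {"heartbeat": "TIER_1", "sub_agent": "TIER_1", "alert": "TIER_2"}


def run_with_escalation(
    task: str,
    start_tier: str,
    attempt: Callable[[str, str], Optional[str]],
) -> Tuple[Optional[str], List[str]]:
    """Try `task` on `start_tier`, moving up one tier after each failure.

    `attempt(task, tier)` is a hypothetical callback that returns a result
    string, or None when the model on that tier could not complete the task.
    """
    tried: List[str] = []
    for tier in TIERS[TIERS.index(start_tier):]:
        tried.append(tier)
        result = attempt(task, tier)
        if result is not None:
            return result, tried
    return None, tried  # every tier failed; the caller decides what to do


def fake_attempt(task: str, tier: str) -> Optional[str]:
    # Stand-in for a real model call: only the premium tier succeeds.
    return f"handled on {tier}" if tier == "TIER_3" else None


result, tried = run_with_escalation("triage alert", START_TIER["alert"], fake_attempt)
print(result, tried)  # handled on TIER_3 ['TIER_2', 'TIER_3']
```

Returning the list of tiers tried gives the parent session what it needs when a sub-agent reports a failure back for escalation.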

Communication Patterns

When suggesting model changes, use clear language:

Downgrade suggestion:

"This looks like routine file work. Want me to spawn a sub-agent on DeepSeek for this? Same result, fraction of the cost."

Upgrade request:

"I'm hitting the limits of what I can figure out here. This needs Opus-level reasoning. Switching up."

Explaining hierarchy:

"I'm running the heavy analysis on Sonnet while sub-agents fetch the data on DeepSeek. Keeps costs down without sacrificing quality where it matters."

Cost Impact

Assuming 100K tokens/day average usage:

| Strategy | Monthly Cost | Notes |
|----------|--------------|-------|
| Pure Opus | ~$225 | Maximum capability, maximum spend |
| Pure Sonnet | ~$45 | Good default for most work |
| Pure DeepSeek | ~$8 | Cheap but limited on hard problems |
| Hierarchy (80/15/5) | ~$19 | Best of all worlds |

The 80/15/5 split:

80% routine tasks on Tier 1 (~$6)
15% moderate tasks on Tier 2 (~$7)
5% complex tasks on Tier 3 (~$6)

Result: roughly a 12x cost reduction versus pure Opus (~$225 down to ~$19), while complex tasks still run on a premium model.
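The arithmetic behind these figures can be reproduced with a few lines of Python. The sketch below assumes a 30-day month, bills every token at the output rate from the tier tables, and uses DeepSeek V3, Claude Sonnet, and Claude Opus as the Tier 1, 2, and 3 models. None of these assumptions is stated in the table. Under them, the Pure Opus, Pure Sonnet, and hierarchy totals match the table. The per-tier split comes out as about $0.67 / $6.75 / $11.25 and pure DeepSeek as about $0.84, so the table's ~$8 and ~$6 / ~$7 / ~$6 figures presumably reflect a different price mix.

```python
TOKENS_PER_DAY = 100_000
DAYS_PER_MONTH = 30  # assumption
MONTHLY_M_TOKENS = TOKENS_PER_DAY * DAYS_PER_MONTH / 1_000_000  # 3.0

# Output prices in USD per million tokens, from the tier tables.
OUTPUT_PRICE = {
    "DeepSeek V3": 0.28,
    "Claude Sonnet": 15.00,
    "Claude Opus": 75.00,
}


def monthly_cost(split: dict) -> float:
    """`split` maps a model name to its fraction of monthly tokens."""
    return sum(
        MONTHLY_M_TOKENS * share * OUTPUT_PRICE[model]
        for model, share in split.items()
    )


pure_opus = monthly_cost({"Claude Opus": 1.0})
pure_sonnet = monthly_cost({"Claude Sonnet": 1.0})
pure_deepseek = monthly_cost({"DeepSeek V3": 1.0})
hierarchy = monthly_cost(
    {"DeepSeek V3": 0.80, "Claude Sonnet": 0.15, "Claude Opus": 0.05}
)

print(f"Pure Opus:     ${pure_opus:.2f}")      # $225.00
print(f"Pure Sonnet:   ${pure_sonnet:.2f}")    # $45.00
print(f"Pure DeepSeek: ${pure_deepseek:.2f}")  # $0.84
print(f"Hierarchy:     ${hierarchy:.2f}")      # $18.67
print(f"Reduction vs Opus: {pure_opus / hierarchy:.1f}x")
```

Swapping in your own prices and token split is a quick way to check whether the hierarchy pays off for a given workload.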

Integration Examples

OpenClaw

# config.yml - set default model
model: anthropic/claude-sonnet-4

# In session, switch models
/model opus  # upgrade for complex task
/model deepseek  # downgrade for routine

# Spawn sub-agent on cheap model
sessions_spawn:
  task: "Fetch and parse these 50 URLs"
  model: deepseek

OpenRouter (Tier 1 with vision or text-only):

# Tier 1 with vision — Kimi K2.5 (multimodal)
model: openrouter/moonshotai/kimi-k2.5
# Heartbeats, cron, image-involving tasks: K2.5 handles text and vision.

# Tier 1 text-only — GLM 5 (no vision)
# model: openrouter/z-ai/glm-5  # exact ID TBD on OpenRouter Z.AI
# Routine text-only only; for image tasks use Kimi K2.5 or another vision-capable model.

Claude Code

# In CLAUDE.md or project instructions
When spawning background agents, use claude-3-haiku for:
- File operations
- Simple searches  
- Status checks

Reserve claude-sonnet-4 for:
- Code generation
- Analysis tasks

General Agent Systems

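import re

def get_model_for_task(task_description: str) -> str:
    """Pick a model name from keyword signals in the task description."""
    routine_signals = ('read', 'fetch', 'check', 'list', 'format', 'status')
    complex_signals = ('debug', 'architect', 'design', 'security', 'why')

    # Match signals at the start of a word, so 'format' fires on
    # 'formatting' but not on 'information', and 'read' not on 'thread'.
    words = re.findall(r"[a-z]+", task_description.lower())

    def has_signal(signals):
        return any(word.startswith(signals) for word in words)

    # Complex signals win when a description contains both kinds.
    if has_signal(complex_signals):
        return "claude-opus-4"
    elif has_signal(routine_signals):
        return "deepseek-v3"
    else:
        return "claude-sonnet-4"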

Anti-Patterns

DON'T:

Run heartbeats on Opus
Use premium models for file I/O
Keep expensive model when task is clearly routine
Spawn sub-agents on premium models by default
Use GLM 5 (or any text-only Tier 1/2 model) for image/vision tasks — e.g. photo analysis, screenshot understanding, image-generation skills, or any tool that takes image input

DO:

Start mid-tier, adjust based on task
Spawn helpers on cheapest viable model
Escalate explicitly when stuck
Track cost per task type to optimize further

Extending This Skill

To customize for your use case:

Adjust tier definitions based on your provider/budget
Add domain-specific signals to classification rules
Track actual complexity vs predicted to improve heuristics
Set budget alerts to catch runaway premium usage
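The last two suggestions, tracking predicted versus actual complexity and alerting on premium spend, could start from something as small as the sketch below. `CostTracker`, its method names, and the budget figure are hypothetical. Here "final tier" means the tier that eventually completed the task after any escalation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class CostTracker:
    """Accumulates spend per task type and compares predicted vs final tier."""

    def __init__(self, premium_budget_usd: float) -> None:
        self.premium_budget_usd = premium_budget_usd
        self.spend_by_task_type: DefaultDict[str, float] = defaultdict(float)
        self.premium_spend = 0.0
        self.outcomes: List[Tuple[str, str]] = []  # (predicted, final) tiers

    def record(self, task_type: str, predicted_tier: str,
               final_tier: str, cost_usd: float) -> None:
        self.spend_by_task_type[task_type] += cost_usd
        if final_tier == "TIER_3":
            self.premium_spend += cost_usd
        self.outcomes.append((predicted_tier, final_tier))

    def over_budget(self) -> bool:
        """True once premium (Tier 3) spend exceeds the configured budget."""
        return self.premium_spend > self.premium_budget_usd

    def misprediction_rate(self) -> float:
        """Fraction of tasks whose final tier differed from the prediction."""
        if not self.outcomes:
            return 0.0
        wrong = sum(1 for predicted, final in self.outcomes if predicted != final)
        return wrong / len(self.outcomes)


tracker = CostTracker(premium_budget_usd=1.00)
tracker.record("heartbeat", "TIER_1", "TIER_1", 0.01)
tracker.record("debugging", "TIER_2", "TIER_3", 1.50)
print(tracker.over_budget())         # True
print(tracker.misprediction_rate())  # 0.5
```

A high misprediction rate for one task type is a hint that its classification signals need adjusting.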

zscole/model-hierarchy

/SKILL.md

Cost-optimize AI agent operations by routing tasks to appropriate models based on complexity. Use this skill when: (1) deciding which model to use for a task, (2) spawning sub-agents, (3) considering cost efficiency, (4) the current model feels like overkill for the task. Triggers: "model routing", "cost optimization", "which model", "too expensive", "spawn agent".

333 stars

testing

Updated Mar 20, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add zscole/model-hierarchy-skill model-hierarchy

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Reported % |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| VirusTotal | Multi-engine malware detection | Error | 70% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Mar 20, 2026, 5:40 PM (230.4s, 8 files scanned)

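Install manually

# Clone the repo
git clone https://github.com/zscole/model-hierarchy-skill.git

# Copy into Claude Code skills folder (global)
# No trailing slash on the source, so the directory itself is copied
# (BSD/macOS cp would otherwise copy only its contents).
mkdir -p ~/.claude/skills
cp -r model-hierarchy-skill ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.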

Repository: zscole/model-hierarchy-skill (333 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT