ai-cost-modeling/SKILL.md
Token economics for AI-powered features — estimate raw token cost per user and per tenant, compare providers, model retail pricing, and calculate margin. Invoke before committing any AI feature and when designing the AI module pricing tier.
Skip this skill when the task is unrelated to ai-cost-modeling or would be better handled by a more specific companion skill. Read SKILL.md first, then load only the referenced deep-dive files that are necessary for the task.

Before implementing any AI feature, calculate the full token cost so you can compare providers, set retail pricing, and confirm margin.
All prices in USD per 1 million tokens.
| Provider & Model | Input ($/1M) | Output ($/1M) | Context Window | Best For |
|------------------|--------------|---------------|----------------|----------|
| Claude Haiku 4.5 | $0.80 | $4.00 | 200K | High-volume, fast tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K | Complex reasoning, long docs |
| GPT-4o mini | $0.15 | $0.60 | 128K | Cost-critical, simple tasks |
| GPT-4o | $2.50 | $10.00 | 128K | Balanced quality + cost |
| DeepSeek V3 | $0.27 | $1.10 | 64K | Ultra-low cost, good quality |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Cheapest option, large context |
| Gemini 1.5 Pro | $1.25 | $5.00 | 2M | Very large document analysis |
Verify current pricing at provider dashboards before quoting clients. Prices change frequently.
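As a rough sketch of how the price table can drive a provider comparison, the snippet below ranks the listed models by the cost of a single call. The prices are copied from the table above and may be stale; the function names `call_cost` and `rank_models` are illustrative assumptions, not part of any provider SDK.

```python
# Prices copied from the table above as (input, output) in USD per 1M tokens.
# Assumption: these are a snapshot; verify against provider dashboards.
PRICES = {
    "Claude Haiku 4.5": (0.80, 4.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "DeepSeek V3": (0.27, 1.10),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "Gemini 1.5 Pro": (1.25, 5.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the listed prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_models(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """All listed models with their per-call cost, cheapest first."""
    costs = {model: call_cost(model, input_tokens, output_tokens) for model in PRICES}
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical call shape: 1,500 input tokens and 400 output tokens.
for model, cost in rank_models(1_500, 400):
    print(f"{model:<18} ${cost:.6f} per call")
```

Cost is only one axis; context window and output quality still need to be checked against the feature before picking the cheapest row.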
Typical token ranges per single AI call:
| Feature Pattern | Input Tokens | Output Tokens | Notes |
|-----------------|--------------|---------------|-------|
| Summarisation (1 record) | 300–800 | 100–300 | Scale with record length |
| Summarisation (batch 10) | 2,000–5,000 | 500–1,500 | |
| Classification (single) | 200–500 | 10–50 | Few-shot adds ~300 |
| Anomaly detection (daily log) | 1,000–3,000 | 100–300 | |
| Predictive alert | 500–2,000 | 100–400 | Depends on history injected |
| Natural language report | 1,000–4,000 | 500–2,000 | |
| Decision support | 800–3,000 | 200–600 | |
| Conversational assistant (turn) | 400–1,500 | 150–500 | Grows with history |
| Document extraction (1 page) | 800–2,000 | 200–600 | Image = higher |
| Semantic search query | 100–300 | 200–800 | Embedding = separate |
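Because each pattern is given as a range, it can help to turn a row into a best-case and worst-case cost per call. The sketch below does that for one row; `call_cost_range` is a hypothetical helper, and the example pairs the Classification (single) row with the GPT-4o mini prices listed earlier.

```python
def call_cost_range(input_range, output_range, input_price, output_price):
    """Return (low, high) USD cost for one call.

    input_range / output_range are (min_tokens, max_tokens) tuples;
    prices are in USD per 1M tokens.
    """
    low = (input_range[0] * input_price + output_range[0] * output_price) / 1_000_000
    high = (input_range[1] * input_price + output_range[1] * output_price) / 1_000_000
    return low, high


# Classification (single): 200-500 input, 10-50 output, at $0.15 / $0.60 per 1M.
low, high = call_cost_range((200, 500), (10, 50), 0.15, 0.60)
print(f"${low:.6f} to ${high:.6f} per call")
```

Budgeting against the high end of the range is the safer default until real usage data is available.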
tokens_per_call = input_tokens + output_tokens
calls_per_user_day = [estimated from use case]
tokens_per_user_day = tokens_per_call × calls_per_user_day
tokens_per_user_month = tokens_per_user_day × 30
input_cost_per_user = (input_tokens × calls_per_user_day × 30) / 1,000,000 × input_price
output_cost_per_user = (output_tokens × calls_per_user_day × 30) / 1,000,000 × output_price
total_cost_per_user = input_cost_per_user + output_cost_per_user (USD per user per month)
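The formula above translates directly into a small function. This is a minimal sketch, assuming a 30-day month as the formula does; the name `monthly_cost_per_user` and its parameters are illustrative.

```python
def monthly_cost_per_user(input_tokens, output_tokens, calls_per_user_day,
                          input_price, output_price, days_per_month=30):
    """Return (input_cost, output_cost, total) in USD per user per month.

    Token counts are per call; prices are in USD per 1M tokens.
    """
    calls_per_month = calls_per_user_day * days_per_month
    input_cost = input_tokens * calls_per_month / 1_000_000 * input_price
    output_cost = output_tokens * calls_per_month / 1_000_000 * output_price
    return input_cost, output_cost, input_cost + output_cost


# 1,500 input and 400 output tokens, one call per day, at $0.80 / $4.00 per 1M.
input_cost, output_cost, total = monthly_cost_per_user(1_500, 400, 1, 0.80, 4.00)
print(f"input ${input_cost:.3f} + output ${output_cost:.3f} = ${total:.2f}/user/month")
```

A feature that runs weekly rather than daily can be passed as `calls_per_user_day=4 / 30`, which yields 4 calls per month.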
Example (daily POS summary, Claude Haiku 4.5 pricing):
Input: 1,500 tokens (day's transactions injected)
Output: 400 tokens (narrative summary)
Calls: 1 per user per day
Monthly input = 1,500 × 1 × 30 / 1,000,000 × $0.80 = $0.036
Monthly output = 400 × 1 × 30 / 1,000,000 × $4.00 = $0.048
Total/user/month = $0.08
Example (weekly student risk report, Claude Sonnet 4.6 pricing):
Input: 3,000 tokens (student history)
Output: 500 tokens (risk report)
Calls: 4 per user per month (weekly)
Monthly input = 3,000 × 4 / 1,000,000 × $3.00 = $0.036
Monthly output = 500 × 4 / 1,000,000 × $15.00 = $0.030
Total/user/month = $0.07
Example (chat assistant, Claude Haiku 4.5 pricing):
Input: 800 tokens per turn (context + query)
Output: 250 tokens per turn
Turns: 10 per user per day
Monthly input = 800 × 10 × 30 / 1,000,000 × $0.80 = $0.19
Monthly output = 250 × 10 × 30 / 1,000,000 × $4.00 = $0.30
Total/user/month = $0.49
tenant_monthly_cost = Σ (cost_per_user × active_users_in_tenant)
Example — 50-user tenant with mixed AI features:
40 users × $0.08 (POS summary) = $3.20
10 users × $0.49 (chat assistant) = $4.90
Tenant total = $8.10/month raw token cost
Track this via the token ledger (see ai-metering-billing).
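The tenant roll-up is a sum over user groups, one group per feature. A minimal sketch, using the per-user figures from the 50-user example; `tenant_monthly_cost` is an illustrative name, not the token ledger API from ai-metering-billing.

```python
def tenant_monthly_cost(groups):
    """Sum raw token cost over (active_users, cost_per_user_usd) pairs."""
    return sum(active_users * cost_per_user for active_users, cost_per_user in groups)


# 40 users on the POS summary and 10 users on the chat assistant.
total = tenant_monthly_cost([(40, 0.08), (10, 0.49)])
print(f"${total:.2f}/month raw token cost")  # prints $8.10/month raw token cost
```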
Apply a 5–10× markup over raw token cost. This covers:
| Tier | Included Features | Token Budget/Tenant/Month | Suggested Price (UGX) |
|------|-------------------|---------------------------|-----------------------|
| Starter AI | Summarisation + Classification | 2M tokens | 50,000–80,000 |
| Growth AI | + Alerts + Reports + Search | 10M tokens | 150,000–250,000 |
| Enterprise AI | All features + Chat + Doc Intel | 50M tokens | 500,000–800,000 |
Adjust based on number of users per tenant. For large tenants (> 50 users), price per user.
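To sanity-check a tier price against the 5–10× markup rule, the raw cost can be turned into a price band and a gross margin. The sketch below works in USD only; converting to UGX needs a current exchange rate, which this document does not give. The names `price_band` and `gross_margin` are illustrative.

```python
def price_band(raw_cost_usd, low_markup=5.0, high_markup=10.0):
    """Return (low, high) retail price in USD for a given raw token cost."""
    return raw_cost_usd * low_markup, raw_cost_usd * high_markup


def gross_margin(price_usd, raw_cost_usd):
    """Gross margin as a fraction of price, e.g. 0.8 for 80%."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    return (price_usd - raw_cost_usd) / price_usd


# Raw token cost of $8.10/month, taken from the 50-user tenant example.
low, high = price_band(8.10)
print(f"price band: ${low:.2f} to ${high:.2f}")
print(f"margin at low end: {gross_margin(low, 8.10):.0%}")
print(f"margin at high end: {gross_margin(high, 8.10):.0%}")
```

A 5× markup corresponds to an 80% gross margin on token cost and a 10× markup to 90%, before any other operating costs are counted.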
If the client prefers per-user pricing:
| Tier | Price/User/Month (UGX) | Minimum Users |
|------|------------------------|---------------|
| AI Starter | 8,000–15,000 | 5 |
| AI Growth | 20,000–35,000 | 5 |
| AI Enterprise | 50,000–80,000 | 10 |
Apply these to reduce raw token cost before pricing:
- Filter records with WHERE and LIMIT before injecting them into the prompt.
- max_tokens: Cap output tokens on every call. A $0.05 call becomes $2.00 if output runs uncapped.

hard_cap_usd = tier_price_usd × 0.60 ← your cost ceiling (preserves 40% margin)
soft_warning = hard_cap_usd × 0.80 ← alert tenant admin at 80%
overage_rate = $0.05 per 1,000 tokens ← optional metered overage above hard cap
Store hard_cap_usd in the tenant AI module configuration. Enforce via Budget Guard middleware (see ai-architecture-patterns).
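The cap formulas can be sketched as a small check that a middleware layer might call before each request. This is an illustration under stated assumptions, not the Budget Guard implementation from ai-architecture-patterns: the names `BudgetCaps`, `budget_caps`, `budget_status`, and `overage_charge` are hypothetical, and the $50 tier price is a made-up input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetCaps:
    hard_cap_usd: float      # cost ceiling for the tenant
    soft_warning_usd: float  # spend level at which the tenant admin is alerted


def budget_caps(tier_price_usd, cost_ceiling=0.60, warning_fraction=0.80):
    """Derive caps from the tier price: hard cap at 60%, warning at 80% of that."""
    hard_cap = tier_price_usd * cost_ceiling
    return BudgetCaps(hard_cap, hard_cap * warning_fraction)


def budget_status(spend_usd, caps):
    """Classify month-to-date raw token spend against the caps."""
    if spend_usd >= caps.hard_cap_usd:
        return "blocked"
    if spend_usd >= caps.soft_warning_usd:
        return "warn"
    return "ok"


def overage_charge(tokens_over_cap, rate_per_1k_usd=0.05):
    """Optional metered overage: $0.05 per 1,000 tokens above the hard cap."""
    return tokens_over_cap / 1_000 * rate_per_1k_usd


caps = budget_caps(50.0)  # hypothetical tier price of $50/month
for spend in (10.0, 25.0, 30.0):
    print(f"spend ${spend:.2f}: {budget_status(spend, caps)}")
```

Whether a tenant at the hard cap is blocked or moved to metered overage is a product decision; the sketch only reports the state.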
## AI Cost Model — [Project] — [Feature Name] — [Date]
Model selected: [name + reason]
Input tokens/call: [n]
Output tokens/call: [n]
Calls/user/day: [n]
Raw cost/user/month: $[n]
Tenant size (users): [n]
Raw cost/tenant/month: $[n]
Recommended tier: [Starter / Growth / Enterprise]
Suggested price (UGX): [n]
Gross margin: [%]
Caching savings est.: [%]
Budget hard cap (USD): $[n]
See also:
- ai-feature-spec: Token estimates per feature
- ai-metering-billing: Token ledger and budget enforcement
- ai-architecture-patterns: Budget Guard middleware implementation
- ai-opportunity-canvas: Feature prioritisation by cost tier