llm-architect-skill/SKILL.md
Use when user needs LLM system architecture, model deployment, optimization strategies, and production serving infrastructure. Designs scalable large language model applications with focus on performance, cost efficiency, and safety.
npx skillsauth add 404kidwiz/claude-supercode-skills llm-architect

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Provides expert large language model system architecture for designing, deploying, and optimizing LLM applications at scale. Specializes in model selection, RAG (Retrieval Augmented Generation) pipelines, fine-tuning strategies, serving infrastructure, cost optimization, and safety guardrails for production LLM systems.
Invoke this skill when:
Do NOT invoke when:

| Requirement | Recommended Approach |
|-------------|---------------------|
| Latency <100ms | Small fine-tuned model (7B quantized) |
| Latency <2s, budget unlimited | Claude 3 Opus / GPT-4 |
| Latency <2s, domain-specific | Claude 3 Sonnet fine-tuned |
| Latency <2s, cost-sensitive | Claude 3 Haiku |
| Batch/async acceptable | Batch API, cheapest tier |

Need to customize LLM behavior?
│
├─ Need domain-specific knowledge?
│  ├─ Knowledge changes frequently?
│  │  └─ RAG (Retrieval Augmented Generation)
│  └─ Knowledge is static?
│     └─ Fine-tuning OR RAG (test both)
│
├─ Need specific output format/style?
│  ├─ Can describe in prompt?
│  │  └─ Prompt engineering (try first)
│  └─ Format too complex for prompt?
│     └─ Fine-tuning
│
└─ Need latency <100ms?
   └─ Fine-tuned small model (7B-13B)
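
The decision tree above can be sketched as a small function. This is only an illustration: the `CustomizationNeeds` fields and the `recommend_customization` name are hypothetical, and the sketch assumes the three top-level branches are independent, so more than one recommendation can apply at once.

```python
from dataclasses import dataclass


@dataclass
class CustomizationNeeds:
    """Hypothetical answers to the questions asked in the decision tree."""
    needs_domain_knowledge: bool = False
    knowledge_changes_frequently: bool = False
    needs_output_format: bool = False
    format_fits_in_prompt: bool = True
    needs_sub_100ms_latency: bool = False


def recommend_customization(needs: CustomizationNeeds) -> list[str]:
    """Return the recommendation for every branch of the tree that applies."""
    recommendations = []
    if needs.needs_domain_knowledge:
        if needs.knowledge_changes_frequently:
            recommendations.append("RAG")
        else:
            recommendations.append("Fine-tuning OR RAG (test both)")
    if needs.needs_output_format:
        if needs.format_fits_in_prompt:
            recommendations.append("Prompt engineering (try first)")
        else:
            recommendations.append("Fine-tuning")
    if needs.needs_sub_100ms_latency:
        recommendations.append("Fine-tuned small model (7B-13B)")
    return recommendations
```

For instance, `recommend_customization(CustomizationNeeds(needs_domain_knowledge=True, knowledge_changes_frequently=True))` yields a single RAG recommendation.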
[Client] → [API Gateway + Rate Limiting]
                    ↓
            [Request Router]
       (Route by intent/complexity)
                    ↓
           ┌────────┴────────┐
           ↓                 ↓
     [Fast Model]     [Powerful Model]
     (Haiku/Small)     (Sonnet/Large)
           ↓                 ↓
     [Cache Layer] ← [Response Aggregator]
           ↓
  [Logging & Monitoring]
           ↓
  [Response to Client]
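
A minimal sketch of the routing and caching flow in the diagram, under several assumptions: the models are plain callables, `estimate_complexity` is a toy heuristic invented for illustration, the cache is an exact-match dictionary that is checked before any model call, and the gateway, rate limiting, and response aggregation stages are omitted. All names here are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]


def estimate_complexity(prompt: str) -> str:
    """Toy heuristic (an assumption): long or multi-question prompts are complex."""
    if len(prompt.split()) > 50 or prompt.count("?") > 1:
        return "complex"
    return "simple"


class RoutedLLMService:
    """Routes each prompt to a fast or powerful model, with an exact-match cache."""

    def __init__(self, fast_model: ModelFn, powerful_model: ModelFn):
        self.fast_model = fast_model
        self.powerful_model = powerful_model
        self.cache: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []  # (event, cache key) pairs

    def handle(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            # Serve repeated prompts without calling any model.
            self.log.append(("cache_hit", key))
            return self.cache[key]
        tier = estimate_complexity(prompt)
        model = self.powerful_model if tier == "complex" else self.fast_model
        response = model(prompt)
        self.cache[key] = response
        self.log.append((tier, key))
        return response
```

In a real deployment the two callables would wrap API clients, and the complexity heuristic would more likely be a classifier or an intent detector.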
Ask these questions:
def select_model(requirements):
    """Pick a model from latency, budget, and accuracy requirements.

    `requirements` needs these attributes: latency_p95 (milliseconds),
    task_complexity, budget, domain_specific, accuracy_critical.
    """
    if requirements.latency_p95 < 100:  # milliseconds
        if requirements.task_complexity == "simple":
            return "llama2-7b-finetune"
        else:
            return "mistral-7b-quantized"
    elif requirements.latency_p95 < 2000:
        if requirements.budget == "unlimited":
            return "claude-3-opus"
        elif requirements.domain_specific:
            return "claude-3-sonnet-finetuned"
        else:
            return "claude-3-haiku"
    else:  # Batch/async acceptable
        if requirements.accuracy_critical:
            return "gpt-4-with-ensemble"
        else:
            return "batch-api-cheapest-tier"
# Run benchmark on eval dataset
python scripts/evaluate_model.py \
  --model claude-3-sonnet \
  --dataset data/eval_1000_examples.jsonl \
  --metrics accuracy,latency,cost
# Expected output:
# Accuracy: 94.3%
# P95 Latency: 1,245ms
# Cost per 1K requests: $2.15
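
The three reported metrics could be computed from per-example results roughly as follows. This is a sketch, not the contents of `scripts/evaluate_model.py`: the `EvalRecord` fields are hypothetical, and the P95 uses the nearest-rank method, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class EvalRecord:
    """Hypothetical per-example result from a benchmark run."""
    correct: bool
    latency_ms: float
    cost_usd: float


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(records):
    """Aggregate accuracy, P95 latency, and cost per 1K requests."""
    n = len(records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "p95_latency_ms": p95([r.latency_ms for r in records]),
        "cost_per_1k_usd": 1000 * sum(r.cost_usd for r in records) / n,
    }
```

Running the same summary over each candidate model on the same dataset gives comparable numbers for the selection step.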

| Strategy | Savings | When to Use |
|----------|---------|-------------|
| Semantic caching | 40-80% | 60%+ similar queries |
| Multi-model routing | 30-50% | Mixed complexity queries |
| Prompt compression | 10-20% | Long context inputs |
| Batching | 20-40% | Async-tolerant workloads |
| Smaller model cascade | 40-60% | Simple queries first |

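
To make the semantic caching row concrete, here is a minimal sketch. It assumes a stand-in "embedding" built from bag-of-words counts so the example runs with the standard library alone; a production cache would use a real embedding model and a vector index. The `SemanticCache` class and the 0.8 threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding (an assumption): lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a cached response when a new query is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query: str):
        q = embed(query)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(q, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))
```

The threshold trades hit rate against the risk of returning an answer to a different question, so it is worth tuning on real traffic.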

| Observation | Action |
|-------------|--------|
| Accuracy <80% after prompt iteration | Consider fine-tuning |
| Latency 2x requirement | Review infrastructure |
| Cost >2x budget | Aggressive caching/routing |
| Hallucination rate >5% | Add RAG or stronger guardrails |
| Safety bypass detected | Immediate security review |

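
The observation-to-action table can be encoded as a simple rule check. The function and parameter names below are hypothetical, "Latency 2x requirement" is read as "at least twice the requirement", and the accuracy rule assumes prompt iteration has already been tried.

```python
def recommend_actions(
    accuracy: float,
    latency_ms: float,
    latency_requirement_ms: float,
    cost: float,
    budget: float,
    hallucination_rate: float,
    safety_bypass_detected: bool = False,
) -> list[str]:
    """Return the actions from the table whose observation currently holds.

    Rates are fractions (0.05 means 5%).
    """
    actions = []
    if accuracy < 0.80:
        actions.append("Consider fine-tuning")
    if latency_ms >= 2 * latency_requirement_ms:  # assumed reading of "2x requirement"
        actions.append("Review infrastructure")
    if cost > 2 * budget:
        actions.append("Aggressive caching/routing")
    if hallucination_rate > 0.05:
        actions.append("Add RAG or stronger guardrails")
    if safety_bypass_detected:
        actions.append("Immediate security review")
    return actions
```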

| Metric | Target | Critical |
|--------|--------|----------|
| P95 Latency | <2x requirement | <3x requirement |
| Accuracy | >90% | >80% |
| Cache Hit Rate | >60% | >40% |
| Error Rate | <1% | <5% |
| Cost/1K requests | Within budget | <150% budget |

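
One way to turn the thresholds above into an automated check is sketched below. It assumes a three-level reading of the table: a metric is "ok" when it meets the Target bound, "warning" when it misses Target but still satisfies the Critical bound, and "critical" otherwise. That reading, the status labels, and the dictionary keys are assumptions made for illustration.

```python
def classify(value: float, target: float, critical: float, higher_is_better: bool) -> str:
    """Compare one metric against its Target and Critical bounds."""
    if higher_is_better:
        if value > target:
            return "ok"
        return "warning" if value > critical else "critical"
    if value < target:
        return "ok"
    return "warning" if value < critical else "critical"


def health_report(metrics: dict, latency_requirement_ms: float, budget_per_1k: float) -> dict:
    """Classify each metric in the table; rates are fractions (0.01 means 1%)."""
    cost = metrics["cost_per_1k"]
    if cost <= budget_per_1k:  # "Within budget" is treated as inclusive
        cost_status = "ok"
    elif cost < 1.5 * budget_per_1k:
        cost_status = "warning"
    else:
        cost_status = "critical"
    return {
        "p95_latency": classify(
            metrics["p95_latency_ms"],
            2 * latency_requirement_ms,
            3 * latency_requirement_ms,
            higher_is_better=False,
        ),
        "accuracy": classify(metrics["accuracy"], 0.90, 0.80, higher_is_better=True),
        "cache_hit_rate": classify(metrics["cache_hit_rate"], 0.60, 0.40, higher_is_better=True),
        "error_rate": classify(metrics["error_rate"], 0.01, 0.05, higher_is_better=False),
        "cost_per_1k": cost_status,
    }
```

A report like this could feed an alerting rule that pages on any "critical" entry and opens a ticket on "warning".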
Detailed Technical Reference: See REFERENCE.md
Code Examples & Patterns: See EXAMPLES.md