Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

404kidwiz/llm-architect

Name: llm-architect
Author: 404kidwiz

llm-architect-skill/SKILL.md

npx skillsauth add 404kidwiz/claude-supercode-skills llm-architect

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Error

VirusTotalMulti-engine malware detection

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

LLM Architect

Purpose

Provides expert large language model system architecture for designing, deploying, and optimizing LLM applications at scale. Specializes in model selection, RAG (Retrieval Augmented Generation) pipelines, fine-tuning strategies, serving infrastructure, cost optimization, and safety guardrails for production LLM systems.

When to Use

Designing end-to-end LLM systems from requirements to production
Selecting models and serving infrastructure for specific use cases
Implementing RAG (Retrieval Augmented Generation) pipelines
Optimizing LLM costs while maintaining quality thresholds
Building safety guardrails and compliance mechanisms
Planning fine-tuning vs RAG vs prompt engineering strategies
Scaling LLM inference for high-throughput applications

Quick Start

Invoke this skill when:

Designing end-to-end LLM systems from requirements to production
Selecting models and serving infrastructure for specific use cases
Implementing RAG (Retrieval Augmented Generation) pipelines
Optimizing LLM costs while maintaining quality thresholds
Building safety guardrails and compliance mechanisms

Do NOT invoke when:

Simple API integration exists (use backend-developer instead)
Only prompt engineering needed without architecture decisions
Training foundation models from scratch (almost always wrong approach)
Generic ML tasks unrelated to language models (use ml-engineer)

Decision Framework

Model Selection Quick Guide

| Requirement | Recommended Approach | |-------------|---------------------| | Latency <100ms | Small fine-tuned model (7B quantized) | | Latency <2s, budget unlimited | Claude 3 Opus / GPT-4 | | Latency <2s, domain-specific | Claude 3 Sonnet fine-tuned | | Latency <2s, cost-sensitive | Claude 3 Haiku | | Batch/async acceptable | Batch API, cheapest tier |

RAG vs Fine-Tuning Decision Tree

Need to customize LLM behavior?
│
├─ Need domain-specific knowledge?
│  ├─ Knowledge changes frequently?
│  │  └─ RAG (Retrieval Augmented Generation)
│  └─ Knowledge is static?
│     └─ Fine-tuning OR RAG (test both)
│
├─ Need specific output format/style?
│  ├─ Can describe in prompt?
│  │  └─ Prompt engineering (try first)
│  └─ Format too complex for prompt?
│     └─ Fine-tuning
│
└─ Need latency <100ms?
   └─ Fine-tuned small model (7B-13B)

Architecture Pattern

[Client] → [API Gateway + Rate Limiting]
              ↓
         [Request Router]
          (Route by intent/complexity)
              ↓
    ┌────────┴────────┐
    ↓                 ↓
[Fast Model]    [Powerful Model]
(Haiku/Small)   (Sonnet/Large)
    ↓                 ↓
[Cache Layer] ← [Response Aggregator]
    ↓
[Logging & Monitoring]
    ↓
[Response to Client]

Core Workflow: Design LLM System

1. Requirements Gathering

Ask these questions:

Latency: What's the P95 response time requirement?
Scale: Expected requests/day and growth trajectory?
Accuracy: What's the minimum acceptable quality? (measurable metric)
Cost: Budget constraints? ($/request or $/month)
Data: Existing datasets for evaluation? Sensitivity level?
Compliance: Regulatory requirements? (HIPAA, GDPR, SOC2, etc.)

2. Model Selection

def select_model(requirements):
    if requirements.latency_p95 < 100:  # milliseconds
        if requirements.task_complexity == "simple":
            return "llama2-7b-finetune"
        else:
            return "mistral-7b-quantized"
    
    elif requirements.latency_p95 < 2000:
        if requirements.budget == "unlimited":
            return "claude-3-opus"
        elif requirements.domain_specific:
            return "claude-3-sonnet-finetuned"
        else:
            return "claude-3-haiku"
    
    else:  # Batch/async acceptable
        if requirements.accuracy_critical:
            return "gpt-4-with-ensemble"
        else:
            return "batch-api-cheapest-tier"

3. Prototype & Evaluate

# Run benchmark on eval dataset
python scripts/evaluate_model.py \
  --model claude-3-sonnet \
  --dataset data/eval_1000_examples.jsonl \
  --metrics accuracy,latency,cost

# Expected output:
# Accuracy: 94.3%
# P95 Latency: 1,245ms
# Cost per 1K requests: $2.15

4. Iteration Checklist

[ ] Latency P95 meets requirement? If no → optimize serving (quantization, caching)
[ ] Accuracy meets threshold? If no → improve prompts, fine-tune, or upgrade model
[ ] Cost within budget? If no → aggressive caching, smaller model routing, batching
[ ] Safety guardrails tested? If no → add content filters, PII detection
[ ] Monitoring dashboards live? If no → set up Prometheus + Grafana
[ ] Runbook documented? If no → document common failures and fixes

Cost Optimization Strategies

| Strategy | Savings | When to Use | |----------|---------|-------------| | Semantic caching | 40-80% | 60%+ similar queries | | Multi-model routing | 30-50% | Mixed complexity queries | | Prompt compression | 10-20% | Long context inputs | | Batching | 20-40% | Async-tolerant workloads | | Smaller model cascade | 40-60% | Simple queries first |

Safety Checklist

[ ] Content filtering tested against adversarial examples
[ ] PII detection and redaction validated
[ ] Prompt injection defenses in place
[ ] Output validation rules implemented
[ ] Audit logging configured for all requests
[ ] Compliance requirements documented and validated

Red Flags - When to Escalate

| Observation | Action | |-------------|--------| | Accuracy <80% after prompt iteration | Consider fine-tuning | | Latency 2x requirement | Review infrastructure | | Cost >2x budget | Aggressive caching/routing | | Hallucination rate >5% | Add RAG or stronger guardrails | | Safety bypass detected | Immediate security review |

Quick Reference: Performance Targets

| Metric | Target | Critical | |--------|--------|----------| | P95 Latency | <2x requirement | <3x requirement | | Accuracy | >90% | >80% | | Cache Hit Rate | >60% | >40% | | Error Rate | <1% | <5% | | Cost/1K requests | Within budget | <150% budget |

Additional Resources

Detailed Technical Reference: See REFERENCE.md
- RAG implementation workflow
- Semantic caching patterns
- Deployment configurations
Code Examples & Patterns: See EXAMPLES.md
- Anti-patterns (fine-tuning when prompting suffices, no fallback)
- Quality checklist for LLM systems
- Resilient LLM call patterns

404kidwiz/llm-architect

llm-architect-skill/SKILL.md

Use when user needs LLM system architecture, model deployment, optimization strategies, and production serving infrastructure. Designs scalable large language model applications with focus on performance, cost efficiency, and safety.

63 stars

testing

Updated Mar 25, 2026

$ install --global

skillsauth

npx skillsauth add 404kidwiz/claude-supercode-skills llm-architect

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Error

VirusTotalMulti-engine malware detection

70%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Mar 25, 2026, 5:39 PM44.3s12 files scanned

SKILL.md

name:: llm-architect
description:: Use when user needs LLM system architecture, model deployment, optimization strategies, and production serving infrastructure. Designs scalable large language model applications with focus on performance, cost efficiency, and safety.

LLM Architect

Purpose

When to Use

Designing end-to-end LLM systems from requirements to production
Selecting models and serving infrastructure for specific use cases
Implementing RAG (Retrieval Augmented Generation) pipelines
Optimizing LLM costs while maintaining quality thresholds
Building safety guardrails and compliance mechanisms
Planning fine-tuning vs RAG vs prompt engineering strategies
Scaling LLM inference for high-throughput applications

Quick Start

Invoke this skill when:

Designing end-to-end LLM systems from requirements to production
Selecting models and serving infrastructure for specific use cases
Implementing RAG (Retrieval Augmented Generation) pipelines
Optimizing LLM costs while maintaining quality thresholds
Building safety guardrails and compliance mechanisms

Do NOT invoke when:

Simple API integration exists (use backend-developer instead)
Only prompt engineering needed without architecture decisions
Training foundation models from scratch (almost always wrong approach)
Generic ML tasks unrelated to language models (use ml-engineer)

Decision Framework

Model Selection Quick Guide

RAG vs Fine-Tuning Decision Tree

Need to customize LLM behavior?
│
├─ Need domain-specific knowledge?
│  ├─ Knowledge changes frequently?
│  │  └─ RAG (Retrieval Augmented Generation)
│  └─ Knowledge is static?
│     └─ Fine-tuning OR RAG (test both)
│
├─ Need specific output format/style?
│  ├─ Can describe in prompt?
│  │  └─ Prompt engineering (try first)
│  └─ Format too complex for prompt?
│     └─ Fine-tuning
│
└─ Need latency <100ms?
   └─ Fine-tuned small model (7B-13B)

Architecture Pattern

[Client] → [API Gateway + Rate Limiting]
              ↓
         [Request Router]
          (Route by intent/complexity)
              ↓
    ┌────────┴────────┐
    ↓                 ↓
[Fast Model]    [Powerful Model]
(Haiku/Small)   (Sonnet/Large)
    ↓                 ↓
[Cache Layer] ← [Response Aggregator]
    ↓
[Logging & Monitoring]
    ↓
[Response to Client]

Core Workflow: Design LLM System

1. Requirements Gathering

Ask these questions:

Latency: What's the P95 response time requirement?
Scale: Expected requests/day and growth trajectory?
Accuracy: What's the minimum acceptable quality? (measurable metric)
Cost: Budget constraints? ($/request or $/month)
Data: Existing datasets for evaluation? Sensitivity level?
Compliance: Regulatory requirements? (HIPAA, GDPR, SOC2, etc.)

2. Model Selection

def select_model(requirements):
    if requirements.latency_p95 < 100:  # milliseconds
        if requirements.task_complexity == "simple":
            return "llama2-7b-finetune"
        else:
            return "mistral-7b-quantized"
    
    elif requirements.latency_p95 < 2000:
        if requirements.budget == "unlimited":
            return "claude-3-opus"
        elif requirements.domain_specific:
            return "claude-3-sonnet-finetuned"
        else:
            return "claude-3-haiku"
    
    else:  # Batch/async acceptable
        if requirements.accuracy_critical:
            return "gpt-4-with-ensemble"
        else:
            return "batch-api-cheapest-tier"

3. Prototype & Evaluate

# Run benchmark on eval dataset
python scripts/evaluate_model.py \
  --model claude-3-sonnet \
  --dataset data/eval_1000_examples.jsonl \
  --metrics accuracy,latency,cost

# Expected output:
# Accuracy: 94.3%
# P95 Latency: 1,245ms
# Cost per 1K requests: $2.15

4. Iteration Checklist

[ ] Latency P95 meets requirement? If no → optimize serving (quantization, caching)
[ ] Accuracy meets threshold? If no → improve prompts, fine-tune, or upgrade model
[ ] Cost within budget? If no → aggressive caching, smaller model routing, batching
[ ] Safety guardrails tested? If no → add content filters, PII detection
[ ] Monitoring dashboards live? If no → set up Prometheus + Grafana
[ ] Runbook documented? If no → document common failures and fixes

Cost Optimization Strategies

Safety Checklist

[ ] Content filtering tested against adversarial examples
[ ] PII detection and redaction validated
[ ] Prompt injection defenses in place
[ ] Output validation rules implemented
[ ] Audit logging configured for all requests
[ ] Compliance requirements documented and validated

Red Flags - When to Escalate

Quick Reference: Performance Targets

Additional Resources

Detailed Technical Reference: See REFERENCE.md
- RAG implementation workflow
- Semantic caching patterns
- Deployment configurations
Code Examples & Patterns: See EXAMPLES.md
- Anti-patterns (fine-tuning when prompting suffices, no fallback)
- Quality checklist for LLM systems
- Resilient LLM call patterns

Related Skills

404kidwiz/xlsx

development

VerifiedTrustedCommunity

Expert in automating Excel workflows using Node.js (ExcelJS, SheetJS) and Python (pandas, openpyxl).

63SKILL.mdUpdated Mar 25, 2026

404kidwiz/workflow-orchestrator

content-media

VerifiedTrustedCommunity

Expert in designing durable, scalable workflow systems using Temporal, Camunda, and Event-Driven Architectures.

63SKILL.mdUpdated Mar 25, 2026

404kidwiz/workflow-orchestrator

404kidwiz/wordpress-master

tools

VerifiedTrustedCommunity

Use when user needs WordPress development, theme or plugin creation, site optimization, security hardening, multisite management, or scaling WordPress from small sites to enterprise platforms.

63SKILL.mdUpdated Mar 25, 2026

404kidwiz/wordpress-master

404kidwiz/windows-infra-admin

tools

VerifiedTrustedCommunity

Expert in Windows Server, Active Directory (AD DS), Hybrid Identity (Entra ID), and PowerShell automation.

63SKILL.mdUpdated Mar 25, 2026

404kidwiz/windows-infra-admin

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/404kidwiz/claude-supercode-skills.git

# Copy into Claude Code skills folder (global)
cp -r claude-supercode-skills/llm-architect-skill ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

404kidwiz/claude-supercode-skills

63 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT