skills/scale/SKILL.md
Scalability engineering. Horizontal/vertical decisions, auto-scaling, read replicas, connection pooling, rate limiting, capacity planning.
npx skillsauth add arbazkhan971/godmode scale
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Triggers: /godmode:scale, "scaling", "auto-scale", "read replica"
SCALABILITY CONTEXT:
Architecture: Monolith | Microservices | Serverless
Current Scale: <RPS, concurrent users, data volume>
Target Scale: <target RPS, users, volume>
Bottleneck: CPU | Memory | I/O | Network | DB | API
SLA: <latency p50/p99, availability>
Database: <type, size, read/write ratio>
Peak vs Baseline: <ratio>
SCALE UP VS SCALE OUT:
IF single-threaded workload: scale UP
IF DB write bottleneck AND not at ceiling: scale UP
IF at instance ceiling: scale OUT
IF stateless app tier AND read-heavy: scale OUT
IF peak-to-baseline ratio > 5x: scale OUT
IF need fault tolerance: scale OUT
WHEN quick fix needed AND below ceiling: scale UP first
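The rules above can be sketched as a small decision function. This is only an illustration: the `Workload` fields are hypothetical names, the source does not state rule precedence (the sketch checks rules in the listed order), and the fallback when no rule matches is an assumption based on the "scale UP first" quick-fix rule.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical field names; each mirrors one condition in the rules above.
    single_threaded: bool = False
    db_write_bottleneck: bool = False
    at_instance_ceiling: bool = False
    stateless_read_heavy: bool = False
    peak_to_baseline: float = 1.0
    needs_fault_tolerance: bool = False


def scale_direction(w: Workload) -> str:
    """Return "UP" or "OUT", checking the rules in the order they are listed."""
    if w.single_threaded:
        return "UP"
    if w.db_write_bottleneck and not w.at_instance_ceiling:
        return "UP"
    if w.at_instance_ceiling:
        return "OUT"
    if w.stateless_read_heavy:
        return "OUT"
    if w.peak_to_baseline > 5:
        return "OUT"
    if w.needs_fault_tolerance:
        return "OUT"
    # Assumption: nothing matched and we are below the ceiling, so take the quick fix.
    return "UP"
```

For example, `scale_direction(Workload(peak_to_baseline=8))` returns `"OUT"`.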
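AUTO-SCALING:
# K8s HPA: target CPU 70%. The command below sets only the CPU target;
# the 80% memory target must be configured separately, e.g. in an autoscaling/v2 HPA manifest.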
kubectl autoscale deployment myapp \
--cpu-percent=70 --min=2 --max=20
# Check current HPA status
kubectl get hpa myapp -o wide
POLICIES:
Scale out: CPU >70% for 3min -> +2 instances
Scale fast: CPU >90% for 1min -> +4 instances
Scale in: CPU <30% for 10min -> -1 instance
Stabilization: scale-up 60s, scale-down 300s
KEDA for event-driven (queue depth, Prometheus)
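To make the three CPU policies concrete, here is a minimal sketch that maps a series of per-minute CPU averages to an instance delta. The function name and the per-minute sampling are assumptions, the stabilization windows are not modeled, and a real autoscaler (HPA, KEDA) applies its own algorithm.

```python
def scaling_delta(cpu_per_minute: list[float]) -> int:
    """Return the instance-count change for per-minute CPU averages (oldest first)."""

    def sustained(minutes: int, predicate) -> bool:
        # True only if we have a full window and every sample satisfies the predicate.
        window = cpu_per_minute[-minutes:]
        return len(window) == minutes and all(predicate(c) for c in window)

    if sustained(1, lambda c: c > 90):   # Scale fast: CPU >90% for 1 min
        return 4
    if sustained(3, lambda c: c > 70):   # Scale out: CPU >70% for 3 min
        return 2
    if sustained(10, lambda c: c < 30):  # Scale in: CPU <30% for 10 min
        return -1
    return 0
```

The fast policy is checked first so that a spike above 90% is not masked by the slower 70% rule.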
READ REPLICAS:
Primary (Writer) -> Replica 1, 2, 3 (Read)
Writes -> primary | Strong reads -> primary
Eventually-consistent reads -> replica (round-robin)
Read-after-write -> primary for 5s, then replica
IF replication lag > 500ms: route reads to primary
IF lag > 2s: circuit breaker, alert, investigate
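The routing rules above could look roughly like this in application code. `ReadRouter`, its method names, and the session-keyed write tracking are hypothetical; the circuit breaker is reduced to "route to primary and raise an alert flag", and timestamps are passed in explicitly to keep the sketch deterministic.

```python
import itertools

READ_AFTER_WRITE_S = 5.0  # read-after-write window from the rules above
LAG_FALLBACK_S = 0.5      # lag > 500ms: route reads to primary
LAG_BREAKER_S = 2.0       # lag > 2s: circuit breaker and alert


class ReadRouter:
    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)  # round-robin over replicas
        self._last_write = {}                    # session id -> time of last write

    def record_write(self, session, now):
        self._last_write[session] = now

    def route(self, session, now, lag_s, strong=False):
        """Return (target, alert) for a read issued at time `now` (seconds)."""
        if lag_s > LAG_BREAKER_S:
            return "primary", True
        if strong or lag_s > LAG_FALLBACK_S:
            return "primary", False
        last = self._last_write.get(session)
        if last is not None and now - last < READ_AFTER_WRITE_S:
            return "primary", False
        return next(self._cycle), False
```

A caller would invoke `record_write` after each write and pass the measured replication lag into `route` on each read.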
CONNECTION POOLING:
Formula: pool_size = (core_count * 2) + 1
Start small (10-20), monitor wait time
Problem: 100 instances * 20 pool = 2000 connections
Solution: PgBouncer multiplexes 2000 app -> 100 DB
pool_mode = transaction
default_pool_size = 50
max_client_conn = 2000
max_db_connections = 100
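The arithmetic behind the formula and the PgBouncer example can be checked with a few lines. The function names are illustrative only; the numbers come from the text above.

```python
def pool_size(core_count: int) -> int:
    """Starting pool size from the formula above: (core_count * 2) + 1."""
    return core_count * 2 + 1


def total_client_connections(instances: int, pool_per_instance: int) -> int:
    """Connections the app tier opens if every instance fills its pool."""
    return instances * pool_per_instance


def multiplex_ratio(client_conns: int, db_conns: int) -> float:
    """How many client connections share each server-side connection."""
    return client_conns / db_conns


clients = total_client_connections(instances=100, pool_per_instance=20)  # 2000
ratio = multiplex_ratio(clients, db_conns=100)                           # 20.0
```

With 2000 client connections against `max_db_connections = 100`, each database connection is shared by 20 clients on average, which is why transaction pooling mode matters here.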
RATE LIMITING:
API Gateway: Token bucket, 1000 req/min per key
Per-user: Sliding window, 100 req/min
Per-endpoint: Fixed window, 50 req/min
Global: Leaky bucket, 10000 req/min
Burst: 100 tokens/sec refill, bucket 200
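A minimal token bucket using the burst parameters above (refill 100 tokens/sec, bucket size 200) might look like this. The class is a single-process sketch with an explicit clock argument; a shared limiter across instances would need external state, which is not shown.

```python
class TokenBucket:
    def __init__(self, refill_per_s=100.0, capacity=200.0, now=0.0):
        self.refill_per_s = refill_per_s
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now, cost=1.0):
        """Refill for the elapsed time, then try to spend `cost` tokens."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated = max(self.updated, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Starting from a full bucket, 200 requests pass immediately; after that, requests are admitted at the refill rate.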
BACKPRESSURE:
Bounded queues (reject when full)
Load shedding priority:
health > auth > critical > standard > background
Circuit breaker on failing dependencies
Response: 429 + Retry-After + X-RateLimit-* headers.
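One way to combine a bounded queue with the shedding priority is sketched below. It is an interpretation, not the source's stated design: when the queue is full, the sketch evicts the lowest-priority queued item if the newcomer outranks it, otherwise it rejects the newcomer. The header set in `too_many_requests` uses the common `X-RateLimit-Limit` and `X-RateLimit-Remaining` names as an assumption.

```python
PRIORITY = {"health": 0, "auth": 1, "critical": 2, "standard": 3, "background": 4}


class BoundedPriorityQueue:
    def __init__(self, limit):
        self.limit = limit
        self._items = []  # (rank, sequence number, request)
        self._seq = 0

    def offer(self, kind, request):
        """Return the request that was shed, or None if nothing was dropped."""
        entry = (PRIORITY[kind], self._seq, request)
        self._seq += 1
        if len(self._items) < self.limit:
            self._items.append(entry)
            return None
        worst = max(self._items)  # lowest priority; newest among equals
        if entry[0] < worst[0]:
            self._items.remove(worst)
            self._items.append(entry)
            return worst[2]
        return request  # queue full and newcomer does not outrank anything


def too_many_requests(limit, retry_after_s):
    """Build the 429 status and headers for a shed or rate-limited request."""
    headers = {
        "Retry-After": str(retry_after_s),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
    }
    return 429, headers
```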
CAPACITY PLANNING:
# Measure current utilization
kubectl top pods --sort-by=cpu
kubectl top nodes
# Project headroom
echo "At 70% CPU threshold, runway = ..."
1. Measure baseline under normal load
2. Project growth (3/6/12 month)
3. Identify ceiling per component
4. Calculate runway (when each hits 70%/90%)
5. Plan scaling actions with cost estimates
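Step 4 can be estimated with a short calculation. The sketch assumes compound month-over-month growth in utilization, which the source does not specify; `months_until` is a hypothetical helper name.

```python
import math


def months_until(current_util: float, threshold: float, monthly_growth: float) -> float:
    """Months of compound growth until utilization reaches the threshold.

    current_util and threshold are fractions (0.35 means 35%);
    monthly_growth is a fraction per month (0.10 means 10%).
    """
    if current_util <= 0:
        raise ValueError("current_util must be positive")
    if current_util >= threshold:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # no growth: the threshold is never reached
    return math.log(threshold / current_util) / math.log(1 + monthly_growth)
```

For example, a component at 35% utilization growing 10% per month reaches the 70% threshold in a little over 7 months.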
LAYERS:
CDN: static 24h, API 60s
Application: Redis, TTL-based
Query: in-process 30s
PATTERNS: Cache-aside | Write-through | Write-behind
Target >95% hit rate
Stampede: locking, probabilistic early refresh
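Cache-aside with lock-based stampede protection can be sketched as follows. An in-process dict stands in for Redis here, and the class and parameter names are hypothetical; probabilistic early refresh is not shown.

```python
import threading
import time


class CacheAside:
    def __init__(self, loader, ttl_s=30.0, clock=time.monotonic):
        self._loader = loader        # function that fetches from the source of truth
        self._ttl_s = ttl_s
        self._clock = clock
        self._data = {}              # key -> (value, expires_at)
        self._lock = threading.Lock()
        self._key_locks = {}         # key -> lock held while that key is loading

    def _fresh(self, key):
        hit = self._data.get(key)
        if hit is not None and hit[1] > self._clock():
            return hit
        return None

    def get(self, key):
        hit = self._fresh(key)
        if hit is not None:
            return hit[0]
        with self._lock:
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another thread may have filled the entry while we waited.
            hit = self._fresh(key)
            if hit is not None:
                return hit[0]
            value = self._loader(key)
            self._data[key] = (value, self._clock() + self._ttl_s)
            return value
```

The per-key lock means only one caller loads an expired key while the others wait and then read the refreshed entry.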
ls docker-compose.yml k8s/ terraform/ 2>/dev/null
grep -ri "pgbouncer\|proxysql\|redis\|memcached" . \
--include="*.yml" --include="*.yaml" -l 2>/dev/null
kubectl get hpa 2>/dev/null
FOR EACH bottleneck:
1. Measure -> 2. Choose strategy -> 3. Implement
4. Load test at 2x peak -> 5. Measure improvement
IF improvement < 20%: wrong bottleneck, re-measure
IF improvement >= 20%: commit
IF cost exceeds budget: optimize first
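The decision at the end of each iteration can be written as a small function. The name `next_step` and the choice to check the budget before the improvement threshold are assumptions; the source lists the three rules without an order.

```python
def next_step(before_rps: float, after_rps: float, cost: float, budget: float) -> str:
    """Apply the 20% improvement rule and the budget rule after a load test."""
    improvement = (after_rps - before_rps) / before_rps
    if cost > budget:
        return "optimize first"
    if improvement < 0.20:
        return "re-measure"  # likely the wrong bottleneck
    return "commit"
```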
Log to .godmode/scale-results.tsv:
timestamp\tsystem\tcurrent_rps\ttarget_rps\tbottleneck\tdirection\tcost_delta_pct\tverdict
Print: Scale: {bottleneck}. Strategy: {horizontal|vertical|caching}. Capacity: {before} -> {after}. Cost: {est}. Status: {DONE|PARTIAL}.
KEEP if: capacity improved AND latency maintained
AND cost within budget
DISCARD if: latency regressed OR cost exceeds budget
OR auto-scaler flapping. Revert on discard.
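The keep/discard rule and the log row format above can be sketched together. Treating "latency maintained" as "not worse than before" and discarding when capacity did not improve are assumptions; the column order is taken from the header line above.

```python
FIELDS = [
    "timestamp", "system", "current_rps", "target_rps",
    "bottleneck", "direction", "cost_delta_pct", "verdict",
]


def verdict(capacity_before, capacity_after, latency_before_ms, latency_after_ms,
            cost, budget, flapping=False) -> str:
    """Return "KEEP" or "DISCARD" following the rule above."""
    if latency_after_ms > latency_before_ms or cost > budget or flapping:
        return "DISCARD"
    if capacity_after > capacity_before:
        return "KEEP"
    return "DISCARD"  # assumption: no capacity gain means nothing worth keeping


def tsv_row(record: dict) -> str:
    """Format one result as a tab-separated line in the column order above."""
    return "\t".join(str(record[name]) for name in FIELDS)
```

A caller would append `tsv_row(...)` plus a newline to `.godmode/scale-results.tsv` and revert the change whenever `verdict` returns `"DISCARD"`.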
STOP when ALL of:
- Target capacity met with acceptable latency
- Auto-scaling configured and tested
- Cost within budget with headroom for spikes