skills/reverse-proxy-for-agents/SKILL.md
Reverse proxy architecture for AI agent systems — Nginx, Caddy, Traefik, and Cloudflare. Agent-specific patterns: routing to specialist agents, load balancing, TLS termination, rate limiting, WebSocket proxying, header injection for agent identity, and the "agent gateway" pattern. Covers LLM routers (LiteLLM, OpenRouter) as intelligent reverse proxies, service mesh (Envoy/Istio) for agent fleets, and MCP servers behind reverse proxies. Activate on 'reverse proxy', 'nginx proxy', 'caddy', 'traefik', 'agent gateway', 'LLM router', 'LiteLLM proxy', 'WebSocket proxy', 'service mesh agents'. NOT for: tunnels (use tunnels-for-agents), firewall rules (use agentic-zero-trust-security), container orchestration (use devops-automator), DNS configuration (use infrastructure skills).
npx skillsauth add curiositech/windags-skills reverse-proxy-for-agents
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Expert in reverse proxy architecture for AI agent systems, specializing in routing, load balancing, TLS termination, and real-time communication patterns.
Given requirements, choose proxy tool:
INPUT: (throughput, k8s_env, auto_reload, simplicity, edge_protection)
IF throughput > 10k_rps:
→ Use Nginx (C implementation, event-driven)
→ Configure keepalive, worker processes
→ Manual config reload with nginx -s reload
ELIF k8s_env == true AND auto_reload == true:
→ Use Traefik (Docker/K8s service discovery)
→ Configure via container labels
→ Automatic routing updates on container changes
ELIF simplicity == true AND throughput < 5k_rps:
→ Use Caddy (automatic HTTPS, zero-config WebSocket)
→ Caddyfile syntax, hot reload via API
→ Built-in Let's Encrypt integration
ELIF edge_protection == true:
→ Use Cloudflare (DDoS, WAF, CDN at edge)
→ Dashboard config, instant propagation
→ DNS-based routing
REQUEST ANALYSIS:
IF agent_fleet_size < 5:
→ Path-based routing: /api/code/* → code-agent
→ Manual upstream configuration
→ Health checks every 30s
ELIF agents_are_stateful == true:
→ Cookie-based sticky sessions
→ lb_policy cookie agent_session (Caddy)
→ ip_hash (Nginx) for IP-based affinity
ELIF real_time_required == true:
→ WebSocket proxying configuration
→ proxy_buffering off (Nginx)
→ Long timeouts: proxy_read_timeout 86400s
→ Connection upgrade headers
ELIF content_based_routing == true:
→ Implement lightweight classifier at proxy
→ Keyword matching or fast LLM (Haiku-class)
→ Route based on request body analysis
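For the content-based routing branch, a keyword matcher is the cheapest form of "lightweight classifier". The sketch below is illustrative only: the keyword table, the `data-agent` name, and the fallback choice are assumptions, not values from this skill.

```python
import re

# Hypothetical keyword table; a real deployment would tune this per agent.
AGENT_KEYWORDS = {
    "code-agent": {"bug", "compile", "function", "refactor", "stacktrace"},
    "research-agent": {"paper", "summarize", "sources", "literature"},
    "data-agent": {"csv", "sql", "dataframe", "chart"},
}
DEFAULT_AGENT = "research-agent"  # assumed fallback when nothing matches


def classify(body: str) -> str:
    """Return the agent whose keyword set overlaps most with the request body."""
    words = set(re.findall(r"[a-z]+", body.lower()))
    scores = {agent: len(words & kws) for agent, kws in AGENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first agent in the table
    return best if scores[best] > 0 else DEFAULT_AGENT
```

A matcher like this runs in microseconds, so it can sit in the request path; a fast LLM classifier would replace `classify` behind the same signature when keywords stop being enough.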
CONSTRAINTS ANALYSIS:
IF budget_control == critical:
→ LiteLLM (self-hosted, full cost tracking)
→ max_budget configuration
→ Per-user spending limits
ELIF model_variety > 100:
→ OpenRouter (500+ models, provider fallback)
→ Cloud service, per-token markup
→ Built-in latency-based routing
ELIF observability == priority:
→ Helicone (request tracing, analytics)
→ Pass-through proxy model
→ Cost per request tracking
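To make the budget-control branch concrete, here is a generic sketch of per-user spending limits. It is not LiteLLM's API or implementation; `BudgetGuard`, `BudgetExceeded`, and `charge` are hypothetical names used only to show the idea behind a `max_budget` setting.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a request would push a user past their budget."""


class BudgetGuard:
    def __init__(self, max_budget_usd: float):
        self.max_budget_usd = max_budget_usd
        self.spent = defaultdict(float)  # user_id -> dollars spent so far

    def charge(self, user_id: str, cost_usd: float) -> float:
        """Record a request's cost and return the user's remaining budget."""
        projected = self.spent[user_id] + cost_usd
        if projected > self.max_budget_usd:
            # Reject before recording, so a refused request costs nothing.
            raise BudgetExceeded(f"{user_id} would exceed {self.max_budget_usd} USD")
        self.spent[user_id] = projected
        return self.max_budget_usd - projected
```

A gateway would call `charge` after estimating or measuring each request's cost and turn `BudgetExceeded` into an HTTP error for that user, leaving other users unaffected.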
Symptom Detection:
Root Cause: Proxy sending traffic to crashed/unresponsive agents
Fix Strategy:
- health_uri /health (Caddy) or max_fails=2 fail_timeout=60s (Nginx)
Symptom Detection:
Root Cause: Proxy buffering streaming responses before forwarding
Fix Strategy:
- proxy_buffering off for all streaming endpoints
- proxy_cache off and chunked_transfer_encoding off for SSE
- X-Accel-Buffering: no header for dynamic buffering control
Symptom Detection:
Root Cause: Default timeouts killing long-running agent requests
Fix Strategy:
- proxy_read_timeout 300s (Nginx) or read_timeout 300s inside the reverse_proxy transport http block (Caddy)
Symptom Detection:
Root Cause: Stateful agents losing context due to round-robin load balancing
Fix Strategy:
- lb_policy cookie agent_session (Caddy)
- ip_hash directive (Nginx) for IP-based affinity
Symptom Detection:
Root Cause: Content-based routing using heavyweight classifier
Fix Strategy:
Symptom Detection:
Root Cause: Single point of failure in proxy layer
Fix Strategy:
Symptom Detection:
Root Cause: Missing or insufficient rate limiting allowing budget exhaustion
Fix Strategy:
- limit_req_zone $binary_remote_addr (Nginx)
- rate_limit zone api_key (Caddy)
- nodelay for legitimate traffic spikes
Scenario: 3 specialist agents (code, research, data) need routing with auto-HTTPS and WebSocket support.
Agent Decision Process:
Step-by-Step Implementation:
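# 1. Create Caddyfile with agent routing
# Note: rate_limit is not part of standard Caddy. It needs a build that includes
# a rate-limit plugin such as mholt/caddy-ratelimit, and depending on the plugin
# version it may also need an `order` entry in the global options block.
cat > Caddyfile << 'EOF'
agents.example.com {
    handle /api/code/* {
        reverse_proxy code-agent:8001 {
            health_uri /health
            health_interval 10s
            # Agent identity headers (header_up is only valid inside reverse_proxy)
            header_up X-Request-ID {http.request.uuid}
            header_up X-Gateway "windags-caddy"
        }
    }
    handle /api/research/* {
        reverse_proxy research-agent:8002 {
            health_uri /health
            health_interval 10s
            header_up X-Request-ID {http.request.uuid}
            header_up X-Gateway "windags-caddy"
        }
    }
    handle /ws/* {
        # Caddy handles WebSocket upgrade automatically
        reverse_proxy agent-ws:8010 {
            header_up X-Request-ID {http.request.uuid}
            header_up X-Gateway "windags-caddy"
        }
    }
    # Rate limiting per client
    rate_limit {
        zone agent_api {
            key {remote_host}
            events 100
            window 1m
        }
    }
}
EOF
# 2. Start Caddy (gets certificates automatically)
caddy run
# 3. Test routing and health
curl https://agents.example.com/api/code/health
# A WebSocket handshake needs HTTP/1.1 plus the full set of upgrade headers
curl --http1.1 -i \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  https://agents.example.com/ws/connect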
Novice Mistakes vs Expert Choices:
Configuration Validation:
# Test health check behavior
docker stop code-agent
curl -v https://agents.example.com/api/code/test
# Should return a 5xx: 502 while Caddy still tries the stopped upstream, 503 once the health check marks it unhealthy
# Test WebSocket upgrade
wscat -c wss://agents.example.com/ws/stream
# Should successfully upgrade connection
# Test rate limiting
for i in {1..110}; do curl -s -o /dev/null -w "%{http_code}\n" https://agents.example.com/api/code/health; done
# Should get 429 Too Many Requests after 100 requests
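The expected outcome of the rate-limit test can be reasoned about offline with a small simulation. This is a sketch of a sliding-window limiter under assumed semantics (at most `events` requests per `window_s` seconds per key); it is not the Caddy plugin's implementation, and `SlidingWindowLimiter` and `simulate` are hypothetical names.

```python
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, events: int, window_s: float):
        self.events = events      # max requests allowed per window
        self.window_s = window_s  # window length in seconds
        self.hits = {}            # key -> deque of accepted request timestamps

    def allow(self, key: str, now: float) -> bool:
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.events:
            return False          # the proxy would answer 429 here
        q.append(now)
        return True


def simulate(requests: int = 110) -> list:
    """Replay the curl loop: one client, all requests within about a second."""
    limiter = SlidingWindowLimiter(events=100, window_s=60.0)
    return [200 if limiter.allow("client-a", now=i * 0.01) else 429
            for i in range(requests)]
```

With the zone settings from the Caddyfile (100 events per 1m), the first 100 simulated requests pass and the remaining 10 are rejected, which is the pattern the curl loop should show if the limiter is active.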
curl -v https://)
proxy_buffering off or equivalent)
Use reverse-proxy-for-agents for:
Do NOT use reverse-proxy-for-agents for:
- tunnels-for-agents
- agentic-zero-trust-security
- devops-automator
- prompt-engineer
- ipc-communication-patterns