skills/agent-load-balancer/SKILL.md
Use this skill when designing load balancing or traffic distribution for AI agents, including routing policies, reliability targets, latency control, throughput scaling, and demand-aware failover.
# Agent Load Balancer

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add chatandbuild/skills-repo
Plan and validate traffic distribution patterns for resilient agent operations.
Weighted round-robin. Distribute requests in proportion to backend capacity. For example, with weights of 70 and 30, backend A receives 70% of requests and backend B receives 30%. Use when backends have different capacity or you are doing a gradual rollout. Update weights incrementally; avoid sudden 100% shifts.
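As a minimal sketch of the policy above, the following uses the "smooth" variant of weighted round-robin, which interleaves picks rather than sending long bursts to one backend. The choice of variant, the `Backend` class, and the `pick` function are illustrative assumptions, not something the skill prescribes.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    weight: int        # relative share of traffic (hypothetical field)
    current: int = 0   # running score used by the smooth algorithm


def pick(backends: list[Backend]) -> Backend:
    """Return the next backend using smooth weighted round-robin."""
    total = sum(b.weight for b in backends)
    # Every backend earns credit equal to its weight on each pick.
    for b in backends:
        b.current += b.weight
    # The backend with the most credit is chosen and pays back the total.
    chosen = max(backends, key=lambda b: b.current)
    chosen.current -= total
    return chosen


pool = [Backend("A", 7), Backend("B", 3)]
picks = [pick(pool).name for _ in range(10)]
```

Over any run of ten picks with weights 7 and 3, A is chosen seven times and B three times, with B's turns spread out rather than clustered.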
Least-connections. Route to the backend with the fewest active connections. Suited for long-lived agent sessions where request duration varies. Requires accurate connection counting; ensure connections are properly tracked and released.
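The connection-counting requirement above can be sketched with a context manager that always releases the count, even when the request raises. `LeastConnections` and `session` are hypothetical names, and this single-threaded sketch omits the locking a concurrent balancer would need.

```python
from contextlib import contextmanager


class LeastConnections:
    def __init__(self, backends):
        # Active connection count per backend name.
        self.active = {name: 0 for name in backends}

    @contextmanager
    def session(self):
        # Pick the backend with the fewest active connections
        # (ties go to the first backend in insertion order).
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        try:
            yield name
        finally:
            # Release the slot even if the request fails.
            self.active[name] -= 1


lb = LeastConnections(["a", "b"])
with lb.session() as first:
    with lb.session() as second:
        inside = (first, second)
after = dict(lb.active)
```

While the first session is still open on `a`, the second one lands on `b`; once both exit, all counts return to zero.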
Latency-based routing. Route to the backend with the lowest recent latency. Use percentile latency (e.g., p95) over a sliding window. Protects users from slow or degraded backends. Requires low-overhead latency measurement and a fallback when metrics are stale.
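One way to sketch this, assuming a time-based window and a nearest-rank p95: each backend keeps recent samples, and routing falls back to a default when no backend has fresh data. `LatencyTracker`, `choose`, the 30-second window, and the injectable `clock` are all illustrative assumptions.

```python
import math
import time
from collections import deque


class LatencyTracker:
    def __init__(self, window_s=30.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.samples = deque()  # (timestamp, latency_ms), oldest first

    def record(self, latency_ms):
        self.samples.append((self.clock(), latency_ms))

    def p95(self):
        """Nearest-rank p95 over the window, or None if metrics are stale."""
        cutoff = self.clock() - self.window_s
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()
        if not self.samples:
            return None
        ordered = sorted(ms for _, ms in self.samples)
        return ordered[math.ceil(0.95 * len(ordered)) - 1]


def choose(trackers, fallback):
    """Pick the backend with the lowest fresh p95, else the fallback."""
    fresh = {}
    for name, tracker in trackers.items():
        value = tracker.p95()
        if value is not None:
            fresh[name] = value
    if not fresh:
        return fallback
    return min(fresh, key=fresh.get)
```

A backend with no fresh samples is never chosen by this sketch, so a real deployment would likely also send it a trickle of traffic or probes to refresh its metrics.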
Consistent hashing for session affinity. When a user session must stick to one backend (e.g., for stateful context), use consistent hashing on the session ID so that the same backend handles follow-up requests. Balance affinity against failover: if the backend is unhealthy, break affinity and route elsewhere.
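A minimal hash-ring sketch of the affinity-with-failover behavior described above. The virtual-node count, the SHA-256 hash, and the `HashRing` API are assumptions made for illustration; when the owning backend is unhealthy, the lookup walks clockwise to the next healthy one.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() varies per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, backends, vnodes=100):
        # Each backend owns many points on the ring to even out load.
        self.ring = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def route(self, session_id: str, healthy: set) -> str:
        """Return the session's backend, skipping unhealthy ones."""
        start = bisect.bisect(self.points, _hash(session_id))
        for offset in range(len(self.ring)):
            _, backend = self.ring[(start + offset) % len(self.ring)]
            if backend in healthy:
                return backend
        raise RuntimeError("no healthy backend available")
```

For example, `HashRing(["a", "b", "c"]).route("session-42", {"a", "b", "c"})` returns the same backend on every call, and removing that backend from the `healthy` set moves only the sessions it owned.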
Active vs passive. Active checks: periodic probes (HTTP, gRPC) to each backend. Passive checks: observe success/failure of real traffic. Use both: active for fast detection of dead backends, passive for detecting degradation under load. Tune frequency to avoid overwhelming backends.
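The passive side can be sketched as an error-rate monitor over the most recent real requests. The class name, the count-based window, and the `max_error_rate` and `min_samples` parameters are hypothetical; an active checker would run alongside this on its own probe schedule.

```python
from collections import deque


class PassiveMonitor:
    def __init__(self, window=20, max_error_rate=0.5, min_samples=5):
        # Keep only the last `window` request outcomes.
        self.outcomes = deque(maxlen=window)
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        """Record the outcome of one real request."""
        self.outcomes.append(ok)

    def degraded(self) -> bool:
        """True when the recent error rate exceeds the limit."""
        if len(self.outcomes) < self.min_samples:
            return False  # too little traffic to judge
        errors = sum(1 for ok in self.outcomes if not ok)
        return errors / len(self.outcomes) > self.max_error_rate
```

The `min_samples` guard is a design choice: it keeps a single early failure on a quiet backend from reading as a 100% error rate.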
Failure threshold tuning. Require N consecutive failures before marking unhealthy. Too low: transient blips cause unnecessary failover. Too high: users hit a bad backend for too long. Typical: 2–3 failures over 10–30 seconds. Document and test the chosen values.
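A small sketch of the consecutive-failure rule, assuming that a single success resets the counter and restores the backend. That recovery rule and the `HealthTracker` name are illustrative assumptions; some setups also require several successes before marking a backend healthy again.

```python
class HealthTracker:
    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.healthy = True

    def observe(self, ok: bool) -> bool:
        """Feed one check result; return the current health state."""
        if ok:
            # Assumption: any success clears the failure streak.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


tracker = HealthTracker(failure_threshold=3)
states = [tracker.observe(ok) for ok in (False, False, True, False, False, False)]
```

In this run the two early failures are absorbed as a transient blip, and only the later streak of three marks the backend unhealthy.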
Circuit breaker integration. When a backend is unhealthy, open the circuit: stop sending traffic for a cooldown period. After cooldown, send a probe (half-open). If it succeeds, close the circuit. Prevents cascade failures and gives the backend time to recover.
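The closed, open, and half-open cycle described above can be sketched as a small state machine. The `CircuitBreaker` API, the 30-second default cooldown, and the injectable `clock` are assumptions; the decision to trip the breaker is left to whatever health signal is in use.

```python
import time


class CircuitBreaker:
    def __init__(self, cooldown_s=30.0, clock=time.monotonic):
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "closed"
        self.opened_at = 0.0

    def trip(self) -> None:
        """Open the circuit and start the cooldown."""
        self.state = "open"
        self.opened_at = self.clock()

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: let exactly one probe through.
            self.state = "half-open"
            return True
        return False  # still cooling down, or a probe is already in flight

    def record(self, ok: bool) -> None:
        """Report the probe result while half-open."""
        if self.state == "half-open":
            if ok:
                self.state = "closed"
            else:
                self.trip()  # probe failed: restart the cooldown
```

Callers check `allow_request()` before sending traffic and call `record()` with the probe outcome, so a failed probe keeps the backend isolated for another cooldown period.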
No health checks causing traffic to dead backends. Without health checks, the load balancer keeps routing to backends that are down or unresponsive. Always configure health checks. Validate that they run and that unhealthy backends are removed from the pool.
Sticky sessions preventing failover. Session affinity keeps traffic on one backend. If that backend fails, users with affinity to it get errors until the session expires. Implement failover: when the preferred backend is unhealthy, break affinity and route to a healthy one.
Uneven weight distribution. Setting weights that don't match capacity (e.g., 50/50 when one backend has 2x capacity) causes overload on the weaker backend. Base weights on measured capacity or start conservative and adjust from metrics.
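As a worked example of basing weights on measured capacity, the sketch below turns per-backend throughput into proportional weights. The function name and the requests-per-second figures are hypothetical.

```python
def weights_from_capacity(capacity_rps: dict) -> dict:
    """Convert measured capacity into integer weights out of roughly 100."""
    total = sum(capacity_rps.values())
    return {
        name: round(100 * rps / total)
        for name, rps in capacity_rps.items()
    }


# A backend with 2x the capacity should get about 2x the traffic.
weights = weights_from_capacity({"big": 200, "small": 100})
```

Here `weights` comes out as `{"big": 67, "small": 33}` rather than 50/50. Rounding means the values will not always sum to exactly 100, which is harmless when the balancer only uses their ratio.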
Missing circuit breaker causing cascade failures. Sending continuous traffic to a failing backend ties up threads and connections while requests wait on timeouts, which can affect the whole system. Use a circuit breaker to stop traffic to failing backends and allow recovery.
## Routing Strategy
- Primary backend(s): <list with capacity>
- Policy: <weighted round-robin | least-connections | latency-based | consistent hashing>
- Weights (if applicable): <backend: weight>
- Session affinity: <none | consistent hash on X>
## Health and Failover
- Active checks: <protocol, path, interval, timeout>
- Passive checks: <observe success/failure, window>
- Failure threshold: <N failures in M seconds>
- Circuit breaker: <enabled, cooldown, half-open probe>
- Failover: <break affinity when unhealthy: yes|no>
## Rollout Plan
- Stage 1: <traffic split, duration>
- Stage 2: <traffic split, duration>
- Rollback trigger: <error rate > X% | latency p95 > Yms | manual>
- Rollback action: <revert weights, disable new backend>
## Validation
- [ ] Health checks remove unhealthy backends
- [ ] Failover works when primary is down
- [ ] Weights match capacity under load
- [ ] Circuit breaker opens and recovers correctly