java/src/main/resources/targets/claude/skills/knowledge-packs/resilience/SKILL.md
Resilience patterns: circuit breaker, rate limiting, bulkhead isolation, timeout control, retry with exponential backoff + jitter, fallback/graceful degradation, backpressure, and resilience metrics.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add edercnj/ia-dev-environment resilience`
Provides comprehensive resilience patterns for {{LANGUAGE}} {{FRAMEWORK}}, enabling services to gracefully handle failures, prevent cascading outages, and recover quickly. Includes circuit breaker state machines, rate limiting strategies, bulkhead partitioning, timeout coordination, intelligent retry logic, and degradation strategies.
See references/resilience-principles.md for the essential resilience summary (6 core patterns: rate limiting, circuit breaker, bulkhead, timeout, retry, fallback).
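Of the core patterns listed above, retry with exponential backoff and jitter is the easiest to illustrate in isolation. The following is a minimal sketch in Python, not the skill's reference implementation: `NonRetryableError`, `backoff_delay`, `retry`, and all default values are hypothetical names and numbers chosen for illustration. It uses full jitter (a uniform delay between zero and the capped exponential ceiling) and stops immediately on errors classified as non-retryable.

```python
import random
import time


class NonRetryableError(Exception):
    """Hypothetical marker for errors that must never be retried."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random.random):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def retry(operation, max_attempts=4, base=0.1, cap=5.0,
          sleep=time.sleep, rng=random.random):
    """Call operation() until success, a non-retryable error, or exhaustion."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except NonRetryableError:
            raise  # error classification: do not retry these
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last error
            sleep(backoff_delay(attempt, base, cap, rng))
```

The `sleep` and `rng` parameters are injected so the behavior can be tested deterministically. A production version would also need to respect the caller's deadline and a retry budget, which this sketch leaves out.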
Read these files for detailed pattern implementations:
| Reference | Content |
|-----------|---------|
| patterns/resilience/circuit-breaker.md | State machine (CLOSED/OPEN/HALF_OPEN), configuration (failure threshold, wait duration, success threshold), monitoring metrics, fallback strategies, per-dependency circuits |
| patterns/resilience/rate-limiting.md | Token bucket, fixed window, sliding window algorithms; per-client, per-endpoint, global scopes; token bucket properties (capacity, refill rate, burst); response to limit (429 with Retry-After) |
| patterns/resilience/bulkhead.md | Thread pool isolation, semaphore isolation, partitioning strategies (by downstream service, operation type, tenant, protocol); sizing guidelines; rejection handling; metrics and monitoring |
| patterns/resilience/timeout-patterns.md | Timeout types (connection, read, write, overall); per-operation configurations; deadline propagation across services; timeout hierarchy (inner < outer); cancellation on timeout |
| patterns/resilience/retry-with-backoff.md | Exponential backoff, linear backoff; mandatory jitter (full, equal, decorrelated); retryable vs non-retryable error classification; retry budgets; interaction with deadlines |
| patterns/resilience/fallback-degradation.md | Graceful degradation levels (NORMAL, WARNING, CRITICAL, EMERGENCY); fallback strategies (cached data, default, error); fail-secure principle; degradation triggers and transitions |
| patterns/resilience/backpressure.md | Flow control mechanisms, pause/resume protocols, connection-level backpressure, message queue depth limits, timeout-based resumption |
| patterns/resilience/resilience-metrics.md | Metric types per pattern, naming conventions, alert thresholds, dashboards, SLA tracking |
| references/chaos-engineering-experiments.md | Catalog of chaos experiments by type (network, latency, resource, dependency) with setup instructions |
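
As an illustration of the circuit-breaker row above, here is a minimal, single-threaded Python sketch of the CLOSED/OPEN/HALF_OPEN state machine driven by a failure threshold, a wait duration, and a success threshold. The class and parameter names are assumptions made for this example, not the contents of patterns/resilience/circuit-breaker.md, and the sketch counts consecutive failures rather than a sliding-window failure rate.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    """Illustrative breaker; not thread-safe, counts consecutive failures."""

    def __init__(self, failure_threshold=3, wait_duration=30.0,
                 success_threshold=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.wait_duration = wait_duration
        self.success_threshold = success_threshold
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, operation):
        if self.state == "OPEN":
            if self._clock() - self._opened_at < self.wait_duration:
                raise CircuitOpenError("circuit is open")
            # Wait duration elapsed: allow trial calls through.
            self.state = "HALF_OPEN"
            self._successes = 0
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state == "HALF_OPEN":
            self._trip()  # any failed trial call reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _on_success(self):
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive count

    def _trip(self):
        self.state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

One instance per downstream dependency keeps a failing service from opening the circuit for healthy ones. Callers would typically catch `CircuitOpenError` and apply a fallback.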
Proactive resilience validation through controlled fault injection for {{LANGUAGE}} {{FRAMEWORK}}.
| Tool | Scope | Use Case |
|------|-------|----------|
| Chaos Monkey (Netflix) | Instance | Random instance termination in production |
| Litmus | Kubernetes | K8s-native chaos experiments with CRDs |
| Gremlin | SaaS | Enterprise chaos platform with safety controls |
| Toxiproxy | Network | TCP-level proxy for latency/partition injection |
| Chaos Mesh | Kubernetes | K8s chaos with dashboard and scheduling |
## Experiment: [Name]
### Hypothesis
[If X failure occurs, then Y behavior is expected because Z resilience pattern is in place]
### Steady-State Metrics
- Latency p99: [baseline]
- Error rate: [baseline]
- Throughput: [baseline]
### Experiment Steps
1. [Step with specific tool/command]
2. [Observation window]
3. [Rollback trigger conditions]
### Expected vs Actual Results
| Metric | Expected | Actual | Pass/Fail |
|--------|----------|--------|-----------|
### Findings
[Document unexpected behaviors]
### Action Items
- [ ] [Fix/improvement with owner and deadline]
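
The "Expected vs Actual Results" table in the template can be filled in mechanically once the observation window closes. The sketch below is one possible way to do that in Python. `MetricCheck` and `results_table` are hypothetical helpers, and the thresholds are whatever the experiment's hypothesis states, not values prescribed by this skill.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    name: str
    expected: float                 # bound taken from the hypothesis
    actual: float                   # value observed during the experiment
    higher_is_better: bool = False  # True for throughput-style metrics

    @property
    def passed(self):
        if self.higher_is_better:
            return self.actual >= self.expected
        return self.actual <= self.expected


def results_table(checks):
    """Render checks as rows of the template's results table."""
    rows = [
        "| Metric | Expected | Actual | Pass/Fail |",
        "|--------|----------|--------|-----------|",
    ]
    for c in checks:
        op = ">=" if c.higher_is_better else "<="
        verdict = "Pass" if c.passed else "Fail"
        rows.append(f"| {c.name} | {op} {c.expected} | {c.actual} | {verdict} |")
    return "\n".join(rows)
```

Latency and error rate are treated as upper bounds by default, while throughput should be passed with `higher_is_better=True`.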
- skills/observability/: resilience metrics, alerting, and SLO/SLI framework
- skills/infrastructure/: health probes and graceful shutdown patterns
- skills/sre-practices/: error budgets, incident management, and change management