.claude/skills/qe-chaos-resilience/SKILL.md
Injects controlled faults (network partition, latency, process kill, disk pressure) into distributed systems and validates recovery behavior. Use when testing circuit breakers, failover paths, retry logic, or building confidence in system resilience through chaos engineering.
Guide the use of v3's chaos engineering capabilities including controlled fault injection, load/stress testing, resilience validation, and disaster recovery testing.
```bash
# Run chaos experiment
aqe chaos run --experiment network-latency --target api-service

# Load test
aqe chaos load --scenario peak-traffic --duration 30m

# Stress test to breaking point
aqe chaos stress --endpoint /api/users --max-users 10000

# Test circuit breaker
aqe chaos circuit-breaker --service payment-service
```
```javascript
// Chaos experiment
Task("Run chaos experiment", `
Execute controlled chaos on api-service:
- Inject 500ms network latency
- Monitor service health metrics
- Verify circuit breaker activation
- Measure recovery time
- Document findings
`, "qe-chaos-engineer")

// Load testing
Task("Performance load test", `
Run load test simulating Black Friday traffic:
- Ramp up to 10,000 concurrent users
- Maintain load for 30 minutes
- Monitor response times and error rates
- Identify bottlenecks
- Compare against SLAs
`, "qe-load-tester")
```
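The chaos prompt asks the agent to verify circuit breaker activation. To make that check concrete, here is a minimal circuit breaker sketch (closed, open, half-open) with an injectable clock, which is the behavior such a test expects to observe. The `CircuitBreaker` class, its thresholds, and its method names are hypothetical illustrations, not part of the qe-chaos-engineer agent or the `aqe` CLI.

```typescript
// Hypothetical sketch: a minimal circuit breaker with closed, open and half-open states.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number; // consecutive failures that open the breaker
  private readonly resetTimeoutMs: number; // time spent open before a trial call is allowed
  private readonly now: () => number; // injectable clock so tests need not sleep

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    // An open breaker turns half-open once the reset timeout has elapsed.
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      // Fail fast without touching the degraded dependency.
      throw new Error("circuit open: call rejected");
    }
    try {
      const result = await fn();
      this.state = "closed"; // a success (including a half-open trial) closes the breaker
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed half-open trial, or reaching the threshold, (re)opens the breaker.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

A resilience test drives the breaker with injected failures, asserts it opens and rejects fast, then advances the clock and asserts a successful trial call closes it again. In a real experiment the failures would come from the injected fault rather than a stubbed `Promise.reject`.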
```typescript
await chaosEngineer.injectFault({
  target: 'api-service',
  fault: {
    type: 'latency',
    parameters: {
      delay: '500ms',
      jitter: '100ms',
      percentage: 50
    }
  },
  duration: '5m',
  monitoring: {
    metrics: ['response_time', 'error_rate', 'throughput'],
    alerts: true
  },
  rollback: {
    automatic: true,
    trigger: 'error_rate > 10%'
  }
});
```
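The `delay`, `jitter`, and `percentage` parameters can be read as "delay roughly half of the calls by about 500ms, give or take 100ms". The sketch below shows one way to implement that semantics as a wrapper around an async call. It assumes jitter is a uniform spread of plus or minus `jitterMs` around the base delay and that `percentage` is the share of affected calls; `plannedDelayMs` and `withLatencyFault` are hypothetical names, not the agent's implementation.

```typescript
// Hypothetical sketch of a latency fault: delay `percentage`% of calls by delay ± jitter.
interface LatencyFault {
  delayMs: number; // base delay, e.g. 500 for '500ms'
  jitterMs: number; // assumed: uniform spread of ±jitterMs around the base delay
  percentage: number; // share of calls affected, 0-100
}

// Decide how long one call should be delayed. `random` returns values in [0, 1),
// like Math.random, and is injectable so tests can be deterministic.
function plannedDelayMs(fault: LatencyFault, random: () => number = Math.random): number {
  if (random() * 100 >= fault.percentage) return 0; // this call is not affected
  const offset = (random() * 2 - 1) * fault.jitterMs; // in [-jitterMs, +jitterMs)
  return Math.max(0, fault.delayMs + offset); // never a negative delay
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wrap an async function so every call first experiences the planned latency.
function withLatencyFault<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  fault: LatencyFault,
  random: () => number = Math.random,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    await sleep(plannedDelayMs(fault, random));
    return fn(...args);
  };
}
```

Separating the pure `plannedDelayMs` decision from the timer makes the fault's distribution testable without waiting on real delays.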
```typescript
await loadTester.execute({
  scenario: 'peak-traffic',
  profile: {
    rampUp: '5m',
    steadyState: '30m',
    rampDown: '5m'
  },
  users: {
    initial: 100,
    target: 5000,
    pattern: 'linear'
  },
  assertions: {
    p95_latency: '<500ms',
    error_rate: '<1%',
    throughput: '>1000rps'
  }
});
```
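The profile above describes a 40-minute load curve: a linear ramp from 100 to 5000 users over 5 minutes, 30 minutes at 5000, then a 5-minute ramp-down. The sketch below computes the target user count at a given minute. It assumes the ramp-down ends at zero users (the profile does not say), and `usersAt` and `LoadProfile` are hypothetical names.

```typescript
// Hypothetical sketch: virtual users implied by a linear ramp-up / steady / ramp-down profile.
interface LoadProfile {
  rampUpMin: number;
  steadyStateMin: number;
  rampDownMin: number;
  initialUsers: number;
  targetUsers: number;
}

function usersAt(p: LoadProfile, tMin: number): number {
  const steadyEnd = p.rampUpMin + p.steadyStateMin;
  const end = steadyEnd + p.rampDownMin;
  if (tMin < 0 || tMin >= end) return 0; // outside the test window
  if (tMin < p.rampUpMin) {
    // Linear interpolation from initialUsers to targetUsers during ramp-up.
    return Math.round(p.initialUsers + (p.targetUsers - p.initialUsers) * (tMin / p.rampUpMin));
  }
  if (tMin < steadyEnd) return p.targetUsers; // steady state
  // Assumed: linear ramp-down from targetUsers to zero.
  return Math.round(p.targetUsers * (1 - (tMin - steadyEnd) / p.rampDownMin));
}
```

For the profile above, `usersAt(profile, 2.5)` is halfway through the ramp-up, and any minute from 5 to 35 returns the full 5000.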
```typescript
await loadTester.stressTest({
  endpoint: '/api/checkout',
  strategy: 'step-increase',
  steps: [100, 500, 1000, 2000, 5000],
  stepDuration: '5m',
  findBreakingPoint: true,
  monitoring: {
    resourceUtilization: true,
    databaseConnections: true,
    memoryUsage: true
  }
});
```
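With a step-increase strategy, "finding the breaking point" amounts to scanning the per-step measurements for the first load level that violates the endpoint's limits. The sketch below shows that scan. The `StepResult` and `Limits` shapes and the choice of p95 latency and error rate as the criteria are assumptions for illustration, not the qe-load-tester's actual output format.

```typescript
// Hypothetical sketch: locate the breaking point from step-increase measurements.
interface StepResult {
  users: number;
  p95LatencyMs: number;
  errorRatePct: number;
}

interface Limits {
  maxP95LatencyMs: number;
  maxErrorRatePct: number;
}

interface BreakingPoint {
  lastHealthyUsers: number | null; // highest step that stayed within limits
  breakingUsers: number | null; // first step that violated them (null = never broke)
  reason: string | null;
}

function findBreakingPoint(steps: StepResult[], limits: Limits): BreakingPoint {
  let lastHealthyUsers: number | null = null;
  // Steps are assumed to be in increasing order of load, like [100, 500, 1000, ...].
  for (const step of steps) {
    const reasons: string[] = [];
    if (step.p95LatencyMs > limits.maxP95LatencyMs) reasons.push(`p95 ${step.p95LatencyMs}ms`);
    if (step.errorRatePct > limits.maxErrorRatePct) reasons.push(`errors ${step.errorRatePct}%`);
    if (reasons.length > 0) {
      return { lastHealthyUsers, breakingUsers: step.users, reason: reasons.join(", ") };
    }
    lastHealthyUsers = step.users;
  }
  return { lastHealthyUsers, breakingUsers: null, reason: null };
}
```

Reporting both the last healthy step and the first failing one brackets the capacity limit; a finer step schedule between those two values narrows it further.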
```typescript
await resilienceTester.validate({
  scenarios: [
    'database-failover',
    'cache-failure',
    'external-service-timeout',
    'pod-termination'
  ],
  expectations: {
    gracefulDegradation: true,
    automaticRecovery: true,
    dataIntegrity: true,
    recoveryTime: '<30s'
  }
});
```
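Checking an expectation such as `recoveryTime: '<30s'` needs a precise definition of recovery time. The sketch below assumes it is measured from the first unhealthy health-check sample to the first healthy sample after which the service stays healthy, so brief flapping does not count as recovered. It also parses the duration strings used throughout this skill ('500ms', '30s', '5m', '30d'). `HealthSample`, `recoveryTimeMs`, `parseDurationMs`, and `meetsRecoveryTarget` are hypothetical helpers.

```typescript
// Hypothetical sketch: measure recovery time from health samples and check a '<30s' target.
interface HealthSample {
  atMs: number; // sample timestamp in milliseconds
  healthy: boolean;
}

const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Parse '500ms', '30s', '5m', '30d' style durations into milliseconds.
function parseDurationMs(text: string): number {
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h|d)$/.exec(text.trim());
  if (!match) throw new Error(`unrecognized duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Time from the first unhealthy sample to the first sample of the final healthy run.
// Returns 0 if the service never degraded and null if it never recovered.
function recoveryTimeMs(samples: HealthSample[]): number | null {
  const sorted = [...samples].sort((a, b) => a.atMs - b.atMs);
  const firstBad = sorted.findIndex((s) => !s.healthy);
  if (firstBad === -1) return 0;
  let lastBad = firstBad;
  for (let i = sorted.length - 1; i > firstBad; i--) {
    if (!sorted[i].healthy) { lastBad = i; break; }
  }
  if (lastBad === sorted.length - 1) return null; // still unhealthy when observation ended
  return sorted[lastBad + 1].atMs - sorted[firstBad].atMs;
}

// Check an expectation written as '<30s'.
function meetsRecoveryTarget(samples: HealthSample[], target: string): boolean {
  const limitMs = parseDurationMs(target.replace(/^</, ""));
  const rt = recoveryTimeMs(samples);
  return rt !== null && rt < limitMs;
}
```

Measuring to the start of the final healthy run is deliberately strict: a service that briefly recovers and then fails again is reported with the longer recovery time.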
| Fault | Description | Use Case |
|-------|-------------|----------|
| Latency | Add network delay | Test timeouts |
| Packet Loss | Drop network packets | Test retry logic |
| CPU Stress | Consume CPU | Test resource limits |
| Memory Pressure | Consume memory | Test OOM handling |
| Disk Full | Fill disk space | Test disk errors |
| Process Kill | Terminate process | Test recovery |
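The packet loss row pairs the fault with retry logic. The sketch below simulates dropped requests deterministically and checks that a retry loop with exponential backoff still succeeds, and that it gives up after its attempt budget. The `lossyCall` helper, the `RetryPolicy` shape, and the doubling backoff policy are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: a simulated lossy call and a retry loop with exponential backoff.

// Returns a call that fails (as if its packets were dropped) on the listed attempt numbers.
function lossyCall<T>(value: T, droppedAttempts: Set<number>): () => Promise<T> {
  let attempt = 0;
  return async () => {
    attempt += 1;
    if (droppedAttempts.has(attempt)) throw new Error(`attempt ${attempt}: packet dropped`);
    return value;
  };
}

interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void>, // injectable so tests can record waits instead of sleeping
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < policy.maxAttempts) {
        // Exponential backoff: base, 2 * base, 4 * base, ...
        await sleep(policy.baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError; // attempt budget exhausted
}
```

Recording the waits rather than sleeping lets the test assert the exact backoff schedule as well as the final outcome.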
```typescript
interface ChaosReport {
  experiment: {
    name: string;
    target: string;
    fault: FaultConfig;
    duration: number;
  };
  results: {
    hypothesis: string;
    validated: boolean;
    metrics: {
      before: MetricSnapshot;
      during: MetricSnapshot;
      after: MetricSnapshot;
    };
    events: ChaosEvent[];
    recovery: {
      detected: boolean;
      time: number;
      automatic: boolean;
    };
  };
  findings: {
    severity: 'critical' | 'high' | 'medium' | 'low';
    description: string;
    recommendation: string;
  }[];
  artifacts: {
    logs: string;
    metrics: string;
    traces: string;
  };
}
```
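A consumer of `ChaosReport` typically needs a quick summary of the `findings` array, for example to decide whether a run needs attention. The sketch below counts findings per severity and picks the most severe one, using the severity union from the interface. `Finding` mirrors one element of `findings`; `summarizeFindings` is a hypothetical helper.

```typescript
// Hypothetical helper over ChaosReport.findings: counts per severity and the worst severity.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  severity: Severity;
  description: string;
  recommendation: string;
}

// Most severe first, matching the order of the union in ChaosReport.
const SEVERITY_ORDER: Severity[] = ["critical", "high", "medium", "low"];

function summarizeFindings(findings: Finding[]): { counts: Record<Severity, number>; worst: Severity | null } {
  const counts: Record<Severity, number> = { critical: 0, high: 0, medium: 0, low: 0 };
  for (const f of findings) counts[f.severity] += 1;
  // First severity with at least one finding, or null for a clean report.
  const worst = SEVERITY_ORDER.find((s) => counts[s] > 0) ?? null;
  return { counts, worst };
}
```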
```yaml
safety:
  blast_radius:
    max_affected_pods: 1
    max_affected_percentage: 10
  abort_conditions:
    - error_rate > 50%
    - p99_latency > 10s
    - service_unavailable
  excluded_environments:
    - production-critical
  required_approvals:
    production: 2
    staging: 0
```
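The safety block combines checks made before the experiment starts (blast radius, excluded environments, approvals) with conditions monitored while the fault is active. The sketch below shows how a runner might enforce both, using the thresholds from the YAML. The `SafetyConfig` and `LiveMetrics` shapes are assumptions, and treating an environment with no approval policy as a blocker is a conservative choice made here, not something the config specifies.

```typescript
// Hypothetical sketch of enforcing the safety config: preflight checks and abort conditions.
interface SafetyConfig {
  maxAffectedPods: number;
  maxAffectedPercentage: number;
  abortErrorRatePct: number; // error_rate > 50%
  abortP99LatencyMs: number; // p99_latency > 10s
  excludedEnvironments: string[];
  requiredApprovals: Record<string, number>;
}

interface LiveMetrics {
  errorRatePct: number;
  p99LatencyMs: number;
  serviceAvailable: boolean;
}

// Run before injecting anything; any returned problem blocks the experiment.
function preflight(cfg: SafetyConfig, env: string, targetPods: number, totalPods: number, approvals: number): string[] {
  const problems: string[] = [];
  if (cfg.excludedEnvironments.includes(env)) problems.push(`environment ${env} is excluded`);
  if (targetPods > cfg.maxAffectedPods) problems.push(`${targetPods} pods exceeds max ${cfg.maxAffectedPods}`);
  const pct = (targetPods / totalPods) * 100;
  if (pct > cfg.maxAffectedPercentage) problems.push(`${pct.toFixed(1)}% of pods exceeds ${cfg.maxAffectedPercentage}%`);
  const needed: number | undefined = cfg.requiredApprovals[env];
  if (needed === undefined) problems.push(`no approval policy for ${env}`); // conservative assumption
  else if (approvals < needed) problems.push(`needs ${needed} approvals, has ${approvals}`);
  return problems;
}

// Evaluated continuously while the fault is active; any reason means abort and roll back.
function abortReasons(cfg: SafetyConfig, m: LiveMetrics): string[] {
  const reasons: string[] = [];
  if (m.errorRatePct > cfg.abortErrorRatePct) reasons.push("error_rate");
  if (m.p99LatencyMs > cfg.abortP99LatencyMs) reasons.push("p99_latency");
  if (!m.serviceAvailable) reasons.push("service_unavailable");
  return reasons;
}
```

Note that the two blast-radius limits interact: with `max_affected_percentage: 10`, even a single pod is too many for a deployment of fewer than 10 pods.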
```typescript
await resilienceTester.validateSLA({
  slas: {
    availability: 99.9,
    p95_latency: 500,
    error_rate: 0.1
  },
  period: '30d',
  report: {
    breaches: true,
    trends: true,
    projections: true
  }
});
```
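An availability SLA translates directly into an error budget: the downtime allowed over the period. For 99.9% over 30 days that is 30 × 24 × 60 × 0.001 = 43.2 minutes. The sketch below computes the budget and how much of it a given amount of downtime consumes, which is the arithmetic behind breach and projection reporting. The function names are hypothetical.

```typescript
// Worked sketch: downtime allowed by an availability target, and budget consumption.
function allowedDowntimeMinutes(availabilityPct: number, periodDays: number): number {
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - availabilityPct / 100);
}

// Percentage of the error budget used by the observed downtime (over 100 means a breach).
function budgetConsumedPct(downtimeMinutes: number, availabilityPct: number, periodDays: number): number {
  return (downtimeMinutes / allowedDowntimeMinutes(availabilityPct, periodDays)) * 100;
}
```

A chaos experiment that causes 21.6 minutes of user-visible downtime would therefore spend half of a 30-day 99.9% budget on its own.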
- Primary Agents: qe-chaos-engineer, qe-load-tester, qe-resilience-tester
- Coordinator: qe-chaos-coordinator
- Related Skills: qe-performance, security-testing