skills/council/skeptic/failure-mode-analysis/SKILL.md
Use when systematically identifying failure scenarios for proposed features and infrastructure changes. Covers component enumeration, failure mode discovery, cascade analysis, mitigation design, monitoring signals, and rollback planning. Do not use for security threat modeling (use threat-model) or input boundary testing (use edge-case-enumeration).
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add dtsong/my-claude-setup failure-mode-analysis
Systematically identify failure scenarios for proposed features and design mitigations that maintain system resilience.
Analyzes system architecture, dependency graphs, and infrastructure configurations for failure scenarios. Does not modify infrastructure, execute chaos tests, or access production systems. Limited to design-time failure identification and mitigation planning.
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
Enumerate all components involved: services, databases, APIs, third-party dependencies, caches, queues, CDNs, DNS, and any shared infrastructure.
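One way to keep the enumeration honest is to record each component with its dependencies and flag any dependency that is referenced but never listed. The sketch below is a minimal illustration under that assumption. The `Component` and `Kind` types, the `missing_components` helper, and the component names are hypothetical. Only the categories come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    # Component categories named in the enumeration step.
    SERVICE = "service"
    DATABASE = "database"
    API = "api"
    THIRD_PARTY = "third_party"
    CACHE = "cache"
    QUEUE = "queue"
    CDN = "cdn"
    DNS = "dns"
    SHARED_INFRA = "shared_infrastructure"


@dataclass
class Component:
    name: str
    kind: Kind
    # Names of the components this one calls or reads from.
    depends_on: list[str] = field(default_factory=list)


def missing_components(components: list[Component]) -> set[str]:
    """Return dependency names that are referenced but were never enumerated."""
    known = {c.name for c in components}
    referenced = {d for c in components for d in c.depends_on}
    return referenced - known


# Hypothetical inventory: the payment provider is used but not yet listed.
inventory = [
    Component("checkout-api", Kind.SERVICE, ["orders-db", "payments-provider", "session-cache"]),
    Component("orders-db", Kind.DATABASE),
    Component("session-cache", Kind.CACHE),
]
print(missing_components(inventory))  # {'payments-provider'}
```

Each name this check returns marks a gap in the inventory that should be filled before moving on to failure mode discovery.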
For each component, systematically consider:
Map the failure tree: if component X fails, what else breaks? Identify single points of failure and shared dependencies. Determine blast radius for each failure mode.
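The failure tree can be computed from a dependency map. The sketch below assumes a simple model that is not prescribed by this skill. A dependent goes **down** when a *hard* dependency is down. It becomes **degraded** when only a *soft* dependency (for example, one it can serve cached data around) is down. Degradation is assumed not to propagate. The graph `DEPS` and the helper names are hypothetical and mirror the cascade template in this skill.

```python
from collections import deque

# Hypothetical dependency map: dependent -> {dependency: "hard" | "soft"}.
DEPS: dict[str, dict[str, str]] = {
    "B": {"A": "soft"},  # falls back to cached data
    "C": {"A": "hard"},
    "D": {"C": "hard"},
    "E": {},
    "A": {},
}


def cascade(failed: str, deps: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return the status of every other component when `failed` goes down."""
    down = {failed}
    queue = deque([failed])
    # Breadth-first walk along hard edges only: these are the outages.
    while queue:
        current = queue.popleft()
        for comp, edges in deps.items():
            if comp not in down and edges.get(current) == "hard":
                down.add(comp)
                queue.append(comp)
    status = {}
    for comp, edges in deps.items():
        if comp == failed:
            continue
        if comp in down:
            status[comp] = "down"
        elif any(edges.get(d) == "soft" for d in down):
            status[comp] = "degraded"
        else:
            status[comp] = "unaffected"
    return status


def single_points_of_failure(entry: str, deps: dict[str, dict[str, str]]) -> set[str]:
    """Components whose individual failure takes `entry` down."""
    return {c for c in deps if c != entry and cascade(c, deps).get(entry) == "down"}


print(cascade("A", DEPS))
# {'B': 'degraded', 'C': 'down', 'D': 'down', 'E': 'unaffected'}
print(sorted(single_points_of_failure("D", DEPS)))  # ['A', 'C']
```

The blast radius of a failure mode is every component that `cascade` reports as down or degraded. Running `single_points_of_failure` against each user-facing entry point gives the shortlist of shared dependencies that most need mitigation.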
For each failure mode, define:
For each failure mode, specify the metric, log pattern, or alert that detects it. Include detection latency — how quickly will you know?
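Detection latency can be bounded roughly from the alerting pipeline's timing parameters. The sketch below assumes a pull-based metrics setup: samples are scraped periodically, alert rules are evaluated periodically, the condition must hold for a fixed period before firing, and notification adds a delay. The field names are illustrative and do not come from any specific tool. The bound is deliberately coarse. It ignores extra lag from rate or averaging windows in the query itself.

```python
from dataclasses import dataclass


@dataclass
class AlertTiming:
    # All values in seconds; names are illustrative, not a real config schema.
    scrape_interval: float
    evaluation_interval: float
    hold_for: float  # how long the condition must persist before firing
    notification_delay: float = 0.0


def worst_case_detection_latency(t: AlertTiming) -> float:
    """Rough upper bound: the failure starts just after a scrape, the rule is
    evaluated just before the next sample lands, then the condition must hold
    for `hold_for` before the notification is sent."""
    return t.scrape_interval + t.evaluation_interval + t.hold_for + t.notification_delay


timing = AlertTiming(scrape_interval=15, evaluation_interval=30, hold_for=120, notification_delay=30)
print(worst_case_detection_latency(timing))  # 195
```

If the resulting figure (here a bit over three minutes) is longer than the failure mode can be tolerated, tighten the intervals or pick a faster signal before relying on this monitor.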
Define the rollback approach: feature flags for instant disable, database rollback scripts, a deployment rollback procedure, and data cleanup if needed.
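A feature flag works as an instant-disable lever because the old code path stays deployed and flipping the flag reroutes traffic without a new release. The sketch below uses a hypothetical in-memory `FlagStore` and a hypothetical `new-search-backend` flag to show the pattern. A real system would read flags from a config service or database so they can be changed at runtime across all instances.

```python
import threading


class FlagStore:
    """Hypothetical in-memory flag store, standing in for a real flag service."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}
        self._lock = threading.Lock()

    def set(self, name: str, enabled: bool) -> None:
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to `default`, so a missing flag keeps
        # callers on the known-good path.
        with self._lock:
            return self._flags.get(name, default)


flags = FlagStore()


def search(query: str) -> str:
    # The new path ships behind a flag; the old path remains as the
    # rollback target until the new one has proven itself.
    if flags.is_enabled("new-search-backend"):
        return f"v2:{query}"  # placeholder for the new backend
    return f"v1:{query}"  # existing, known-good backend


flags.set("new-search-backend", True)
print(search("boots"))  # v2:boots
flags.set("new-search-backend", False)  # rollback: flip the flag, no deploy
print(search("boots"))  # v1:boots
```

Defaulting unknown flags to off is a design choice. If the flag service itself is unreachable, the system falls back to the existing behaviour instead of the unproven one.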
Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.
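The resume rule reduces to "first step, in order, that is not checked off." The sketch below assumes the Progress Checklist tracks the six steps named in this skill's description. The `resume_point` helper is hypothetical. Scanning in order means a later step marked done cannot hide an earlier gap.

```python
from typing import Optional

# Steps as named in the skill description, in execution order.
STEPS = [
    "component enumeration",
    "failure mode discovery",
    "cascade analysis",
    "mitigation design",
    "monitoring signals",
    "rollback planning",
]


def resume_point(completed: set[str]) -> Optional[str]:
    """Return the earliest step not yet completed, or None if all are done."""
    for step in STEPS:
        if step not in completed:
            return step
    return None


print(resume_point({"component enumeration", "cascade analysis"}))
# failure mode discovery
```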
| Component | Failure Mode | Severity | Cascade Risk | Mitigation | Monitoring Signal |
|---|---|---|---|---|---|
| [Component] | [What fails] | Critical/High/Medium/Low | [What else breaks] | [Specific mitigation] | [Metric/alert] |
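When the analysis produces many rows, generating the table from structured records helps keep the columns consistent. Sorting by severity puts the most serious failure modes first. The sketch below assumes the four severity levels and six columns of the template above. The `FailureMode` record, the `render_table` helper, and the example rows are hypothetical.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}
HEADER = ["Component", "Failure Mode", "Severity", "Cascade Risk", "Mitigation", "Monitoring Signal"]


@dataclass
class FailureMode:
    component: str
    failure: str
    severity: str  # must be a key of SEVERITY_ORDER, else sorting raises KeyError
    cascade_risk: str
    mitigation: str
    signal: str


def render_table(rows: list[FailureMode]) -> str:
    """Render failure modes as a Markdown table, most severe first."""
    ordered = sorted(rows, key=lambda r: SEVERITY_ORDER[r.severity])
    lines = ["| " + " | ".join(HEADER) + " |", "|" + "---|" * len(HEADER)]
    for r in ordered:
        cells = [r.component, r.failure, r.severity, r.cascade_risk, r.mitigation, r.signal]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


rows = [
    FailureMode("session-cache", "All keys lost on restart", "Medium",
                "Login latency spike", "Warm cache on startup", "Cache hit ratio drop"),
    FailureMode("orders-db", "Primary unreachable", "Critical",
                "Checkout down", "Automated failover to replica", "DB connection error rate"),
]
print(render_table(rows))
```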
```
[Component A fails]
├── [Component B] — degraded (uses cached data)
├── [Component C] — down (hard dependency)
│   └── [Component D] — down (depends on C)
└── [Component E] — unaffected (independent)
```
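Cascade trees in this shape can also be drawn from data, so they stay in sync with the dependency analysis. The sketch below assumes a hypothetical `TREE` mapping in which each failed or down component maps to the components it directly affects, each paired with a status note. It uses `: ` as the separator between name and status. The recursive `render_tree` helper is illustrative and reproduces the branch layout of the template above.

```python
# Hypothetical cascade: node -> [(affected component, status note), ...]
TREE: dict[str, list[tuple[str, str]]] = {
    "A": [
        ("B", "degraded (uses cached data)"),
        ("C", "down (hard dependency)"),
        ("E", "unaffected (independent)"),
    ],
    "C": [("D", "down (depends on C)")],
}


def render_tree(root: str, tree: dict[str, list[tuple[str, str]]], prefix: str = "") -> list[str]:
    """Draw the children of `root` with box-drawing branches, recursing depth-first."""
    lines = []
    kids = tree.get(root, [])
    for i, (name, status) in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(f"{prefix}{'└── ' if last else '├── '}{name}: {status}")
        # Continue the vertical rule only if more siblings follow.
        lines.extend(render_tree(name, tree, prefix + ("    " if last else "│   ")))
    return lines


print("A fails")
print("\n".join(render_tree("A", TREE)))
```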