stress-test/skills/stress-test-methodology/SKILL.md
This skill should be used when the user wants to stress-test a plan, review a plan for gaps, challenge assumptions in a planning document, run adversarial review, apply red-team/blue-team analysis to a plan, or asks 'is my plan sound', 'what am I missing', 'what could go wrong'. Covers the adversarial what-if methodology, verdict system, tool scope selection, and how to interpret stress test results.
npx skillsauth add oborchers/fractional-cto stress-test-methodology
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Adversarial plan review uses two independent agents with different roles to stress-test a planning document:
Red team (adversarial) -- reads the plan and surrounding artifacts, generates what-if questions targeting gaps, unverified assumptions, edge cases, and failure modes. Operates on local artifacts only to keep questions grounded.
Blue team (neutral analyst) -- receives the what-if questions and attempts to answer each using the plan, artifacts, and a configurable set of tools. Classifies each answer with a verdict.
The technique works because the agents have separate contexts: the red team generates challenges without knowing the answers, and the blue team answers without knowing which questions are easy or hard. Neither agent is biased toward defending or attacking the plan.
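The context separation can be sketched as two callables that never share state. This is a minimal illustration under stated assumptions, not the plugin's implementation: `run_review`, `stub_red_team`, and `stub_blue_team` are hypothetical names, and the stubs stand in for real agent calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Answer:
    question: str
    verdict: str   # one of the blue team's verdict labels
    evidence: str  # quoted reference, empty when none was found


def run_review(
    plan: str,
    red_team: Callable[[str], List[str]],
    blue_team: Callable[[str, str], Answer],
) -> List[Answer]:
    """Run the red team once, then answer each question in isolation."""
    questions = red_team(plan)  # the red team never sees any answers
    # The blue team receives only the plan and one question at a time:
    # no difficulty hints and none of the red team's reasoning.
    return [blue_team(plan, q) for q in questions]


# Hypothetical stub agents, for illustration only.
def stub_red_team(plan: str) -> List[str]:
    return ["What if the migration fails halfway?"]


def stub_blue_team(plan: str, question: str) -> Answer:
    if "rollback" in plan.lower():
        return Answer(question, "ANSWERED", "Plan mentions a rollback step")
    return Answer(question, "NOT COVERED", "")


answers = run_review(
    "Step 3: run migration. Rollback: restore snapshot.",
    stub_red_team,
    stub_blue_team,
)
```

Passing the agents in as plain functions keeps the orchestration free of shared state, which is the property the technique relies on.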
In grounded mode, the red team generates what-if questions strictly tied to plan artifacts. Every question references a specific file, config value, dependency, or code path found in the surrounding context. This mode finds gaps in what the plan explicitly covers.
Best for: implementation plans with a full codebase, detailed technical designs with supporting configs.
Inspired by Barry O'Reilly's Residuality Theory, creative mode adds extreme, cross-domain stressors that go beyond artifact grounding. Instead of only asking "what does the plan miss?", it asks "what survives when extreme stress hits?"
Additional stressor categories:
Best for: strategic plans, plans with external dependencies, plans where hidden coupling matters more than line-level coverage.
Running both modes asks grounded questions first (artifact-tied), then creative questions (extreme/cross-domain). This is the most thorough option. The QA report separates grounded and creative sections so you can triage accordingly.
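The ordering and the split report can be sketched as follows. The `Mode` enum, the `"both"` label, and the `passes_for` and `build_report` helpers are hypothetical names chosen for illustration. The only behavior taken from the description above is that grounded questions run before creative ones and land in separate report sections.

```python
from enum import Enum
from typing import Callable, Dict, List


class Mode(Enum):
    GROUNDED = "grounded"
    CREATIVE = "creative"
    BOTH = "both"  # hypothetical label for the combined run


def passes_for(mode: Mode) -> List[str]:
    """Return the question passes in execution order."""
    if mode is Mode.BOTH:
        return ["grounded", "creative"]  # grounded always runs first
    return [mode.value]


def build_report(
    mode: Mode, generate: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Keep each pass in its own report section so it can be triaged separately."""
    # Dicts preserve insertion order, so sections appear in run order.
    return {name: generate(name) for name in passes_for(mode)}
```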
The blue team classifies each answer:
| Verdict | Meaning | Action |
|---------|---------|--------|
| ANSWERED | Plan or artifacts explicitly address the concern, with a quotable reference | No action needed |
| PARTIALLY ADDRESSED | Some coverage exists but gaps remain | Strengthen the relevant plan section |
| NOT COVERED | The plan has no answer -- genuine gap | Add coverage to the plan |
| UNCERTAIN | Cannot determine with available tools -- gap might or might not exist | Expand tool scope or investigate manually |
Focus your iteration on NOT COVERED and UNCERTAIN items. These are the plan's blind spots.
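As a rough sketch, the verdict table and the triage rule can be encoded like this. The verdict labels and actions come from the table. The `Verdict` enum, the `triage` function, and the sample questions are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class Verdict(Enum):
    ANSWERED = "ANSWERED"
    PARTIALLY_ADDRESSED = "PARTIALLY ADDRESSED"
    NOT_COVERED = "NOT COVERED"
    UNCERTAIN = "UNCERTAIN"


# Follow-up action for each verdict, as listed in the table.
ACTIONS = {
    Verdict.ANSWERED: "No action needed",
    Verdict.PARTIALLY_ADDRESSED: "Strengthen the relevant plan section",
    Verdict.NOT_COVERED: "Add coverage to the plan",
    Verdict.UNCERTAIN: "Expand tool scope or investigate manually",
}

# The verdicts that mark blind spots and deserve attention first.
BLIND_SPOTS = {Verdict.NOT_COVERED, Verdict.UNCERTAIN}


def triage(results: List[Tuple[str, Verdict]]) -> List[Tuple[str, str]]:
    """Return (question, action) pairs for blind-spot items only."""
    return [(q, ACTIONS[v]) for q, v in results if v in BLIND_SPOTS]


# Hypothetical sample results, for illustration only.
results = [
    ("What if the cache is cold at launch?", Verdict.ANSWERED),
    ("What if the vendor API changes its rate limit?", Verdict.UNCERTAIN),
    ("What if the migration fails halfway?", Verdict.NOT_COVERED),
]
todo = triage(results)
```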
The blue team's tool scope determines how thoroughly it can verify claims:
Local artifacts only (Read, Grep, Glob)
+ Web research (adds WebSearch, WebFetch)
+ System verification (adds Bash, MCP tools)
The red team always uses local artifacts only, regardless of scope selection.
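One way to model the scope selection is a lookup keyed by role and scope. This sketch assumes the three scopes are cumulative tiers, so system verification also includes web research. The scope keys `"local"`, `"web"`, and `"system"` and the `tools_for` helper are hypothetical. The tool names are the ones listed above.

```python
from typing import Dict, List

LOCAL = ["Read", "Grep", "Glob"]
WEB = ["WebSearch", "WebFetch"]
SYSTEM = ["Bash", "MCP tools"]

# Assumption: each tier adds its tools on top of the previous tier.
SCOPES: Dict[str, List[str]] = {
    "local": LOCAL,
    "web": LOCAL + WEB,
    "system": LOCAL + WEB + SYSTEM,
}


def tools_for(role: str, scope: str) -> List[str]:
    """Return the tools available to a role under the chosen scope."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if role == "red":
        # The red team is pinned to local artifacts regardless of scope.
        return list(LOCAL)
    return list(SCOPES[scope])
```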
A healthy stress test typically shows:
If most questions are ANSWERED, the plan is solid. If most are NOT COVERED, the plan needs significant revision before proceeding.
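A quick way to read the overall distribution is to count the verdicts. The sketch below treats "most" as more than half of all questions, which is an assumption and not a documented threshold. The `assess` function and its return labels are hypothetical.

```python
from collections import Counter
from typing import Iterable


def assess(verdicts: Iterable[str]) -> str:
    """Summarize a stress test from its verdict labels."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        return "no questions"
    # Assumption: "most" means a strict majority of all questions.
    if counts["ANSWERED"] > total / 2:
        return "solid"
    if counts["NOT COVERED"] > total / 2:
        return "needs significant revision"
    return "mixed"
```

A "mixed" result is the case where neither label dominates, so the per-question triage matters more than the headline count.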
Watch for false confidence: an ANSWERED verdict is only as good as the evidence behind it. Check that ANSWERED items include specific references (plan sections, code paths, config values), not vague reassurances.
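The evidence check can be partly automated. The sketch below flags ANSWERED items that carry no reference at all. It assumes each result stores its references as a list of strings, and the `Result` type and `unsupported_answers` function are hypothetical. It cannot judge whether a reference is specific enough, so a manual read of the remaining items is still needed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Result:
    question: str
    verdict: str
    # Plan sections, code paths, or config values cited as evidence.
    references: List[str] = field(default_factory=list)


def unsupported_answers(results: List[Result]) -> List[Result]:
    """Return ANSWERED results that cite no non-empty reference."""
    return [
        r
        for r in results
        if r.verdict == "ANSWERED"
        and not any(ref.strip() for ref in r.references)
    ]
```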
Use the /stress-test:stress-test command:
/stress-test:stress-test path/to/plan.md
The command orchestrates the full flow: reads the plan, asks about tool scope, dispatches the red team, then the blue team, and presents a summary with action items.