oborchers/stress-test-methodology

Name: stress-test-methodology
Author: oborchers

stress-test/skills/stress-test-methodology/SKILL.md

npx skillsauth add oborchers/fractional-cto stress-test-methodology

Adversarial Plan Review

Adversarial plan review uses two independent agents with different roles to stress-test a planning document:

Red team (adversarial) -- reads the plan and surrounding artifacts, generates what-if questions targeting gaps, unverified assumptions, edge cases, and failure modes. Operates on local artifacts only to keep questions grounded.
Blue team (neutral analyst) -- receives the what-if questions and attempts to answer each using the plan, artifacts, and a configurable set of tools. Classifies each answer with a verdict.

The technique works because the agents have separate contexts: the red team generates challenges without knowing the answers, and the blue team answers without knowing which questions are easy or hard. Neither agent is biased toward defending or attacking the plan.
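The separation of contexts can be sketched as two input types with a single hand-off between them. This is a minimal illustration, not the plugin's implementation: `RedTeamContext`, `BlueTeamContext`, `run_review`, and the stub agents are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RedTeamContext:
    """What the red team sees: the plan and local artifacts only."""
    plan_text: str
    artifacts: dict[str, str]  # hypothetical shape: path -> file contents


@dataclass(frozen=True)
class BlueTeamContext:
    """What the blue team sees: the questions, plan, artifacts, and its tools."""
    questions: list[str]
    plan_text: str
    artifacts: dict[str, str]
    tools: tuple[str, ...]


def run_review(
    plan_text: str,
    artifacts: dict[str, str],
    tools: tuple[str, ...],
    red_team: Callable[[RedTeamContext], list[str]],
    blue_team: Callable[[BlueTeamContext], dict[str, str]],
) -> dict[str, str]:
    """Run the red team, then the blue team; only the questions cross over."""
    questions = red_team(RedTeamContext(plan_text, artifacts))
    # The blue team gets the bare questions, not the red team's reasoning.
    return blue_team(BlueTeamContext(questions, plan_text, artifacts, tools))


# Stand-in agents for illustration; real agents would be separate model sessions.
def stub_red(ctx: RedTeamContext) -> list[str]:
    return [f"What if {path} is missing at deploy time?" for path in sorted(ctx.artifacts)]


def stub_blue(ctx: BlueTeamContext) -> dict[str, str]:
    return {question: "UNCERTAIN" for question in ctx.questions}


report = run_review(
    "Deploy v2",
    {"config.yaml": "replicas: 3"},
    ("Read", "Grep", "Glob"),
    stub_red,
    stub_blue,
)
```

Because each agent is a function of its own context object, nothing the red team inferred while writing a question can leak into the blue team's answer.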

When to Use

Before committing to an implementation plan
Before presenting a business plan or proposal to stakeholders
After drafting a design document and before sharing it for review
When you feel a plan is "done" but want to pressure-test it
When surrounding context is rich enough to ground the analysis (codebase, supporting docs, config files)

Stressor Modes

Grounded Mode

The red team generates what-if questions strictly tied to plan artifacts. Every question references a specific file, config value, dependency, or code path found in the surrounding context. This mode finds gaps in what the plan explicitly covers.

Best for: implementation plans with a full codebase, detailed technical designs with supporting configs.

Creative Mode (Residuality Theory)

Inspired by Barry O'Reilly's Residuality Theory, creative mode adds extreme, cross-domain stressors that go beyond artifact grounding. Instead of only asking "what does the plan miss?", it asks "what survives when extreme stress hits?"

Additional stressor categories:

Extreme scale (10x/100x assumptions)
Dependency disappearance (vendor, library, team member gone)
Regulatory/compliance shifts
Team/organizational disruption
Market/competitive shocks
Graceful degradation analysis ("what survives?" not just "what breaks?")

Best for: strategic plans, plans with external dependencies, plans where hidden coupling matters more than line-level coverage.

Both

Runs grounded questions first (artifact-tied), then creative questions (extreme/cross-domain). This is the most thorough mode. The QA report separates grounded and creative sections so you can triage accordingly.
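As a rough sketch of how the three modes relate, the snippet below models them as an enum and returns the question sections in report order. `StressorMode`, `CREATIVE_CATEGORIES`, and `report_sections` are hypothetical names; only the category labels and the grounded-then-creative ordering come from the text above.

```python
from enum import Enum


class StressorMode(Enum):
    GROUNDED = "grounded"
    CREATIVE = "creative"
    BOTH = "both"


# Creative stressor categories, as listed under Creative Mode.
CREATIVE_CATEGORIES = (
    "Extreme scale",
    "Dependency disappearance",
    "Regulatory/compliance shifts",
    "Team/organizational disruption",
    "Market/competitive shocks",
    "Graceful degradation analysis",
)


def report_sections(mode: StressorMode) -> list[str]:
    """Return the question sections to generate, in report order."""
    if mode is StressorMode.BOTH:
        # Grounded questions run first, then creative ones.
        return ["grounded", "creative"]
    return [mode.value]
```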

The Verdict System

The blue team classifies each answer:

| Verdict | Meaning | Action |
|---------|---------|--------|
| ANSWERED | Plan or artifacts explicitly address the concern, with a quotable reference | No action needed |
| PARTIALLY ADDRESSED | Some coverage exists but gaps remain | Strengthen the relevant plan section |
| NOT COVERED | The plan has no answer -- genuine gap | Add coverage to the plan |
| UNCERTAIN | Cannot determine with available tools -- gap might or might not exist | Expand tool scope or investigate manually |

Focus your iteration on NOT COVERED and UNCERTAIN items. These are the plan's blind spots.
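One way to work with the verdicts programmatically is to pair each with its action and filter for the blind spots. This is an illustrative sketch; `Verdict`, `ACTIONS`, and `blind_spots` are hypothetical names, and the (question, verdict) tuple shape is an assumption about how results might be stored.

```python
from enum import Enum


class Verdict(Enum):
    ANSWERED = "ANSWERED"
    PARTIALLY_ADDRESSED = "PARTIALLY ADDRESSED"
    NOT_COVERED = "NOT COVERED"
    UNCERTAIN = "UNCERTAIN"


# Action for each verdict, following the verdict table.
ACTIONS = {
    Verdict.ANSWERED: "No action needed",
    Verdict.PARTIALLY_ADDRESSED: "Strengthen the relevant plan section",
    Verdict.NOT_COVERED: "Add coverage to the plan",
    Verdict.UNCERTAIN: "Expand tool scope or investigate manually",
}

# The verdicts the text singles out as the plan's blind spots.
BLIND_SPOT_VERDICTS = {Verdict.NOT_COVERED, Verdict.UNCERTAIN}


def blind_spots(results: list[tuple[str, Verdict]]) -> list[tuple[str, str]]:
    """Return (question, action) pairs for NOT COVERED and UNCERTAIN items."""
    return [
        (question, ACTIONS[verdict])
        for question, verdict in results
        if verdict in BLIND_SPOT_VERDICTS
    ]


sample = [
    ("What if the cache is cold?", Verdict.ANSWERED),
    ("What if the vendor API is down?", Verdict.NOT_COVERED),
    ("What if the rate limit changed?", Verdict.UNCERTAIN),
]
todo = blind_spots(sample)
```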

Tool Scope

The blue team's tool scope determines how thoroughly it can verify claims:

Local artifacts only (Read, Grep, Glob)

Best when: rich surrounding context (full codebase, detailed docs)
Limitation: cannot verify claims about external systems, standards, or APIs
UNCERTAIN verdicts may indicate the plan references things outside the artifacts

+ Web research (adds WebSearch, WebFetch)

Best when: plan references external APIs, standards, benchmarks, or third-party services
Allows the blue team to check documentation, specs, and industry practices
Reduces UNCERTAIN verdicts for external dependency questions

+ System verification (adds Bash, MCP tools)

Best when: plan makes assumptions about live systems (API responses, config values, resource limits)
The blue team can query actual APIs, check live configurations, run diagnostics
Most powerful -- turns assumptions into verified facts or confirmed gaps
Requires the referenced systems to be accessible from the current environment

The red team always uses local artifacts only, regardless of scope selection.
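The scope rules above could be expressed as a small lookup. The sketch assumes that each "+" scope is cumulative (system verification includes web research), which the text implies but does not state outright; the scope keys `local`, `web`, and `system` and the function `tools_for` are hypothetical.

```python
LOCAL_TOOLS = ("Read", "Grep", "Glob")
WEB_TOOLS = ("WebSearch", "WebFetch")
SYSTEM_TOOLS = ("Bash", "MCP tools")

# Assumption: each "+" scope builds on the one before it.
BLUE_TEAM_SCOPES = {
    "local": LOCAL_TOOLS,
    "web": LOCAL_TOOLS + WEB_TOOLS,
    "system": LOCAL_TOOLS + WEB_TOOLS + SYSTEM_TOOLS,
}


def tools_for(role: str, scope: str) -> tuple[str, ...]:
    """Return the tools an agent may use for the selected scope."""
    if scope not in BLUE_TEAM_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if role == "red":
        # The red team stays on local artifacts regardless of scope.
        return LOCAL_TOOLS
    if role == "blue":
        return BLUE_TEAM_SCOPES[scope]
    raise ValueError(f"unknown role: {role!r}")
```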

Interpreting Results

A healthy stress test typically shows:

40-60% ANSWERED -- the plan covers many concerns
10-20% PARTIALLY ADDRESSED -- some areas need strengthening
10-20% NOT COVERED -- genuine gaps to fill
5-15% UNCERTAIN -- areas needing more investigation

If most questions are ANSWERED, the plan is solid. If most are NOT COVERED, the plan needs significant revision before proceeding.

Watch for false confidence: an ANSWERED verdict is only as good as the evidence behind it. Check that ANSWERED items include specific references (plan sections, code paths, config values), not vague reassurances.
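To compare a finished run against the typical ranges above, the verdict shares can be computed directly. This is a minimal sketch: `HEALTHY_RANGES` copies the percentages from the list, while `verdict_shares` and `out_of_range` are hypothetical helpers. A share outside its range is a prompt to look closer, not a failure in itself.

```python
from collections import Counter

# Typical percentage ranges for a healthy stress test, from the list above.
HEALTHY_RANGES = {
    "ANSWERED": (40, 60),
    "PARTIALLY ADDRESSED": (10, 20),
    "NOT COVERED": (10, 20),
    "UNCERTAIN": (5, 15),
}


def verdict_shares(verdicts: list[str]) -> dict[str, float]:
    """Return each verdict's share of all questions, as a percentage."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    counts = Counter(verdicts)
    return {name: 100 * counts[name] / len(verdicts) for name in HEALTHY_RANGES}


def out_of_range(verdicts: list[str]) -> dict[str, float]:
    """Return the verdicts whose share falls outside the typical range."""
    flagged = {}
    for name, share in verdict_shares(verdicts).items():
        low, high = HEALTHY_RANGES[name]
        if not low <= share <= high:
            flagged[name] = share
    return flagged


run = (
    ["ANSWERED"] * 10
    + ["PARTIALLY ADDRESSED"] * 3
    + ["NOT COVERED"] * 4
    + ["UNCERTAIN"] * 3
)
shares = verdict_shares(run)
```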

Limitations

The red team can only challenge what it can see. If critical context is in a separate repo, a Confluence page, or someone's head, the red team cannot generate questions about it.
The blue team's ANSWERED verdicts depend on its tool scope. A local-only blue team marking something ANSWERED based on a code comment is weaker than a system-verification blue team confirming it against a live API.
Neither agent understands organizational context (team capacity, political constraints, budget). These factors affect plan feasibility but are invisible to the agents.
Creative mode questions are not grounded in artifacts. The blue team may mark many as UNCERTAIN or NOT COVERED simply because the plan was never designed to address extreme scenarios. This is expected -- the value is in identifying which extreme scenarios the plan should address.

Running a Stress Test

Use the /stress-test:stress-test command:

/stress-test:stress-test path/to/plan.md

The command orchestrates the full flow: reads the plan, asks about tool scope, dispatches the red team, then the blue team, and presents a summary with action items.

This skill should be used when the user wants to stress-test a plan, review a plan for gaps, challenge assumptions in a planning document, run adversarial review, apply red-team/blue-team analysis to a plan, or asks 'is my plan sound', 'what am I missing', 'what could go wrong'. Covers the adversarial what-if methodology, verdict system, tool scope selection, and how to interpret stress test results.

10 stars

tools

Updated May 13, 2026

npx skillsauth add oborchers/fractional-cto stress-test-methodology

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Clean -- Trivy: Container and dependency vulnerability scanner (95%)
Clean -- Semgrep: Static code analysis for vulnerabilities (95%)
Clean -- mcp-scan (Snyk): Model Context Protocol security validation (95%)
Skipped -- Snyk (dep): Open source security scanning (50%)
Skipped -- Socket.dev: Supply chain security analysis (50%)
Skipped -- VirusTotal: Multi-engine malware detection (50%)
Skipped -- CrowdStrike: Advanced threat intelligence (50%)
Skipped -- OSV-Scanner: Open Source Vulnerability database check (50%)
Skipped -- OWASP Dep-Check (50%)

Last scanned: May 13, 2026, 6:53 AM (169.5s, 1 file scanned)

SKILL.md

name: stress-test-methodology
description: This skill should be used when the user wants to stress-test a plan, review a plan for gaps, challenge assumptions in a planning document, run adversarial review, apply red-team/blue-team analysis to a plan, or asks 'is my plan sound', 'what am I missing', 'what could go wrong'. Covers the adversarial what-if methodology, verdict system, tool scope selection, and how to interpret stress test results.
version: 0.2.0


Related Skills

oborchers/using-planning-tools

tools

This skill should be used when the user invokes any /plan-* command from the planning-tools plugin (/plan-context, /plan-master, /plan-open-questions, /plan-verify, /plan-tick, /plan-progress, /plan-delete), asks how Claude Code's plan files work, asks where plans are stored, asks to author or audit a multi-phase master planning document, asks how to walk through a plan's Open Questions interactively, asks how to write progress entries, or mentions ~/.claude/plans/ or .claude/planning-tools.local.md. Provides the index of planning-tools commands, the master-plan workflow lifecycle, the v0.3.0+ list-shape mandate (phases and questions as headings + bulleted scope items, never tables), the v0.3.2+ plain-bullet shape (no `- [ ]` checkboxes — heading emoji is the sole tick signal), the progress-entry methodology, and the mechanics of Claude Code's plan-mode file storage.

15 · SKILL.md · Updated May 13, 2026

oborchers/plan-verification-checklist

testing

This skill should be used by the plan-verifier agent and the /plan-verify command to audit a drafted master plan against a fixed checklist. Covers universal-core completeness, the v0.3.0+ no-tables-for-phases-or-questions rule, trigger-based section-coverage gaps, phase actionability (heading + per-phase TL;DR + bulleted scope + exit criteria), the v0.3.1+ per-phase TL;DR requirement, the v0.3.2+ plain-bullet scope shape (legacy `- [ ]`/`- [x]` accepted silently), the v0.3.3+ context-block shape (plan-level `**TL;DR:**` + bulleted metadata, legacy `>` blockquote accepted silently), integer phase numbering enforcement, dependency traceability, citation resolution, callout/evidence convention compliance, Open Questions placement, and the one-PR-per-master-plan rule. Single-owner of the audit checklist.

15 · SKILL.md · Updated May 13, 2026

oborchers/master-plan-methodology

tools

This skill should be used when authoring, reviewing, or modifying a multi-phase master planning document via the planning-tools plugin (especially the /plan-master and /plan-verify commands). Codifies the universal core sections, trigger-based optional sections, integer-only phase numbering, Open Questions placement, one-PR-per-plan rule, status conventions, evidence attribution, callouts, cross-reference formats, the v0.3.0 list-shape mandate (phases and questions are heading + bulleted list, never markdown tables), the v0.3.1 per-phase TL;DR requirement (1–3 sentence what/why summary under each phase heading for glance-ability), the v0.3.2 plain-bullet scope shape (`- <action>` items, no `- [ ]` checkboxes — the phase status emoji is the sole tick signal), and the v0.3.3 context-block shape (a plan-level `**TL;DR:**` + a bulleted metadata list instead of a `>` blockquote; legacy blockquote blocks accepted silently). Project-agnostic — no ticket-prefix or plan-type taxonomy.

15 · SKILL.md · Updated May 13, 2026

oborchers/whitespace-density

testing

This skill should be used when the user is adjusting spacing, padding, margins, content density, section gaps, vertical rhythm, or separation between elements. Also applies when reviewing whether a design feels cramped or too sparse, choosing between borders and whitespace for separation, or defining a spacing system. Covers the 4px/8px spacing system, macro vs micro whitespace, content density spectrum, separation techniques (whitespace > background shifts > borders), and vertical rhythm.

12 · SKILL.md · Updated May 22, 2026

Download

For Claude Desktop. Download once, then upload the file in the app. No terminal is needed.

Install manually

# Clone the repo
git clone https://github.com/oborchers/fractional-cto.git

# Copy into Claude Code skills folder (global); create the folder first if it is missing
mkdir -p ~/.claude/skills
cp -r fractional-cto/stress-test/skills/stress-test-methodology ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

oborchers/fractional-cto

10 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT