Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

bang9/whip-simulate

Name: whip-simulate
Author: bang9

whip/skills/whip-simulate/SKILL.md

npx skillsauth add bang9/ai-tools whip-simulate

Run multi-agent simulations from a user-provided scenario. Concretize the scenario into test cases, spawn agents, and analyze output patterns for consistency.

Input

Extract from $ARGUMENTS:

Scenario: what to simulate, compare, or verify
--runs N: number of simulation runs (default: 5)
--agent: use inline mode (see Execution Mode below)
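
As a rough illustration of the extraction step above, the following Python sketch splits a raw argument string into the scenario text and the two flags. The `SimulateArgs` type and `parse_arguments` helper are hypothetical names introduced here, not part of the skill; only the flag names and the default of 5 runs come from the text.

```python
from dataclasses import dataclass


@dataclass
class SimulateArgs:
    scenario: str        # free text: what to simulate, compare, or verify
    runs: int = 5        # default stated in the skill text
    agent: bool = False  # True only when --agent is literally present


def parse_arguments(raw: str) -> SimulateArgs:
    """Split a raw $ARGUMENTS string into scenario text and flags (illustrative only)."""
    tokens = raw.split()
    runs, agent, scenario_words = 5, False, []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--runs":
            if i + 1 >= len(tokens):
                raise ValueError("--runs requires a value")
            runs = int(tokens[i + 1])
            i += 2
        elif token == "--agent":
            agent = True
            i += 1
        else:
            # Anything that is not a known flag is treated as scenario text.
            scenario_words.append(token)
            i += 1
    if runs < 1:
        raise ValueError("--runs must be a positive integer")
    return SimulateArgs(" ".join(scenario_words), runs, agent)
```

For example, `parse_arguments("compare prompt v1 and v2 --runs 10")` yields a scenario of `compare prompt v1 and v2` with 10 runs and `agent` left as `False`.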

Execution Mode (mutually exclusive)

$ARGUMENTS determines which dispatch mode this skill uses. The two modes are mutually exclusive:

| Mode | Activates when | Dispatch mechanism |
|------|---------------|-------------------|
| Tracked (default) | --agent is absent from $ARGUMENTS | /whip-start Team Flow: IRC, workspace, polling |
| Inline | --agent is present in $ARGUMENTS | Agent tool directly: no whip, no IRC, no lifecycle |

Strict rules:

No --agent in arguments → tracked mode. No exceptions, no inference.
--agent in arguments → inline mode. /whip-start, IRC, and lifecycle steps are all skipped.
--backend specification (e.g., user says "use codex") → implies tracked mode. Backend selection is a whip concept and is incompatible with --agent.
Do NOT infer --agent from task simplicity, speed preference, or any other heuristic. The flag must be explicitly present in the user's input.
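
The strict rules above amount to a small decision function. The sketch below is one possible reading in Python; `select_mode` is a hypothetical helper, and raising an error when both `--agent` and a backend are given is an assumption, since the text only says the two are incompatible.

```python
from typing import Optional


def select_mode(agent_flag: bool, backend: Optional[str] = None) -> str:
    """Return "tracked" or "inline" based only on explicit user input."""
    if agent_flag and backend is not None:
        # Assumption: the skill calls these incompatible; failing loudly is one option.
        raise ValueError("--agent cannot be combined with a backend selection")
    # No heuristics: only the literal presence of --agent selects inline mode.
    return "inline" if agent_flag else "tracked"
```

Note that the function takes no notion of task size or urgency as input, which mirrors the rule that the mode is never inferred.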

If running inside an active whip workspace, use whip workspace view <workspace-name> to get the worktree path for reading code artifacts referenced in the scenario. In tracked mode, simulation tasks go in the global workspace (ephemeral — do not pollute the active workspace).

Workflow

1. Concretize

Read any files, git refs, or codebase artifacts referenced in the scenario, then transform it into concrete test cases:

| Field | Description |
|-------|-------------|
| Name | Short identifier (e.g., deprecated-move-1) |
| Setup | Context the agent receives (file contents, code, instructions) |
| Action | What the agent executes |
| Output contract | Structured format the agent must produce |

The output contract is critical — all agents must produce the same structure so results are mechanically comparable:

### Result
- pattern: [short label for the approach taken]
- output:

[code block, JSON, or other structured output]

- decisions: [key judgment calls made]
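
To show why a fixed contract makes results mechanically comparable, here is a minimal Python sketch that parses the three fields of the template above. The `parse_result` helper and its regular expression are assumptions made for illustration; the skill itself does not prescribe a parser.

```python
import re
from typing import Dict, Optional

# Matches the three-field template: pattern, output, decisions.
_RESULT = re.compile(
    r"^### Result\s*\n"
    r"- pattern:\s*(?P<pattern>.+?)\s*\n"
    r"- output:\s*\n(?P<output>.*?)\n"
    r"- decisions:\s*(?P<decisions>.*)\Z",
    re.DOTALL | re.MULTILINE,
)


def parse_result(text: str) -> Optional[Dict[str, str]]:
    """Return the contract fields, or None when the output does not follow the template."""
    match = _RESULT.search(text.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}
```

An output that returns `None` here is a natural candidate for the "unclassifiable" bucket used during analysis.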

For A/B comparisons, choose a strategy:

| Strategy | When to use | Agent count |
|----------|-------------|-------------|
| Sequential | Outputs are structured (code, configs); one agent runs A then B | N |
| Isolated | Outputs involve judgment or prose; separate agents per version | 2N |
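
The agent counts in the strategy table can be computed directly. This Python sketch assumes N is the number of runs for a single test case; `total_agents` is a hypothetical helper name.

```python
from typing import Optional


def total_agents(runs: int, strategy: Optional[str] = None) -> int:
    """Agent count for one test case; strategy is None when there is no A/B comparison."""
    if strategy is None or strategy == "sequential":
        return runs          # one agent per run (it handles both A and B if sequential)
    if strategy == "isolated":
        return 2 * runs      # separate agents for version A and version B
    raise ValueError(f"unknown strategy: {strategy}")
```

With the default of 5 runs, an isolated comparison would therefore need 10 agents.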

Present the test plan including:

Test cases with output contracts
Execution mode (tracked or inline)
A/B strategy if applicable
Total agent count

Wait for user approval before executing.

2. Execute

Tracked mode (default)

Hand off dispatch to /whip-start. Prepare one task spec per simulation run and let /whip-start handle IRC, creation, assignment, and monitoring.

Each simulation run becomes one task:

Title: sim-{test-case}-{run}
Workspace: global
Difficulty: easy
Description: self-contained prompt (Role + Context + Task + Output Contract)
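
The task fields above could be assembled as follows. This is a sketch only: `TaskSpec` and `build_task_specs` are hypothetical names, and numbering runs from 1 is an assumption based on the run labels in the report template. How the specs are actually handed to /whip-start is not shown, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskSpec:
    title: str
    workspace: str
    difficulty: str
    description: str


def build_task_specs(test_case: str, runs: int, prompt: str) -> List[TaskSpec]:
    """Create one task spec per simulation run, all sharing the same prompt."""
    return [
        TaskSpec(
            title=f"sim-{test_case}-{run}",
            workspace="global",   # simulation tasks stay out of the active workspace
            difficulty="easy",
            description=prompt,   # Role + Context + Task + Output Contract
        )
        for run in range(1, runs + 1)
    ]
```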

After all tasks complete, collect outputs and proceed to analysis.

Inline mode (`--agent`)

Spawn one Agent tool call per run, named sim-{test-case}-{run}.

Each prompt must be self-contained — embed all context inline, not file paths:

Role: "You are a simulation agent. Execute the task and produce structured output."
Context: All file contents and reference material inline
Task: The test case action
Output contract: The exact format to produce
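
One way to keep every prompt self-contained is to build it from its four parts in a single function. The section headings and the `build_prompt` name below are illustrative assumptions; only the role sentence is taken from the text.

```python
ROLE = "You are a simulation agent. Execute the task and produce structured output."


def build_prompt(context: str, task: str, output_contract: str) -> str:
    """Join the four prompt parts; context must already hold file contents, not paths."""
    sections = [
        ("Role", ROLE),
        ("Context", context),
        ("Task", task),
        ("Output contract", output_contract),
    ]
    return "\n\n".join(f"## {name}\n{body.strip()}" for name, body in sections)
```

Because the function receives file contents as strings, a caller cannot accidentally pass a path that the spawned agent would be unable to read.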

Batching:

≤ 10 runs: spawn all at once with run_in_background: true
> 10 runs: groups of 10, next batch after previous completes
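
The batching rule can be sketched in Python as below. `batch_runs` and `run_in_batches` are hypothetical helpers, and the thread pool merely stands in for whatever mechanism actually spawns agents; the `spawn` callable is supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def batch_runs(runs: int, batch_size: int = 10) -> List[List[int]]:
    """Split run numbers 1..runs into groups of at most batch_size."""
    ids = list(range(1, runs + 1))
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def run_in_batches(runs: int, spawn: Callable[[int], str], batch_size: int = 10) -> Dict[int, str]:
    """Run each batch concurrently; the next batch starts only after the previous one finishes."""
    results: Dict[int, str] = {}
    for batch in batch_runs(runs, batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            for run_id, output in zip(batch, pool.map(spawn, batch)):
                results[run_id] = output
    return results
```

With 23 runs this produces batches of 10, 10, and 3, while 10 or fewer runs form a single batch.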

3. Analyze

Classify outputs into patterns:

Collect all agent outputs
Group by structural similarity — ignore cosmetic differences (whitespace, comment style, translation wording)
Label each group (A, B, C...)
Identify root cause of each divergent pattern
Flag agents with malformed output as "unclassifiable"
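
The classification steps above can be approximated as follows. This sketch only normalizes whitespace; ignoring comment style or translation wording would need domain-specific handling that is not attempted here. The `normalize` and `classify` names, and treating empty output as malformed, are assumptions.

```python
import re
from collections import defaultdict
from string import ascii_uppercase
from typing import Dict, List, Optional, Tuple


def normalize(output: str) -> str:
    """Collapse whitespace and drop blank lines so cosmetic spacing does not split groups."""
    lines = (re.sub(r"\s+", " ", line).strip() for line in output.splitlines())
    return "\n".join(line for line in lines if line)


def classify(outputs: Dict[int, Optional[str]]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Group run ids by normalized output; return (labeled groups, unclassifiable run ids)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    unclassifiable: List[int] = []
    for run_id in sorted(outputs):
        text = outputs[run_id]
        if not text or not text.strip():
            unclassifiable.append(run_id)  # assumption: empty output counts as malformed
            continue
        groups[normalize(text)].append(run_id)
    # Largest group first; ties go to the group containing the earliest run.
    ordered = sorted(groups.values(), key=lambda runs: (-len(runs), runs[0]))
    # Labels A, B, C...; this sketch assumes at most 26 distinct patterns.
    labeled = {ascii_uppercase[i]: runs for i, runs in enumerate(ordered)}
    return labeled, unclassifiable
```

Root-cause identification is left to the analyst; the code only produces the groups that the analysis then explains.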

4. Report

## Simulation Report

### Consistency: X/N (Y%)

### Output Patterns
| Pattern | Count | Runs | Description |
|---------|-------|------|-------------|
| A       | 8     | #1-6,#8,#10 | [dominant behavior] |
| B       | 2     | #7,#9 | [variant behavior] |

### Divergence Analysis
For each non-dominant pattern:
- Runs: [list]
- Root cause: [why]
- Severity: cosmetic | functional | breaking
- Diff from dominant: [key differences]

### Summary
- Total: N runs across M test cases
- Dominant pattern: A (X%)
- Key findings: ...
- Recommendation: [if applicable]

Save the full report with raw agent outputs to /tmp/simulate-{slug}-{timestamp}.md and tell the user the path.
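
The report's headline figures and file path could be derived as in the sketch below. It assumes that X in "Consistency: X/N" is the size of the dominant pattern and that the timestamp uses a `YYYYMMDD-HHMMSS` layout; neither is stated in the text, and all helper names are hypothetical.

```python
import re
import time
from typing import Dict, List, Optional


def compress_runs(runs: List[int]) -> str:
    """Render run ids compactly, with consecutive ids collapsed into ranges."""
    runs = sorted(runs)
    parts, i = [], 0
    while i < len(runs):
        start = end = runs[i]
        while i + 1 < len(runs) and runs[i + 1] == end + 1:
            i += 1
            end = runs[i]
        parts.append(f"#{start}" if start == end else f"#{start}-{end}")
        i += 1
    return ",".join(parts)


def consistency(pattern_runs: Dict[str, List[int]], total_runs: int) -> str:
    """Format the share of runs that fall into the dominant pattern."""
    dominant = max((len(r) for r in pattern_runs.values()), default=0)
    percent = round(100 * dominant / total_runs) if total_runs else 0
    return f"{dominant}/{total_runs} ({percent}%)"


def report_path(scenario: str, now: Optional[float] = None) -> str:
    """Build /tmp/simulate-{slug}-{timestamp}.md from the scenario text."""
    slug = re.sub(r"[^a-z0-9]+", "-", scenario.lower()).strip("-")
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return f"/tmp/simulate-{slug}-{stamp}.md"
```

Applied to the example table in the template, pattern A's runs render as `#1-6,#8,#10` and the consistency line reads `8/10 (80%)`.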

Rules

Never execute before user approves the test plan
Embed all context inline in prompts — no shared state assumptions
For A/B comparisons, both versions receive identical inputs
Use real file contents from the codebase — never fabricate code
In tracked mode, use global workspace and delegate dispatch to /whip-start
In tracked mode, clean up simulation tasks after collecting results: whip task clean
In inline mode, each run is single-shot — no follow-up messages or shared state

bang9/whip-simulate

whip/skills/whip-simulate/SKILL.md

Run multi-agent simulations to measure consistency of non-deterministic behavior. Use when the user wants to A/B test, validate behavioral equivalence, or stress-test outputs at scale.

10 stars

testing

Updated Apr 24, 2026

$ install --global

skillsauth

npx skillsauth add bang9/ai-tools whip-simulate

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy (Container and dependency vulnerability scanner): Clean, 95%
Semgrep (Static code analysis for vulnerabilities): Clean, 95%
mcp-scan (Snyk) (Model Context Protocol security validation): Clean, 95%
Snyk (dep) (Open source security scanning): Skipped, 50%
Socket.dev (Supply chain security analysis): Skipped, 50%
VirusTotal (Multi-engine malware detection): Skipped, 50%
CrowdStrike (Advanced threat intelligence): Skipped, 50%
OSV-Scanner (Open Source Vulnerability database check): Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: Apr 24, 2026, 8:33 PM · 0.5s · 1 file scanned

SKILL.md

name: whip-simulate
description: Run multi-agent simulations to measure consistency of non-deterministic behavior. Use when the user wants to A/B test, validate behavioral equivalence, or stress-test outputs at scale.
argument-hint: <scenario> [--runs N] [--agent]
user_invocable: true

Related Skills

bang9/whip-start

development

Verified · Trusted · Community

Spawn whip agent sessions to handle tasks. Dispatch a single agent or assemble a small team with explicit backend, scope, and ownership.

10 · SKILL.md · Updated Apr 24, 2026

bang9/whip-pr-followup

development

Verified · Trusted · Community

Triage unresolved PR review threads via webform and dispatch fixes through whip-start. Use after receiving review feedback on your own PR.

10 · SKILL.md · Updated Apr 24, 2026

bang9/whip-plan

content-media

Verified · Trusted · Community

Analyze work, design a stacked task plan, and get user approval before execution. Use when starting a multi-task project that needs planning.

10 · SKILL.md · Updated Apr 24, 2026

bang9/whip-lgtm

development

Verified · Trusted · Community

Iterative review-fix loop: dispatch a fresh codex reviewer each round, apply fixes, and repeat until LGTM. Use when you want rigorous code quality validation before merge.

10 · SKILL.md · Updated Apr 24, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/bang9/ai-tools.git

# Copy into Claude Code skills folder (global)
cp -r ai-tools/whip/skills/whip-simulate ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

bang9/ai-tools

10 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT
