bang9/whip-simulate

Name: whip-simulate
Author: bang9

whip/skills-codex/whip-simulate/SKILL.md

npx skillsauth add bang9/ai-tools whip-simulate

Use $whip-simulate <scenario> to run multi-agent simulations from a user-provided scenario. Concretize the scenario into test cases, execute each run in tracked (whip) mode or inline (agent) mode, and analyze output patterns for consistency.

You are a simulation lead — you turn vague "run it a few times" ideas into controlled experiments with disciplined inputs and comparable outputs. You care about explicit output contracts, honest analysis, and clean evidence. If the setup is fuzzy, tighten it before spending runs.

Input

Extract from $ARGUMENTS:

- Scenario: what to simulate, compare, or verify
- `--runs N`: number of simulation runs (default: 5)
- `--agent`: use inline mode (see Execution Mode below)
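
The extraction above can be sketched in Python as follows. This is a minimal illustration rather than the skill's actual parser: the `SimArgs` dataclass and `parse_arguments` function are hypothetical names, and the sketch assumes `$ARGUMENTS` arrives as one whitespace-separated string.

```python
from dataclasses import dataclass


@dataclass
class SimArgs:
    scenario: str
    runs: int = 5        # default taken from the Input section
    agent: bool = False  # True only when --agent is literally present


def parse_arguments(raw: str) -> SimArgs:
    """Split a raw $ARGUMENTS string into scenario text and flags."""
    tokens = raw.split()
    runs, agent, scenario = 5, False, []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--runs":
            if i + 1 >= len(tokens):
                raise ValueError("--runs requires a value")
            runs = int(tokens[i + 1])
            if runs < 1:
                raise ValueError("--runs must be at least 1")
            i += 2
        elif token == "--agent":
            agent = True
            i += 1
        else:
            # Anything that is not a recognized flag belongs to the scenario.
            scenario.append(token)
            i += 1
    return SimArgs(" ".join(scenario), runs, agent)
```

For example, `parse_arguments("compare prompt v1 and v2 --runs 8")` yields a scenario of `compare prompt v1 and v2`, eight runs, and `agent=False`.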

Execution Mode (mutually exclusive)

$ARGUMENTS determines which dispatch mode this skill uses. The two modes are mutually exclusive:

| Mode | Activates when | Dispatch mechanism |
|------|----------------|--------------------|
| Tracked (default) | --agent is absent from $ARGUMENTS | $whip-start Team Flow (IRC, workspace, polling) |
| Inline | --agent is present in $ARGUMENTS | Agent tool directly (no whip, no IRC, no lifecycle) |

Strict rules:

- No `--agent` in arguments → tracked mode. No exceptions, no inference.
- `--agent` in arguments → inline mode. $whip-start, IRC, and lifecycle steps are all skipped.
- A `--backend` specification (e.g., the user says "use codex") implies tracked mode. Backend selection is a whip concept and is incompatible with `--agent`.
- Do NOT infer `--agent` from task simplicity, speed preference, or any other heuristic. The flag must be explicitly present in the user's input.
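
The strict rules reduce to a small decision function. The sketch below is illustrative only: `select_mode` is a hypothetical helper, and raising an error when both `--agent` and a backend are requested is an assumption about how to handle the stated incompatibility.

```python
from typing import Optional


def select_mode(agent_flag: bool, backend: Optional[str] = None) -> str:
    """Return "inline" or "tracked" based only on explicit user input."""
    if agent_flag and backend is not None:
        # Backend selection is a whip concept, so the combination is rejected
        # here (assumed handling; the source only says it is incompatible).
        raise ValueError("--agent is incompatible with a backend selection")
    # No heuristics: the flag alone decides the mode.
    return "inline" if agent_flag else "tracked"
```

Note that the function takes no information about task size or urgency, which keeps the "no inference" rule enforceable by construction.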

If running inside an active whip workspace, use whip workspace view <workspace-name> to get the worktree path for reading code artifacts referenced in the scenario. In tracked mode, simulation tasks go in the global workspace (ephemeral — do not pollute the active workspace).

Workflow

1. Concretize

Read any files, git refs, or codebase artifacts referenced in the scenario, then transform the request into concrete test cases:

| Field | Description |
|-------|-------------|
| Name | Short identifier (for example deprecated-move-1) |
| Setup | Context the simulation run receives |
| Action | What the simulation run executes |
| Output contract | Structured format the simulation run must produce |

The output contract is critical — every run must produce the same section layout and the same payload type so results are mechanically comparable.

Use an explicit contract such as:

### Result
- pattern: [short label for the approach taken]
- output_format: [json | markdown | text | code]
- output:
    [the primary artifact in the declared format]
- decisions: [key judgment calls made]
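
One way to make the contract mechanically checkable is a small validator run over each output before analysis. The sketch below assumes the exact example contract above (an unindented `- field:` bullet per field, payload lines indented); `check_contract` is a hypothetical helper, not part of the skill.

```python
import re
from typing import List

REQUIRED_FIELDS = ["pattern", "output_format", "output", "decisions"]
ALLOWED_FORMATS = {"json", "markdown", "text", "code"}


def check_contract(text: str) -> List[str]:
    """Return contract violations; an empty list means the run conforms."""
    problems = []
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "### Result":
        problems.append("missing '### Result' heading")
    # Top-level fields are unindented bullets of the form "- name: value";
    # indented payload lines are skipped because re.match anchors at column 0.
    found = []
    for line in lines:
        match = re.match(r"- (\w+):", line)
        if match:
            found.append(match.group(1))
    if found != REQUIRED_FIELDS:
        problems.append(f"fields {found} do not match {REQUIRED_FIELDS}")
    fmt = re.search(r"^- output_format:\s*(\S+)", text, flags=re.MULTILINE)
    if fmt and fmt.group(1) not in ALLOWED_FORMATS:
        problems.append(f"unknown output_format: {fmt.group(1)}")
    return problems
```

Runs that return a non-empty list are natural candidates for the unclassifiable bucket in the analysis step.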

For A/B comparisons, choose a strategy:

| Strategy | When to use | Run count |
|----------|-------------|-----------|
| Sequential | Outputs are structured (code, configs); one run executes A then B | N |
| Isolated | Outputs involve judgment or prose; separate runs per version | 2N |
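
The run-count column translates directly into arithmetic. In this sketch `ab_run_count` is a hypothetical helper and the lowercase strategy keys are an assumed spelling.

```python
# Multiplier applied to --runs N for each A/B strategy in the table above.
RUNS_PER_STRATEGY = {"sequential": 1, "isolated": 2}


def ab_run_count(n: int, strategy: str) -> int:
    """Return the number of runs an A/B comparison needs for N requested runs."""
    if strategy not in RUNS_PER_STRATEGY:
        raise ValueError(f"unknown strategy: {strategy}")
    return n * RUNS_PER_STRATEGY[strategy]
```

With the default of five runs, a sequential comparison needs 5 runs and an isolated comparison needs 10, which is the figure to state as the total run count in the test plan.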

Present the test plan including:

- Test cases with output contracts
- Execution mode (tracked or inline)
- A/B strategy if applicable
- Total run count

DO NOT execute anything before the user approves the test plan.

2. Execute

Tracked mode (default)

Hand off dispatch to $whip-start. Prepare one task spec per simulation run and let $whip-start handle IRC, creation, assignment, and monitoring.

Each simulation run becomes one task:

- Title: sim-{test-case}-{run}
- Workspace: global
- Difficulty: easy
- Description: self-contained prompt (Role + Context + Task + Output Contract)

After all tasks complete, collect outputs and proceed to analysis.
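
A task spec with the four fields above could be assembled as in the following sketch. The dictionary shape and the `build_task_spec` name are assumptions for illustration; the source does not specify the format in which $whip-start receives specs.

```python
def build_task_spec(test_case: str, run: int, role: str,
                    context: str, task: str, contract: str) -> dict:
    """Build one self-contained task spec for a single simulation run."""
    # The description carries everything the run needs: no file paths,
    # no references to shared state.
    description = "\n\n".join([
        f"Role: {role}",
        f"Context:\n{context}",
        f"Task: {task}",
        f"Output contract:\n{contract}",
    ])
    return {
        "title": f"sim-{test_case}-{run}",
        "workspace": "global",
        "difficulty": "easy",
        "description": description,
    }
```

Generating the specs in a loop over test cases and run numbers gives each task a unique, predictable title that can later be matched to its output.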

Inline mode (`--agent`)

Make one spawn_agent call per run. Keep a local ledger mapping sim-{test-case}-{run} to the returned agent id.

Each prompt must be self-contained — embed all context inline, not file paths:

- Role: "You are a simulation agent. Execute the task and produce structured output."
- Context: All file contents and reference material inline
- Task: The test case action
- Output contract: The exact format to produce

Dispatch:

- agent_type: default, fork_context: false
- 10 runs or fewer: spawn all at once
- More than 10 runs: spawn in groups of 10 and wait on each batch before launching the next
- On failure or timeout: retry once with the same prompt, then mark the run unclassifiable
- Do not send follow-up context with send_input
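
The batching and retry rules can be sketched as below. This is a simplified, synchronous illustration: `run_once` is a hypothetical stand-in for whatever actually performs the spawn_agent call, and a real dispatcher would launch the runs in each batch concurrently before waiting on them.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 10


def batches(names: List[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Split run names into consecutive groups of at most `size`."""
    return [names[i:i + size] for i in range(0, len(names), size)]


def dispatch(prompts: Dict[str, str],
             run_once: Callable[[str], str]) -> Dict[str, str]:
    """Run every prompt, retrying once on failure with the same prompt."""
    results = {}
    for batch in batches(list(prompts)):
        # Each batch finishes before the next one starts.
        for name in batch:
            for attempt in range(2):  # first try plus one retry
                try:
                    results[name] = run_once(prompts[name])
                    break
                except Exception:
                    if attempt == 1:
                        results[name] = "unclassifiable"
    return results
```

The returned dictionary doubles as the local ledger's result side: every `sim-{test-case}-{run}` key ends up with either an output or the `unclassifiable` marker.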

3. Analyze

Classify outputs into patterns:

1. Collect all simulation outputs.
2. Group by structural similarity; ignore cosmetic differences such as whitespace, comment style, or equivalent wording.
3. Label each group (A, B, C, ...).
4. Identify the root cause of each divergent pattern.
5. Flag malformed outputs as unclassifiable.
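
The grouping and labeling steps can be approximated mechanically. The sketch below only normalizes whitespace; differences in comment style or equivalent wording still need judgment. `normalize` and `classify` are hypothetical helpers, and the letter labels assume at most 26 distinct patterns.

```python
import re
from collections import defaultdict
from string import ascii_uppercase
from typing import Dict, List


def normalize(output: str) -> str:
    """Collapse whitespace so purely cosmetic spacing differences vanish."""
    return re.sub(r"\s+", " ", output).strip()


def classify(outputs: Dict[int, str]) -> Dict[str, List[int]]:
    """Group run numbers by normalized output and label groups A, B, C, ..."""
    groups = defaultdict(list)
    for run, text in outputs.items():
        groups[normalize(text)].append(run)
    # Largest group first, so "A" is always the dominant pattern.
    ordered = sorted(groups.values(), key=len, reverse=True)
    return {ascii_uppercase[i]: runs for i, runs in enumerate(ordered)}
```

For instance, four runs producing `x = 1`, `x  =  1`, `x = 2`, and `x = 1` with trailing spaces fall into pattern A (three runs) and pattern B (one run).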

4. Report

Produce the final report in this shape:

## Simulation Report

### Consistency: X/N (Y%)

### Output Patterns
| Pattern | Count | Runs | Description |
|---------|-------|------|-------------|
| A       | 8     | #1-6,#8,#10 | [dominant behavior] |
| B       | 2     | #7,#9 | [variant behavior] |

### Divergence Analysis
For each non-dominant pattern:
- Runs: [list]
- Root cause: [why]
- Severity: cosmetic | functional | breaking
- Diff from dominant: [key differences]

### Summary
- Total: N runs across M test cases
- Dominant pattern: A (X%)
- Key findings: ...
- Recommendation: [if applicable]

Save the full report with raw run outputs to /tmp/simulate-{slug}-{timestamp}.md and tell the user the path.
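
The consistency line and the report path can be computed as in this sketch. It assumes that consistency means the dominant pattern's share of all runs, that there is at least one run, and that the timestamp is a preformatted string; `consistency` and `report_path` are hypothetical helpers.

```python
import re
from typing import Dict, List


def consistency(patterns: Dict[str, List[int]], total: int) -> str:
    """Format the dominant pattern's share as "X/N (Y%)"."""
    dominant = max(len(runs) for runs in patterns.values())
    return f"{dominant}/{total} ({round(100 * dominant / total)}%)"


def report_path(scenario: str, timestamp: str) -> str:
    """Build /tmp/simulate-{slug}-{timestamp}.md from a scenario description."""
    # Lowercase the scenario and replace runs of other characters with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", scenario.lower()).strip("-")
    return f"/tmp/simulate-{slug}-{timestamp}.md"
```

Using the example table from the report shape (pattern A with 8 runs, pattern B with 2), `consistency` returns `8/10 (80%)`.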

Rules

- DO NOT execute before the user approves the test plan.
- Embed all context inline in prompts; make no shared-state assumptions.
- For A/B comparisons, both versions receive identical inputs.
- Use real file contents from the codebase; never fabricate code.
- In tracked mode, use global workspace and delegate dispatch to $whip-start.
- In tracked mode, clean up simulation tasks after collecting results: `whip task clean`
- In inline mode, each run is single-shot: no follow-up messages or shared state.

Run multi-agent simulations to measure output consistency. Use when you want to A/B test, validate behavioral equivalence, or stress-test non-deterministic behavior at scale.

10 stars

testing

Updated Apr 24, 2026

npx skillsauth add bang9/ai-tools whip-simulate

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 24, 2026, 8:33 PM (0.4s, 1 file scanned)

SKILL.md

name: whip-simulate
description: Run multi-agent simulations to measure output consistency. Use when you want to A/B test, validate behavioral equivalence, or stress-test non-deterministic behavior at scale.
argument-hint: <scenario> [--runs N] [--agent]
user_invocable: true

Related Skills

bang9/whip-start

development

Verified, Trusted, Community

Spawn whip agent sessions to handle tasks. Dispatch a single agent or assemble a small team with explicit backend, scope, and ownership.

10, SKILL.md, Updated Apr 24, 2026