skills/siege/SKILL.md
Security audit of design docs, implementation plans, and code. Dispatches 6 parallel Opus agents across attacker perspectives, iterates until zero Critical + zero High findings, and maintains a persistent threat model. Triggers on 'siege', 'security audit', 'security review', 'threat model', or when audit detects security-relevant surfaces.
npx skillsauth add raddue/crucible siegeInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Full-lifecycle security audit. Dispatches 6 parallel Opus agents across distinct attacker perspectives, synthesizes findings, iterates until zero Critical + zero High, and maintains a persistent threat model that accumulates across sessions.
Announce at start: "Running Siege on [target name]. Commit anchor: [short SHA]."
Skill type: Rigid -- follow exactly, no shortcuts.
Model: All SECURITY ANALYSIS agents are Opus, no exceptions. Orchestrator, all 6 attacker-perspective agents, synthesis, and fix dispatch are Opus. Support functions (manifest scoping, stagnation judging, fix verification) may use Sonnet where the task is mechanical rather than analytical. If the session is not running Opus, refuse: "Siege requires Opus for all security analysis agents. Cannot proceed on a lesser model."
<!-- CANONICAL: shared/dispatch-convention.md -->All subagent dispatches use disk-mediated dispatch. See shared/dispatch-convention.md for the full protocol.
All subagent returns (the 6 attacker-perspective agents, synthesis, fix agents, stagnation judge) use the Ledger Return Protocol. Every subagent returns exactly one Evidence Receipt per shared/return-convention.md; the orchestrator applies the two-tier receipt linter (Tier 1 structural + Tier 2 witness verification — full grammar in the shared convention) to every Task return before acting on the declared VERDICT. The linter is a deterministic runtime tool: orchestrators MUST run python3 scripts/rcpt_verify.py --tier2 --strict --root <dispatch-root> --ledger <dispatch-root>/receipt-ledger.jsonl <receipt> on every received receipt before acting on its VERDICT, and apply the shared convention's in-context pseudocode ONLY as the fallback when the tool is unavailable (--strict hard-FAILs only resolvable paths; an unresolvable bare basename is UNVERIFIABLE, never a false FAIL). A lint failure is treated as structurally BLOCKED.
Siege-specific obligation: WITNESS for attacker agents is the attack one-liner (or pattern) that would succeed if the defender's claimed fix is incomplete. SKIPPED/UNRUNNABLE receipts (e.g. zap-cli unavailable) defer to Cairn for re-dispatch with tooling.
Tier 2 — Witness verification: PASS + TRACE#N → Read cited range (≤ 4 KiB); fail if witness would match expect-fail. FAIL + TRACE#N → reject only if no evidence of attack success is visible. SKIPPED/UNRUNNABLE → no read; record deferred obligation (e.g. zap-cli unavailable → orchestrator re-dispatches with tooling).
Per shared/cairn-convention.md. Siege-specific bindings:
round/1, round/2, …. A round spans the 6-parallel-attacker dispatch through synthesis + optional fix. Attacker dispatches are internal to the round (one LEDGER line per round aggregating them).round/N | dispatches=<6+synth+fix> receipts=<same> verdict=<PASS|FAIL|MIXED> | <critical-count, high-count, fix-target>. Terminal when zero Critical + zero High.active-run.md on terminal; keep the cairn.[ref: <attacker-receipt-prefix>]; the attack one-liner for any blocked attacker (UNRUNNABLE:tooling-absent) as an OPEN_OBLIGATIONS entry so the user's out-of-sandbox verification is tracked across rounds.Starting with convention v1.1, every siege subagent (6 attackers, synthesis, fix-agent, stagnation judge) returns a receipt carrying TRIPWIRE:, SUPERSEDES:, and (when applicable) TRIPWIRE-CHILD: lines. Full grammar in shared/return-convention.md.
Manifest: After each Task return (post-lint), append:
<rcpt-sha256-prefix-12> <skill>/<dispatch-id> <verdict> TRIPWIRE: <predicates> [SUPERSEDED_BY=<prefix>] [keys=siege:<k>:<v>,…] [files=<path>:<h6>,…]
Sweep: Same 7-step clause as /build (Tripwire Manifest Sweep section in skills/build/SKILL.md). Siege-specific notes:
TRIPWIRE: verdict=FAIL | wrote(<affected-files>) | claims-touch(<affected-files>). Any future dispatch that touches those files MUST re-Read the attacker's receipt before proceeding — so the live threat is not forgotten mid-run.VERDICT BLOCKED, ran=UNRUNNABLE:tooling-absent) SHOULD declare TRIPWIRE: always — the concern is live and unverified; every subsequent dispatch must reconsult.SUPERSEDES: <attacker-prefix> + CLAIM from=<attacker-prefix>#… + exec: WITNESS that re-runs the original attack one-liner. Tier-2 verifies the attack no longer succeeds; supersession only survives if the attack is genuinely closed.TRIPWIRE: peer-dispatch-disagrees(verdict) force a re-read when a later attacker reports PASS where this one reported FAIL (or vice versa) on the same target.Mandatory-work declarations for siege subagent types:
read-target, enumerate-surfaces, emit-findings. Optionally: run-attack (EXEC of attack one-liner) — if SKIPPED, WITNESS names the attack command and NEXT nominates it verbatim.read-attacker-findings, emit-consolidated-report.read-findings, apply-edits, run-tests (security regression).read-round-history, emit-verdict.On lint failure: structurally BLOCKED regardless of declared VERDICT. Re-dispatch with lint errors in brief, or escalate. Security-critical dispatches: if a fix agent's receipt fails lint, do NOT merge — the fix is unverified.
Audit finds bugs, robustness gaps, and architecture issues. Quality-gate iterates artifacts to convergence. Inquisitor hunts cross-component integration bugs. None of these operate from an attacker's perspective. Security is a discipline -- it requires threat modeling, attack surface enumeration, exploitation scenario analysis, and chain-of-vulnerability reasoning that generalist review skills are not equipped to perform. A robustness finding ("missing input validation") and a security finding ("this missing validation enables SQL injection via the /api/users endpoint, escalating to full database read") are categorically different in blast radius, urgency, and remediation strategy.
| Skill | Perspective | Scope | Output | Security Depth | |-------|-------------|-------|--------|----------------| | audit | Code quality reviewer | Existing subsystems | Findings report (no fixes) | Incidental: flags missing validation, not exploitation chains | | inquisitor | Integration tester | Cross-component diffs | Executable tests | None: tests functional correctness, not attacker behavior | | red-team | Devil's advocate | Single artifact | Written findings per round | Surface: flags "consider auth" without modeling attack paths | | quality-gate | Iterative reviewer | Any artifact | Converged artifact | None: quality convergence, not security convergence | | siege | 6 distinct attackers | Design docs, plans, AND code | Threat model + verified findings + accepted-risks log | Full: exploitation scenarios, blast radius, chain analysis, persistent threat model |
What Siege catches that others cannot:
Siege activates when the orchestrator (build, audit, or user session) encounters two or more of these high-risk signals in the target artifact:
A single signal is insufficient -- too many false positives. Two or more signals, or any signal combined with user confirmation, activates Siege at full force.
deployment_context (optional) — intranet | public | hybrid
Declares the deployment environment for context-aware severity adjustment.
| Context | Effect on Severity |
|---|---|
| intranet | Network-level findings downgraded by 1 level (Critical→High, High→Medium, etc.). Application-level findings unchanged. |
| public | No adjustment (default behavior) |
| hybrid | Same as public — no blanket adjustment. Use public when any endpoints face the internet. Intranet downgrade only applies when ALL endpoints are internal. |
| (unset) | Assumes public (worst-case, no change from v1) |
Network-level findings (eligible for intranet downgrade): anonymous endpoints, TLS/transport gaps, CORS misconfiguration, port exposure, security header gaps, rate limiting gaps.
Application-level findings (NEVER downgraded): injection, auth bypass, IDOR, data exposure, privilege escalation, business logic flaws, deserialization, SSRF.
Severity adjustments are applied during Phase 3 synthesis (not by agents). Every downgrade is logged: "Downgraded [ID] from [original] to [adjusted] due to intranet deployment context." Original severity preserved in finding metadata.
When deployment_context is not specified but exists in the persistent threat model, use the persisted value: "Using deployment context from threat model: {context}. Override with explicit parameter." Explicit parameter overrides persisted value and updates the threat model.
attack_mapping (optional) — boolean, default false
When true: Chain Analyst annotates chain steps with MITRE ATT&CK technique IDs. Other agents are unaffected. When false (default): no ATT&CK references anywhere. Behavior identical to v1.
--force -- Activate Siege regardless of heuristic (user explicitly wants security review)--skip -- Suppress Siege activation even when heuristic triggers (user explicitly declines)/siege for full security audit." This is a recommendation, not automatic invocation.Siege integrates with orchestrator skills via shared/security-signals.md, which codifies the 7-category activation heuristic in a consumption-optimized format:
security_review: required), siege is dispatched automatically. Critical/High findings block the pipeline identically to quality-gate Fatal/Significant.security_review: required|recommended to the contract YAML, which build consumes.When dispatched from build, siege receives:
mixed (design doc + implementation diff)deployment_context: from contract security_review.deployment_context if present, else defaults to publicBuild's escape hatches (--force-siege, --skip-siege) map to siege's --force and --skip flags respectively.
Siege scales its agent count to match the scope. Rigor is constant — every tier runs the full iterative gate and threat model update. What changes is agent count, because overlapping perspectives on a small target produce noise, not coverage.
| Scope | Agents | Rationale | |-------|--------|-----------| | Targeted change (<5 files, single concern) | 3: Boundary Attacker, Fresh Attacker, Chain Analyst | Surgical target — more perspectives restate the same findings. Boundary catches injection/input flaws, Fresh breaks epistemic closure, Chain catches cross-boundary paths. | | Single subsystem (5-19 files) | 4: Boundary Attacker, Insider Threat, Fresh Attacker, Chain Analyst | Focused target — 6 perspectives produce ~60% overlap. 4 agents capture the same findings with less noise. | | Multi-subsystem (20+ files) | 6: all agents | Broader target — distinct perspectives cover different services/components. |
Fresh Attacker and Chain Analyst always run regardless of scope. They are the differentiators — the Fresh Attacker breaks epistemic closure and the Chain Analyst finds multi-step exploits. The scope heuristic only affects whether domain-specific agents (Insider Threat, Infrastructure Prober, Betrayed Consumer) are included.
Scope override: User may force a specific tier with --agents 3|4|6. The default is auto-detected from the manifest.
At the start of every Siege run, record the current HEAD commit SHA:
git rev-parse HEAD and write the result to scratch/<run-id>/commit-anchor.mdgit rev-parse HEAD must equal the recorded SHA. This confirms no EXTERNAL changes occurred during analysis. If HEAD has moved, abort with "Codebase changed during Siege (anchor: [old], current: [new]). Re-run Siege on the current state." Do not attempt to diff and continue -- the threat model may be invalid.expected-head variable in the fix journal. After each fix commit, the orchestrator updates expected-head to the new commit SHA. Before each gate round dispatch, the orchestrator verifies git rev-parse HEAD matches expected-head. A mismatch means an EXTERNAL change occurred (someone else pushed, a hook fired, etc.) -- abort as above. Internal fix commits are expected and do not violate the anchor.The anchor protects against external changes to the branch, not internal fix commits made by Siege itself.
Before dispatching any agents, the orchestrator pre-fetches live intelligence. This runs once per Siege invocation, not per agent.
Calibration advisory (print-only, at entry). First, resolve scripts/brier_advisory.py by absolute path from the plugin root (same resolution as the ledger emit at terminal verdict) and run python3 <script> advisory siege. If it prints a line, surface that line verbatim before recon begins; if it prints nothing, say nothing. The script reads the central store (~/.claude/crucible/ledger/brier-rolling.json + falsification.jsonl, override CRUCIBLE_LEDGER_DIR) and is silent unless siege has ≥5 falsifiable verdicts with a Brier > 0.25 over trustworthy (≤30-day-old) reconciliation data. It honors CRUCIBLE_CALIBRATION_DISABLED=1 as a graceful skip. If the script can't be resolved, skip silently — a missing advisory must never block the gate. No behavior change; advisory print only.
What is fetched:
| Source | Method | Content | Fallback |
|--------|--------|---------|----------|
| OWASP Top 10 | Training data (supplemented by WebFetch of owasp.org/Top10 if available) | Current top 10 web application risks with CWE mappings | Training data knowledge only |
| OWASP Cheat Sheets | Training data (supplemented by WebFetch of relevant cheat sheet URLs if available) | Mitigation guidance for detected risk categories | Training data knowledge only |
| CISA KEV | WebFetch of cisa.gov/known-exploited-vulnerabilities-catalog (JSON feed) | Known exploited vulnerabilities in dependencies | Skip -- note in scope limitations |
| SANS CWE Top 25 | Training data | Most dangerous software weaknesses | Always available (training data) |
| Dependency scan | npm audit, pip audit, cargo audit, or language-equivalent CLI | Known CVEs in project dependencies | Note in scope limitations if no scanner available |
Budget: Intelligence summary is condensed to 50 lines maximum. This summary is prepended to every agent's dispatch prompt. It contains: (a) top 5 risks relevant to this codebase based on detected signals, (b) any CVEs found in dependencies with severity and affected package, (c) any CISA KEV matches. The orchestrator performs the relevance filtering -- agents receive only what applies to their target.
Fallback hierarchy: WebFetch available > training data only > note gap in scope limitations. Intelligence gathering must not block the run. If all WebFetch attempts fail, proceed with training-data knowledge and document the gap.
Determine the artifact type and build the target manifest.
Artifact type detection:
| Input | Classification | Agent Treatment |
|-------|---------------|-----------------|
| Design doc (.md with architecture, data flow, API surface) | design | Agents reason about threat model, trust boundaries, data flow risks. No code to examine. |
| Implementation plan (.md with task breakdown, file changes) | plan | Agents reason about attack surfaces the plan will create, missing security tasks, ordering risks. |
| Code (source files, diffs) | code | Agents examine implementation for vulnerabilities, injection points, auth bypasses. |
| Mixed (design + code, plan + code) | mixed | Agents receive both. Design/plan context as Tier 1, code as Tier 2. |
Manifest construction: Same pattern as audit Phase 1. Dispatch a Sonnet exploration agent to identify security-relevant files if the user did not specify a scope. Write manifest to scratch/<run-id>/manifest.md.
USER GATE: Present the manifest and intelligence summary to the user. "Siege scope: [N files]. Intelligence: [summary of findings]. Proceed?" User may adjust scope. Write scratch/<run-id>/gate-approved.md on confirmation.
Before agents are dispatched, enumerate the application's externally reachable endpoints via static pattern matching. This builds an "actually exposed" map independent of the file manifest, then cross-references the two to surface coverage gaps. Runs at orchestrator level (Sonnet) -- no agent dispatch.
Artifact-type guard: Step 2.5 requires source files to grep. Skip entirely for design and plan artifact types (no code to scan). Run only for code and mixed. Note in scope limitations: "Attack surface enumeration skipped -- artifact type [design|plan] has no source files to scan."
Sub-step A -- Framework Detection:
Scan manifest files and project configuration to detect which web framework(s) the project uses:
| Signal | Framework |
|--------|-----------|
| package.json with express dependency | Express.js |
| package.json with fastify dependency | Fastify |
| package.json with @nestjs/core dependency | NestJS |
| package.json with next dependency | Next.js |
| requirements.txt or pyproject.toml with flask | Flask |
| requirements.txt or pyproject.toml with fastapi | FastAPI |
| requirements.txt or pyproject.toml with django | Django |
| *.csproj with Microsoft.AspNetCore | ASP.NET Core |
| Gemfile with rails | Rails |
| pom.xml or build.gradle with spring-boot | Spring Boot |
| go.mod with gin-gonic/gin | Gin (Go) |
| go.mod with gorilla/mux | Gorilla Mux (Go) |
| Cargo.toml with actix-web | Actix Web (Rust) |
If no framework is detected, skip the rest of Step 2.5 and note in scope limitations: "No recognized web framework detected -- attack surface enumeration skipped." Multiple frameworks: enumerate all.
Sub-step B -- Route/Endpoint Enumeration:
For each detected framework, grep project files using these patterns to extract registered routes:
| Framework | Grep Pattern | Example Match |
|-----------|-------------|---------------|
| Express.js | (app\|router)\.(get\|post\|put\|patch\|delete\|all\|use)\s*\( | app.get('/api/users', ...) |
| Fastify | (fastify\|server)\.(get\|post\|put\|patch\|delete\|all)\s*\( | fastify.post('/login', ...) |
| NestJS | @(Get\|Post\|Put\|Patch\|Delete\|All)\s*\( | @Get('users/:id') |
| Next.js | Files under app/ or pages/api/ (convention-based routing) | app/api/users/route.ts |
| Flask | @(app\|blueprint)\.(route\|get\|post\|put\|delete)\s*\( | @app.route('/login', methods=['POST']) |
| FastAPI | @(app\|router)\.(get\|post\|put\|patch\|delete)\s*\( | @router.get('/items/{id}') |
| Django | path\(\s*['"] or url\(\s*['"] in urls.py files | path('api/users/', views.user_list) |
| ASP.NET Core | \[Http(Get\|Post\|Put\|Patch\|Delete)\] or \[Route\( or Map(Get\|Post\|Put\|Delete)\( | [HttpGet("api/users/{id}")] |
| Rails | (get\|post\|put\|patch\|delete\|resources\|resource)\s in config/routes.rb | resources :users |
| Spring Boot | @(GetMapping\|PostMapping\|PutMapping\|PatchMapping\|DeleteMapping\|RequestMapping)\s*\( | @GetMapping("/api/users") |
| Gin (Go) | (r\|router\|group)\.(GET\|POST\|PUT\|PATCH\|DELETE\|Any)\s*\( | r.GET("/api/users", ...) |
| Gorilla Mux (Go) | (r\|router)\.HandleFunc\s*\( | r.HandleFunc("/api/users", handler) |
| Actix Web (Rust) | \.(route\|resource)\s*\( or #\[(get\|post\|put\|patch\|delete)\] | #[get("/api/users")] |
For each match, extract: HTTP method (or "ANY" if indeterminate), route path (raw string), source file path, line number, framework name.
Auth-signal heuristic (best-effort): For each endpoint, scan surrounding context (same file, same route registration block) for auth middleware or decorator patterns: [Authorize], @RequireAuth, authenticate, isAuthenticated, requireLogin, @login_required, @permission_required, auth_guard, AuthGuard, before_action :authenticate. Classify each endpoint as auth: yes | no | unknown. This is approximate -- false negatives are expected (auth applied at router level may not appear near the route). The classification feeds the exposure map's "Auth" column and helps prioritize: auth: no endpoints are highest priority for Boundary Attacker partitioning.
Limitations (documented in exposure map):
app.use('/api', ...)) are recorded as "middleware mount", not individual endpointsresources expansion) produces approximate routesauth: unknown classificationsSub-step C -- Exposure Map and Cross-Reference:
Build the exposure map and cross-reference with manifest.md:
manifest.md with the tag [attack-surface-gap]. Log each addition: "Attack surface gap: added [file] to manifest ([N] endpoints not in original scope)." This is post-USER-GATE, so the user sees what changed before agent dispatch.scratch/<run-id>/exposure-map.md:# Attack Surface Exposure Map
**Framework(s):** [detected frameworks]
**Enumeration method:** Static pattern matching
**Endpoint count:** [N]
## Endpoints
| # | Method | Route | File | Line | Auth | In Manifest |
|---|--------|-------|------|------|------|-------------|
| 1 | GET | /api/users | src/controllers/UserController.ts | 42 | yes | Yes |
| 2 | DELETE | /admin/purge | src/admin/maintenance.ts | 88 | no | NO -- GAP |
## Coverage Gaps
- `/admin/purge` (src/admin/maintenance.ts:88) -- file not in Siege manifest
[list all gaps]
## Scope Limitations
[framework-specific limitations from sub-step B]
Line budget: The exposure map summary appended to Tier 1 context (Step 1 of Automated Context Assembly) is capped at 15 lines: endpoint count, gap count, and the gap list. The full endpoint table remains in scratch/<run-id>/exposure-map.md only.
Read ~/.claude/projects/<project-hash>/memory/security-audit/threat-model.md if it exists. Extract:
Pass relevant sections to agents as "prior threat context" (budget: 30 lines max, summarized by orchestrator). This enables drift detection -- agents can flag new surfaces not present in the prior model.
All 6 agents are dispatched in parallel using Task tool (general-purpose, model: opus). Fallback if parallel dispatch fails: sequential dispatch with user notification.
Calibration-weighted dispatch (advisory). Before fanning out the 6 attacker agents, resolve scripts/brier_advisory.py by absolute path from the plugin root — plugin_root="$(realpath "<this-skill-base-dir>/../..")" — and run python3 "$plugin_root/scripts/brier_advisory.py" advise siege <manifest files…> over the finalized manifest (post attack-surface-gap additions). If it prints a DispatchAdvice block, add it verbatim to the Tier 1 overview every agent receives, as scrutiny hints (NOT as findings, NOT scored). Best-effort: on empty output or any error, dispatch normally. See shared/calibration-weighted-dispatch.md.
Adopts audit's tiered context model with security-specific partitioning.
Tier 1 -- Overview (all agents receive):
Tier 2 -- Deep dive (per-agent partitioning):
Partitioning strategy by artifact type:
| Artifact | Tier 1 | Tier 2 |
|----------|--------|--------|
| design | Full design doc (up to 500 lines) | N/A -- no code. Agents reason from the doc. |
| plan | Full plan (up to 500 lines) | Referenced existing code that the plan modifies (partitioned by agent relevance) |
| code | Manifest + interfaces + dependency graph | Source files partitioned by security domain (auth files to Insider Threat, input handling to Boundary Attacker, etc.) |
| mixed | Design/plan as context | Code partitioned as above |
The orchestrator builds context programmatically. Manual file reading and pasting is an anti-pattern.
Step 1 — Build Tier 1 (once, shared):
scratch/<run-id>/manifest.mdscratch/<run-id>/intelligence-summary.mdscratch/<run-id>/exposure-map.md exists, extract exposure map summary: endpoint count, gap count, and gap file list (15-line budget). Append to the Tier 1 block so all agents know what is externally reachable.scratch/<run-id>/tier1-context.mdStep 2 — Build Tier 2 partitions (per-agent):
Calculate Tier 2 budget: 1500 - len(Tier 1) - len(prompt template) - len(intelligence) lines
Content-based handler detection (runs before pattern partitioning). For every manifest file whose filename matches *api*, *route*, *handler*, *controller*, *server*, *index*, or *config* but does NOT already match an external-trigger pattern (*webhook*, *trigger*, *cron*, *job*, *consumer*, *subscriber*, *scheduler*), run a content grep for any of the following signals (use rg -l with a combined alternation):
x-(hub|stripe|slack|twilio|shopify|github|gitlab|linear)-signature, signature-256, verifyWebhookSignature@(webhook|queueHandler|KafkaListener|SqsListener|SnsListener|RabbitListener|EventListener|PubsubListener|Subscribe), @(shared_task|app\.task|celery\.task|dramatiq\.actor|task)\b, \.consume\(, \.subscribe\(, channel\.consume, socket\.on\(cron\.schedule, @Cron\(, @Scheduled\(, YAML schedule:\s*" (for GitHub Actions / K8s CronJob)(app|router|route)\.(post|put|delete|patch)\(["'\''][^"'\'']*(webhook|event|hook|callback|notify)Tool assumption: rg (ripgrep) on PATH. Fallback when rg unavailable: skip content-based detection, log "rg unavailable — content-based handler detection skipped", and recommend --agents 6 to force Infrastructure Prober inclusion. Matching files are tagged [external-trigger-detected] in the manifest and treated by the heuristic in the next step as if they matched the *webhook* pattern. Budget: 15 s across all candidate files AND 500 ms per file; on per-file timeout, skip that file and log. If the total budget is exceeded, tag only the matches found so far and log "content-based detection budget exceeded — recommend --agents 6". Known gaps (not matched by filename or content): Lambda/Functions handlers specified only in deployment manifests (serverless.yml, template.yaml), gRPC subscription handlers, GraphQL subscriptions — document in scope limitations if the target depends on these.
For each agent, select files from the manifest using the security-domain mapping (below) and the content-based tags from step 2:
[attack-surface-gap] from Step 2.5 are highest priority -- these register externally-reachable endpoints but were not in the original manifest. Files tagged [external-trigger-detected] are added here for the DoS / injection check.[external-trigger-detected] are added here for the config-aspect check.Read selected files. If total lines exceed Tier 2 budget, include highest-priority files as full source and remainder as 2-3 line summaries
Write each partition to scratch/<run-id>/<agent>-partition.md
Step 3 — Assemble dispatch prompt (per-agent):
./siege-<agent>-prompt.md[PASTE: Intelligence...] → content from intelligence-summary.md, [PASTE: Subsystem Overview...] → content from tier1-context.md, [PASTE: Source Files...] → content from <agent>-partition.md. Include all substituted content in the dispatch file.File-type heuristic for domain mapping: When manifest files don't have obvious security-domain labels, use filename/path patterns: *auth*, *login*, *session*, *permission* → Insider Threat; *route*, *handler*, *controller*, *api* → Boundary Attacker (includes external-trigger DoS / injection analysis); *config*, *.env*, *docker*, *nginx*, *.yml → Infrastructure Prober; *webhook*, *trigger*, *cron*, *job*, *consumer*, *subscriber*, *scheduler* → Boundary Attacker (handler bodies: injection + DoS) AND Infrastructure Prober (config aspects: signature verification, rate limits, DLQ); *model*, *schema*, *log*, *serial* → Betrayed Consumer. Pattern resolution: a file is evaluated against every pattern; the final agent set is the de-duplicated union of all matches. Multi-match files are NOT assigned to any agent twice. Content-based fallback: for handler files with generic names (api.ts, routes.ts, server.js, Next.js app/api/.../route.ts), grep for webhook/trigger signatures (req.headers['x-hub-signature'], verifyWebhookSignature, @webhook, queue-consumer decorators) and promote matches to the external-trigger pattern set. If a codebase's handlers cannot be detected by filename or content, recommend --agents 6 scope override to force Infrastructure Prober inclusion.
Each agent receives a structured prompt template and outputs findings in the initial lightweight format (see Finding Format below).
Prompt: siege-boundary-attacker-prompt.md
Perspective: External attacker with no credentials. Sees only public-facing surfaces.
Hunts for: Injection (SQL, XSS, command, LDAP), input validation gaps, deserialization vulnerabilities, SSRF, path traversal, header injection, open redirects.
Receives (Tier 2): API endpoint handlers, input parsers, deserialization code, URL routing, file upload handlers.
Dispatch: Single agent.
Prompt: siege-insider-threat-prompt.md
Perspective: Authenticated user with legitimate but limited access. Seeks privilege escalation.
Hunts for: Broken access control (IDOR, horizontal/vertical escalation), missing authorization checks, parameter tampering, mass assignment, insecure direct object references, role confusion.
Receives (Tier 2): Auth middleware, RBAC logic, user-facing endpoints, data access layers, admin panels.
Dispatch: Single agent.
Prompt: siege-infrastructure-prober-prompt.md
Perspective: Attacker probing deployment and configuration.
Hunts for: Secrets in code/config, misconfigured CORS/CSP/headers, insecure defaults, debug endpoints left enabled, missing rate limiting, TLS misconfiguration, exposed stack traces, information leakage.
Receives (Tier 2): Configuration files, environment handling, middleware setup, logging configuration, deployment manifests, Docker/CI files.
Dispatch: Single agent.
Prompt: siege-betrayed-consumer-prompt.md
Perspective: Downstream system or user whose trust is violated by the target.
Hunts for: Data leakage (PII in logs, over-broad API responses, cache poisoning), broken privacy contracts, missing encryption at rest/transit, audit log gaps, session management flaws, insecure token storage.
Receives (Tier 2): Data models, serialization/response shaping, logging code, session management, cache layers, database queries.
Dispatch: Single agent.
Prompt: siege-fresh-attacker-prompt.md
Perspective: Attacker with zero prior knowledge. Sees the codebase for the first time with no context from other agents.
Purpose: Breaks epistemic closure. The other 4 role-based agents share Opus's training-data blind spots. The Fresh Attacker receives ONLY the Tier 1 overview and a random 40% sample of Tier 2 files (no security-domain partitioning). Its prompt explicitly instructs: "Ignore conventional vulnerability categories. What looks wrong, unusual, or exploitable to you?"
Receives (Tier 2): Random sample of manifest files, not security-partitioned. Selection is deterministic per run: seed = hash(run-id + manifest content hash). This ensures different samples when the manifest changes and different samples on re-runs even within the same timestamp (since run-id includes a unique component).
Dispatch: Single agent.
Prompt: siege-chain-analyst-prompt.md
Perspective: Strategic attacker chaining multiple small weaknesses into a high-impact exploit.
Purpose: Finds multi-step attack paths that no single-perspective agent would flag. Operates on the coverage map from other agents (NOT their raw findings -- same anti-anchoring principle as audit's blind-spots agent).
Input: Tier 1 overview + coverage map (built from agents 1-5's partition records) + Tier 2 source files at trust boundaries and cross-component interfaces.
Coverage map contents: The coverage map contains ONLY file-to-agent assignments (which files were sent to which agent) and examination status (examined/overflow-summary/not-examined). Agent assignments in the coverage map are anonymized: use 'Agent-1' through 'Agent-5' instead of perspective names. The Chain Analyst does not need to know WHICH perspective examined each file -- only which files were examined vs. not examined. This preserves anti-anchoring by preventing the Chain Analyst from inferring what attack vectors were already explored. It does NOT contain finding counts, severity, or any information about what agents found. This is deliberately different from audit's coverage map which includes finding counts per lens.
Trust boundary file selection: Trust boundary files for the Chain Analyst's Tier 2 come from the persistent THREAT MODEL (Phase 1 Step 3, Trust Boundaries section) and the MANIFEST (Phase 1 Step 2, specifically interfaces and API surface files). They are NEVER selected based on agent findings. If no prior threat model exists, trust boundary files are identified from the manifest by selecting: (a) files at module/service boundaries, (b) API route handlers and middleware, (c) authentication/authorization entry points, (d) data serialization/deserialization interfaces.
Input budget: Coverage map: 40 lines max. Remaining budget after Tier 1 + coverage map goes to source files at trust boundary crossings.
Hunts for: Authentication bypass chains, data flow paths that cross trust boundaries without re-validation, time-of-check/time-of-use windows exploitable by attackers, dependency chains where a compromised package enables lateral movement.
Anti-restatement rule: Every chain must pass the cross-boundary test — vulnerabilities A and B must be in different files or components, connected by a concrete mechanism. A single vulnerability's consequences described in multiple steps is not a chain. Chains that fail this test are rejected.
Dispatch: Runs AFTER agents 1-5 complete (needs their partition records for the coverage map). This is the only sequential dependency.
Write-on-complete: Each agent writes findings immediately to scratch/<run-id>/<agent>-findings.md on completion. Partition records written to scratch/<run-id>/<agent>-partition.md before dispatch.
At the start of Phase 2, check whether consensus_query MCP tool is available. If available, use consensus_query(mode: "review") for one of the 6 agent dispatches (the Chain Analyst, since it performs synthesis-level reasoning where model blind spots are costliest). This replaces the single-model Chain Analyst dispatch, not supplements it. If unavailable, all agents use standard single-model dispatch.
Output normalization: The orchestrator normalizes consensus output to the standard 5-line finding format before writing to chain-analyst-findings.md. Provenance metadata (which models contributed, confidence scores, agreement levels) is preserved as a comment block at the top of the findings file but is NOT passed to Phase 3 synthesis. Anti-anchoring: synthesis should not weight findings based on which model raised them or how many models agreed.
Orchestrator reads all 6 findings files from scratch/<run-id>/. Steel-man-then-kill: for each finding, the orchestrator first articulates the strongest case that the finding is a false positive, then attempts to refute that case with evidence from the codebase. Findings that survive steel-manning proceed; findings that do not are demoted to "Noted" (below Minor) with the steel-man reasoning documented.
Mechanical dedup (orchestrator, no agent dispatch):
The orchestrator runs dedup before steel-manning. This is deterministic and requires no LLM reasoning.
Step 1 — Parse: Read all <agent>-findings.md files from scratch/<run-id>/. Extract the <!-- dedup: file=[path] line=[start-end] cwe=[CWE-ID(,CWE-ID)*] agent=[agent_name(,agent_name)*] --> metadata from each finding into a structured list. Split the cwe and agent values on , to support multi-CWE clusters and multi-agent attribution.
Step 2 — Exact dedup: Group findings by (file, any-shared-cwe). Two findings group together when their cwe sets intersect (after splitting the comma list). Within each group, merge findings whose line ranges overlap (e.g., lines 10-25 and lines 15-30 overlap). Keep the finding with the highest severity as the primary; append other agent names as "also flagged by: [agents]". Write merged finding count to the report.
Step 3 — Fuzzy dedup (same root cause, different CWEs): Within the same file, findings that are either (a) at overlapping line ranges OR (b) within a proximity threshold of 10 lines OR (c) explicitly annotated as the same function/block by either agent, are likely the same root cause seen from different perspectives — across agents OR within a single agent's multi-category checks. Group these as a "cluster": present as a single finding, noting the multiple CWEs and perspectives. Unlike Step 2, Step 3 does NOT require CWE overlap — any pair of CWEs may cluster if the location criterion is met. Distance metric: nearest-edge distance = max(0, max(A.start, B.start) - min(A.end, B.end)). Clustering rule: pairwise against a cluster anchor (the first record in the cluster), NOT transitive — a finding joins the cluster only if it is within threshold of the anchor. This prevents unbounded chain growth (e.g., findings at lines 10, 19, 28 do NOT transitively cluster). Examples: (a) Boundary Attacker flags CWE-89 (SQL injection) on line 42, Betrayed Consumer flags CWE-200 (information exposure) on line 44 of the same function — within the 10-line proximity threshold, one root cause, not two findings. (b) Same Boundary Attacker run files CWE-776 (alias bomb) on line 100 from the parser-config check and CWE-400 (resource exhaustion) on line 102 from the external-trigger DoS check, both on the same YAML-parsing webhook handler — within proximity threshold, one cluster. Cluster emission: emit the merged finding with cwe=[CWE-A,CWE-B,...] as a comma-separated list inside the brackets so downstream parsers see every root-cause category. Severity = max(cluster); on severity ties, sort cluster members by (file, line-start, agent-name) ascending and keep the first. Agent attribution lists every contributing agent.
Step 4 — Write dedup summary: Write scratch/<run-id>/dedup-summary.md with: raw finding count, exact-dedup merges, fuzzy-dedup clusters, final deduplicated count. This is the set that enters steel-man-then-kill.
1b. Design-phase cross-reference: If this Siege run targets code that had a prior design-phase red-team or Siege run on the design doc, check the threat model's Historical Findings for matches. Tag findings that were already flagged at design time: "Previously flagged in design review — implementation did not address." This helps triage: design-flagged findings that persist into code are higher priority than net-new findings.
Chain detection: After dedup, scan for findings from different agents that touch the same data flow path or trust boundary. Flag as a chain ONLY when the orchestrator can articulate the specific multi-step exploitation scenario. Proximity alone is not chaining. Assign exploitability to each chain using weakest-link inheritance: if any step is Hardening, the chain is Hardening. A chain is Active only when every step is independently exploitable today.
Severity classification (no demotion without proof):
Exploitability tag (required on every finding): Every finding must be tagged with exactly one exploitability class. This is orthogonal to severity — a High/Active is urgent, a High/Hardening is important but not on fire.
The tag appears in both the 5-line initial format and the full report format. The final report groups findings by exploitability within each severity level (Active first, then Hardening).
Deployment context severity adjustment: When deployment_context: intranet, apply network-level downgrade after severity classification:
Downgraded [ID] from [original] to [adjusted] — intranet deployment context (network-level finding)Original-Severity: [severity]deployment_context is public, hybrid, or unset: no adjustmentWrite initial report to scratch/<run-id>/report.md using the initial lightweight finding format.
The security gate iterates until zero Critical + zero High findings remain, or stagnation is detected.
Adopts quality-gate's iterative pattern with security-specific scoring.
Scoring: Critical = 5, High = 2, Medium = 0 (medium findings are tracked but do not block the gate). Both Active and Hardening findings contribute equally to the gate score — exploitability affects triage priority in the report, not gate blocking. A Critical/Hardening finding ("one reasonable change from full compromise") is too dangerous to pass through.
Loop:
Present findings from Phase 3 (or latest round) to the fix agent
Fix agent remediates Critical and High findings (see Fix Mechanism below). One commit per finding — not a batch commit. Each commit message: fix(security): address [SIEGE-XX-N] — [title]
Fix approval output (non-blocking): After the fix round completes, output a per-fix commit table:
## Siege Fix — Round N
| # | Finding | Approach | Files | Commit |
|---|---------|----------|-------|--------|
| 1 | [SIEGE-BA-1] SQL injection | Parameterized query | UserController.cs | abc1234 |
| 2 | [SIEGE-IT-3] IDOR on /api/records | Added ownership check | RecordService.cs | def5678 |
**To reject a fix:** `git revert <commit-sha>` (each fix is a separate commit)
**To accept all:** No action needed — fixes are already applied.
The pipeline does NOT pause. The next review round proceeds immediately with the current code state (fixes applied). If the user rejects a fix between rounds, the next review will re-find the vulnerability — this is correct behavior.
Dispatch a FRESH review round: 2 agents only (Boundary Attacker + one rotating agent). The rotating agent is selected based on the security domain of files modified by the fix agent: auth/RBAC changes → Insider Threat, data/logging changes → Betrayed Consumer, config/infra changes → Infrastructure Prober, multi-domain or ambiguous → Chain Analyst. On every 3rd round, the Fresh Attacker replaces the rotating domain agent (keeping the count at 2 review agents per round). When the Fresh Attacker is included in a Phase 4 round, it receives the full fix diff plus a random sample of unchanged files from the manifest (same 40% sampling strategy as Phase 2, re-seeded for this round). This preserves its 'fresh eyes on broader context' value -- it reviews the fix AND looks for new issues the fix may have exposed in surrounding code. Full 6-agent re-dispatch is disproportionate for incremental fixes.
Score the new findings. Compare to prior round:
In addition to prior-round comparison, track the lowest score achieved in any prior round (the high-water mark). If the current round's score exceeds this minimum by 3 or more points, treat as regression regardless of the stagnation judge's domain-shift assessment.
Fix mechanism:
| Artifact Type | Fix Agent | Action |
|---|---|---|
| design | Plan Writer subagent (Opus) | Revises design to close the vulnerability |
| plan | Plan Writer subagent (Opus) | Adds security tasks, reorders to close gaps |
| code | Fix subagent (Opus, new instance) | Patches the vulnerability with verification criteria |
Design and plan fix agents write the revised artifact back to its original path (the design doc or plan file). Review agents read from the same path. Anti-anchoring is maintained because the revised doc contains no revision marks -- the fix agent produces a clean replacement.
Before dispatching the fix agent (code artifacts only): If crucible:checkpoint is available, create checkpoint with reason 'pre-siege-fix-round-N'.
After each fix agent commits, update expected-head.md with the new HEAD SHA (code artifacts only). For design and plan artifacts, Phase 4 integrity is maintained by the revised artifact at its original path rather than git HEAD tracking. The commit anchor check is skipped for non-code artifacts.
The fix agent writes expected-head.md as its final action before returning results to the orchestrator, not the orchestrator after receiving results. This closes the timing gap between commit and HEAD tracking. If compaction occurs during a fix round and expected-head.md is stale, recovery should compare HEAD against the most recent fix journal entry's 'Files changed' field -- if the diff matches a plausible fix commit, proceed rather than abort.
Fix verification: After each fix agent completes, dispatch a verification check using the finding's Verification Criteria (from the finding format). If the verification criteria specify a concrete check (e.g., grep for parameterized query, confirm endpoint requires auth token, verify header is set), run the check. Use Sonnet for verification -- this is mechanical confirmation, not analytical reasoning. Dispatch using verification criteria from the finding. The verifier checks mechanically: does the fix satisfy the stated verification condition? If the fix does not satisfy the verification criteria, flag the finding as "unresolved" in the fix journal with the failed verification output. Unresolved findings remain in the gate score for the next round.
Unresolved escalation: Critical-severity Unresolved: flag as binding with one-round grace. If the same Critical is Unresolved again (persistent verifier disagreement), the binding downgrades to informational -- Sonnet should not permanently override Opus. High-severity Unresolved: appended to fix journal as informational context.
Fix journal: Same as quality-gate's fix journal pattern. Maintained in scratch/<run-id>/fix-journal.md. Fix agents receive full journal on round 2+. Red-team reviewers NEVER receive the journal (anti-anchoring preserved).
// Fixed: SIEGE-001), strip them before the next review round.Stagnation detection: Same two-layer system as quality-gate. Orchestrator scoring first pass, then Sonnet stagnation judge for semantic comparison. Judge prompt includes security-specific context: "Is the fix addressing the root cause or just the symptom? Is the vulnerability being moved rather than eliminated?"
Siege uses its own stagnation judge prompt (./siege-stagnation-judge-prompt.md) with security-specific semantics. The three verdicts:
Oscillation detection: If weighted score increases between rounds, escalate immediately as regression. Include checkpoint reference from the prior round's pre-fix checkpoint.
Safety limit: 10 rounds (security fixes should converge faster than general quality; 10 rounds indicates a fundamental design issue).
Progress notification: After round 3 and every 2 rounds thereafter (rounds 3, 5, 7, 9).
Three exit modes beyond clean passage:
scratch/<run-id>/accepted-risks.md with timestamp, finding ID, severity, and user rationaleIf the fix agent or user identifies a finding as a false positive:
**[ID]** [severity] [Active|Hardening] -- [title]
File: [path]:[line_range] | Agent: [agent_name]
Attack: [1-sentence exploitation scenario]
Evidence: [specific code reference or design element]
Verification: [concrete test or check that confirms the vulnerability]
Agents output findings in this format only. No blast radius, no extended analysis. This keeps per-agent output under 30 lines for a typical 5-finding set.
For mechanical deduplication before steel-manning, each finding also includes structured metadata as a comment block:
<!-- dedup: file=[path] line=[start-end] cwe=[CWE-ID(,CWE-ID)*] agent=[agent_name(,agent_name)*] -->
The cwe and agent fields accept either a single value or a comma-separated list (no spaces inside the brackets). Individual-agent findings emit single values; orchestrator-merged clusters emit lists. The orchestrator uses these fields for first-pass mechanical dedup: same file + overlapping line range + any CWE overlap = merge. Downstream parsers MUST split the cwe value on , before matching. Steel-man-then-kill runs only on the deduplicated set, reducing synthesis cost.
Critical and High findings are expanded in the Phase 3 report:
### [ID]: [title]
**Severity:** [Critical|High] | **Exploitability:** [Active|Hardening] | **Agent:** [agent_name] | **Chain:** [yes/no]
**File:** [path]:[line_range]
**Exploitation Scenario:**
[2-4 sentences: who attacks, how, what they gain]
**Blast Radius:**
- Data exposure: [what data is at risk]
- User impact: [how many users, what they experience]
- System impact: [lateral movement, persistence, escalation potential]
**Verification Criteria:**
1. [Concrete test or reproduction step]
2. [Expected result that confirms the fix]
**Steel-Man (why this might not be exploitable):**
[1-2 sentences: strongest case for false positive, and why it was rejected]
Medium and Low findings remain in the 5-line initial format in the final report.
Written to scratch/<run-id>/report.md and presented to the user.
# Siege Security Audit Report
**Target:** [subsystem/artifact name]
**Commit Anchor:** [full SHA]
**Date:** [ISO-8601]
**Intelligence:** [sources consulted, gaps noted]
**Artifact Type:** [design|plan|code|mixed]
## Scope Limitations
[What Siege cannot detect -- see Known Limitations. Always present.]
## Attack Chains
[Multi-step chains identified by Chain Analyst, with full exploitation narrative. Chains are the highest-signal output — present them first so the reviewer sees composed threats before individual findings.
Chains inherit exploitability from their weakest link: if ANY step in the chain requires a future change to become exploitable, the entire chain is Hardening. A chain is Active only when every step is independently exploitable today.]
## Critical Findings
### Active Vulnerabilities
[Full report format for each, or "None"]
### Hardening
[Full report format for each, or "None"]
## High Findings
### Active Vulnerabilities
[Full report format for each, or "None"]
### Hardening
[Full report format for each, or "None"]
## Medium Findings
[Initial 5-line format for each, or "None". Medium and Low use the compact 5-line format which includes the exploitability tag per-finding. No Active/Hardening sub-grouping — these severities do not block the gate, so triage ordering is less critical.]
## Low Findings
[Initial 5-line format for each, or "None"]
## Accepted Risks
[Any findings the user acknowledged with rationale, or "None"]
## Threat Model Delta
[New surfaces, retired surfaces, drift from prior model]
## Agent Coverage
[Which agents examined which files -- partition summary]
Ledger emit at terminal verdict. When the security gate reaches its terminal verdict (Phase 4 — clean passage at zero Critical + zero High, or an escalation/stagnation exit), emit ONE JSONL line to the central ledger (~/.claude/crucible/ledger/runs.jsonl) via the emit CLI per the canonical protocol at skills/shared/ledger-append.md — resolve scripts/ledger_append.py by absolute path from the plugin root and run python3 <script> emit - '<json>'. Siege is a Tier A emitter: all calibration fields are populated.
emit CLI: it honors CRUCIBLE_CALIBRATION_DISABLED=1 as a graceful skip (L-6), dedups by (run_id, skill="siege") before appending (L-2), and auto-fills repo + schema_version. If the script can't be resolved, warn to stderr and skip — never block the gate. You construct the entry's calibration fields below.Field values for the siege entry:
skill: "siege", tier: "A".artifact_type: per the gated artifact — "code" for code (the default), else the artifact's type (design | plan | mixed maps to its dominant component; use other only when no schema value applies). Use the artifact-type detection from Phase 1 Step 2.verdict: map siege's terminal state to the schema enum — terminal clean (zero Critical + zero High) → PASS; user escalation / accepted-risk override / DIMINISHING_RETURNS escalation → ESCALATED; stagnation-judge STAGNATION exit → STAGNATION.severity_histogram: build by counting the final surviving findings — the set remaining at terminal verdict, as enumerated in the Output-Format final report (Critical / High / Medium / Low sections) — per their per-finding severities, using the siege→schema mapping Critical→fatal, High→significant, Medium→minor, Low→nit. The report's surviving-findings set is the source of truth; aggregate the severity counts across it. This makes the mechanical WHS rule (fatal + significant) >= 1 equivalent to "had ≥1 Critical or High" — exactly siege's gate criterion.would_have_shipped_without_gate: do NOT set by hand — it follows mechanically from the histogram per L-3 ((fatal + significant) >= 1). Emit the histogram and let the boolean follow.findings_count: total surviving findings across all severities. rounds: the Phase 4 gate round count. confidence: derive mechanically — 1.00 for a clean terminal PASS (zero Critical + zero High); for an escalation/stagnation exit, lower it toward 0.50 in proportion to unresolved Critical+High findings (e.g., 1.00 - min(0.50, 0.10 × unresolved_critical_high)). Siege has no separate scoring loop; this keeps the field reproducible across runs rather than free-chosen.artifact_hash: sha256 of the gated artifact bytes (the commit-anchor target). chunk_hash: null (siege does not chunk).gated_files: the manifest files reviewed (repo-relative). highest_finding: one-line quote of the most severe surviving finding, or null if none.predicted_falsifier: a pre-registered, machine-checkable predicate (Phase 7; canonical protocol in shared/ledger-append.md). Write a non-null predicate ONLY when verdict ∈ {PASS, FAIL} AND artifact_type == "code"; otherwise null (all escalation verdicts, all non-code artifact types). In one sentence, describe the future evidence that would prove this PASS wrong, using the canonical grammar where possible — for security findings prefer CVE referencing {component} within {N}d or postmortem referencing {component} within {N}d, else fix touching {file-or-glob[,...]} within {N}d (e.g., fix touching src/auth/session.ts within 60d). Free-form prose is allowed but counts as "unparseable" for auto-checking. Max 256 chars. Do NOT write the retired "<DEFERRED:pre-phase-7>" sentinel.backfilled: false, falsified: null, falsified_by: null, gated_files_truncated: 0, comment: null. run_id is a UUIDv7 (scripts/uuid7.py). timestamp is ISO-8601 UTC. (schema_version and repo are stamped by the emit CLI — do not hand-set them.)Path: ~/.claude/projects/<project-hash>/memory/security-audit/threat-model.md
Updated at the end of every Siege run (Phase 5). Accumulates across sessions.
Structure:
# Threat Model
**Last updated:** [ISO-8601]
**Last commit anchor:** [SHA]
**Deployment context:** [intranet|public|hybrid|unset]
## Trust Boundaries
- [boundary]: [description, files involved]
## Attack Surfaces
- [surface]: [exposure level, last reviewed date, source: exposure-map | manual]
<!-- Phase 5 updates this from exposure-map.md: new endpoints not in prior model are flagged "new attack surface"; endpoints in prior model but absent from current enumeration are flagged "retired surface" in the Threat Model Delta section of the report. -->
## Historical Findings
### [date] -- [target]
- [finding ID]: [severity] [Active|Hardening] [title] -- [status: resolved|accepted|open]
## Accepted Risks
- [finding ID]: [severity] [Active|Hardening] [title] -- Accepted [date] by [user]
Rationale: [user-provided rationale]
Next review: [leave blank until Sentinel ships; until then, re-evaluate on next Siege run]
Accepted risks contain attacker-relevant information (what is known to be vulnerable and why the team chose not to fix it). Access control:
memory/security-audit/ is added to .gitignore by the orchestrator on first Siege run (if not already present). The canonical path is under ~/.claude/projects/, which is outside the repo. The .gitignore entry is a defensive measure for non-standard setups where memory is configured inside the repo. The staging detection warning (point 3) is the primary protection.threat-model.md or accepted-risks.md in git staging: "Security data detected in git staging. Removing from staging area. These files must not be committed."Path: ~/.claude/projects/<project-hash>/memory/security-audit/preferences.md
## Siege Preferences
- Last scan: [ISO-8601]
- Default scope: [subsystem name or "full"]
- Agent count: [4 or 6, based on last scope size]
## Intelligence Source History
| Source | Last attempt | Status | Notes |
|--------|-------------|--------|-------|
| pnpm audit | 2026-03-30 | success | 2 CVEs found |
| CISA KEV | 2026-03-30 | failed | WebFetch timeout |
| OWASP cheat sheets | never | not attempted | |
On subsequent runs, retry failed sources. If a source fails 3 consecutive times, skip by default but note in scope limitations.
Between every agent dispatch and every agent completion, output a status update to the user. Identical to audit's communication requirement.
Every status update must include:
After compaction: Re-read the scratch directory and current state before continuing.
Write a status file to ~/.claude/projects/<project-hash>/memory/pipeline-status.md at every narration point. Same format as audit and build.
## Agents
- Boundary Attacker: DONE (3 findings)
- Insider Threat: DONE (1 finding)
- Infrastructure Prober: IN PROGRESS
- Betrayed Consumer: PENDING
- Fresh Attacker: PENDING
- Chain Analyst: WAITING (needs agents 1-5)
## Security Gate
- Round: 2 of 10
- Score: 12 -> 7 (progress)
- Critical: 1 -> 0
- High: 3 -> 2
Canonical path: ~/.claude/projects/<project-hash>/memory/security-audit/scratch/<run-id>/
The <run-id> is a timestamp generated at the start of Phase 1 (e.g., 2026-03-30T09-15-00).
Stale cleanup: At the start of each Siege run, delete scratch directories that are NOT referenced by an active-run-*.md marker AND are older than 4 hours. The marker check is the primary protection -- a 10-round gate at 15 min/round can exceed 2 hours, so the time window is a secondary safety net only.
Active run marker: Write ~/.claude/projects/<project-hash>/memory/security-audit/active-run-<run-id>.md at start. Delete on completion. After compaction, glob for active-run markers to locate the run.
Tool constraint: All scratch directory operations (create, read, list, delete) must use Write, Read, and Glob tools — NOT Bash. Safety hooks block Bash commands referencing .claude/ paths.
| File | Written When | Purpose |
|------|-------------|---------|
| commit-anchor.md | Phase 1 start | TOCTOU prevention |
| manifest.md | Phase 1 Step 2 | Scoped file list |
| exposure-map.md | Phase 1 Step 2.5 | Enumerated endpoints with manifest cross-reference |
| gate-approved.md | User confirms scope | Compaction recovery marker |
| intelligence-summary.md | Phase 1 Step 1 | Pre-fetched intelligence (50 lines) |
| <agent>-partition.md | Before each agent dispatch | Files sent as full source |
| <agent>-findings.md | On agent completion | Per-agent findings |
| coverage-map.md | Before Chain Analyst dispatch | Agent coverage for chain analysis |
| tier1-context.md | Phase 2 Step 1 | Shared Tier 1 context block |
| dedup-summary.md | Phase 3 Step 4 | Raw → deduplicated finding counts and merge log |
| report.md | Phase 3 | Synthesized findings |
| fix-journal.md | Phase 4, per fix round | Cumulative fix history |
| round-N-score.md | Phase 4, per round | Weighted score snapshot |
| round-N-findings.md | Phase 4, per round | Findings per gate round |
| round-N-comparison.md | Phase 4, when judge dispatched | Stagnation judge output |
| accepted-risks.md | Phase 4, on user override | Accepted findings with rationale |
| expected-head.md | Phase 4, after each fix commit | Current expected HEAD SHA after fix rounds |
| round-N-verification.md | Phase 4, after every fix round | Fix verification results per round |
After compaction:
active-run-*.md to locate scratch directorycommit-anchor.md. If round-N-score.md files exist (Phase 4 in progress), read expected-head.md and verify HEAD against that instead. If no Phase 4 files exist, verify HEAD against commit-anchor.md. Mismatch = abort.gate-approved.md -> re-present manifest<agent>-findings.md files -> count completed agents, dispatch remainingcoverage-map.md without chain-analyst-findings.md -> dispatch Chain Analystreport.md without round-1-score.md -> enter Phase 4round-N-score.md files -> resume gate at round N+1pipeline-status.md to recover Started timestamp and Recent EventsEmit a Compression State Block at each of the following points:
Block content:
## Compression State
- Goal: [current siege objective]
- Skill: siege
- Phase: [current phase and step]
- Health: [GREEN|YELLOW|RED]
- Key Decisions: [severity judgments from Phase 3, accepted risks, stagnation outcomes]
- Active Constraints: [commit anchor SHA, expected-head SHA, gate round, score trajectory]
- Next Steps: [immediate next action]
Recovery step 0: Before file-based recovery (Recovery Procedure step 1), read the Compression State from pipeline-status.md to re-establish context.
/tmp/crucible-siege-metrics-<run-id>.log -- agent dispatches, completion times, finding counts/tmp/crucible-siege-decisions-<run-id>.log -- scoping decisions, severity judgments, steel-man reasoningAfter the security gate passes (or user accepts risks and proceeds):
~/.claude/projects/<project-hash>/memory/security-audit/threat-model.md (or create if first run)scratch/<run-id>/exposure-map.md exists, merge its endpoints into Attack Surfaces: new endpoints not in the prior model are flagged "new attack surface"; endpoints present in the prior model but absent from the current enumeration are flagged "retired surface" in the Threat Model Delta.scratch/<run-id>/report.md to memory/security-audit/reports/<date>-<target>.md for persistent access. Scratch directories are cleaned up on subsequent runs; the report copy ensures findings survive cleanup.memory/security-audit/ is in .gitignoreSentinel is a planned companion skill — not yet implemented. When built, it will run on a weekly cron as a persistent watchdog that consumes Siege's threat model. It is NOT part of Siege's execution path. The design below documents the intended capability so the threat model artifact stays compatible; do not attempt to invoke /sentinel today.
npm audit / pip audit / cargo audit and compare against the threat model's last-known dependency state. Flag new CVEs.threat-model.md. Has the codebase changed in ways that make the risk more severe? Has a related CVE been published?Sentinel writes its findings to memory/security-audit/sentinel-report-<date>.md. If any probe finds Critical or High issues, it emits a terminal notification: "Sentinel detected [N] security issues since last Siege run. Run /siege to investigate."
Sentinel is designed for cron or scheduled task execution. It does not require user interaction. Configuration stored in memory/security-audit/sentinel-config.md:
## Sentinel Configuration
- Schedule: weekly (Sunday 02:00 UTC)
- Last run: [ISO-8601]
- Alert threshold: High (notify on High+, log Medium+)
./siege-boundary-attacker-prompt.md -- External attacker perspective dispatch./siege-insider-threat-prompt.md -- Authenticated user perspective dispatch./siege-infrastructure-prober-prompt.md -- Infrastructure and config perspective dispatch./siege-betrayed-consumer-prompt.md -- Downstream trust violation perspective dispatch./siege-fresh-attacker-prompt.md -- Zero-context cold-start dispatch./siege-chain-analyst-prompt.md -- Multi-step attack chain analysis dispatch./siege-stagnation-judge-prompt.md -- Security-specific stagnation judgment dispatchPrompt templates are created during skill implementation. Each template follows the pattern established by audit's lens prompts: perspective definition, what to hunt for, what NOT to hunt for, input sections (Tier 1 + Tier 2), output format (5-line initial finding format), and context self-monitoring at 50%+ utilization.
/siege), or recommended by crucible:audit when security surfaces are detectedcrucible:cartographer-skill (Mode 2: consult map) for subsystem boundaries and dependency graphsconsensus_query MCP tool for Chain Analyst dispatch (when available)~/.claude/projects/<project-hash>/memory/security-audit/threat-model.md (read at start, written at end)crucible:build -- Siege can run after build Phase 4 (implementation) when the activation heuristic triggerscrucible:audit -- audit may recommend Siege; Siege never invokes auditcrucible:quality-gate (Siege has its own security-specific gate), crucible:red-team (Siege's agents ARE the red team, specialized for security)When Siege findings are passed to the build pipeline or quality-gate, use this mapping:
| Siege Severity | Pipeline Severity | |----------------|-------------------| | Critical | Fatal | | High | Significant | | Medium | Minor | | Low | Minor |
Siege reports use Siege vocabulary internally. The mapping applies only when findings cross skill boundaries (e.g., Siege report consumed by build or quality-gate).
Siege effectiveness is measured by:
Agents must NOT:
The orchestrator must NOT:
threat-model.md or accepted-risks.md to the repositoryInclude this section verbatim in every Siege report under "Scope Limitations."
testing
Standalone instance-bug reviewer — runs a parallel finder fan-out + verify gate over a diff or a path and prints ranked, verified findings. Use when the user says "delve", "find bugs in this diff", "review this for bugs", "scan this file/subsystem for defects", "instance-bug sweep", or wants concrete reproducible defects (not a merge verdict, not systemic health). Works on a PR id, a base..head range, or a path, on any forge (GitHub, GitLab, Bitbucket, self-hosted).
testing
Render the Crucible calibration ledger weekly report — the honest "Crucible caught N silent bugs" headline, verdict breakdown, per-skill severity rates, and the inflation detector. Triggers on "/ledger", "weekly report", "weekly ledger", "caught N", "quality ledger", "calibration report", "render the ledger".
development
The Book of Grudges — cross-session bug graveyard. Every fixed bug is recorded as a structured "grudge"; before touching code, skills query the grudgebook for the files in scope and surface past regressions as forced "DO NOT REPEAT" context. Read mode (pre-flight) and write mode (on bug resolution / fix(*) PR). Machine-local, per-repo, never committed. Triggers on /grudge, "check grudges", "record a grudge", "any past bugs here", "regression oracle", "bug graveyard".
testing
Reconcile the Crucible calibration ledger — walk merged fix/hotfix branches to falsify the originating gating-verdicts, compute per-skill Brier calibration scores, and append a falsification log. Triggers on "/calibration-reconcile", "reconcile ledger", "reconcile calibration", "falsify verdicts", "brier score", "calibration reconcile", "compute brier".