codex/skills/security-audit-for-saas/SKILL.md
Audit SaaS billing security: payment bypass, webhook integrity, auth gaps, RLS, secrets. Use when "security audit", "billing security", or pre-launch review.
Core Insight: SaaS security failures cluster around billing bypass, auth gaps, and stale-cache entitlement errors -- not just OWASP generics. This skill operationalizes 130+ real vulnerabilities found, diagnosed, and fixed across production SaaS applications (jeffreys-skills.md, jeffreysprompts_premium, midas_edge, fix_my_documents, and others).
This is not a checklist. It is a cognitive toolkit. Checklists find the bugs you already know exist. The operators, creativity triggers, and attack-scenario catalogs below find the bugs that haven't been invented yet in your codebase.
Every audit must assume these axioms. They are load-bearing -- if you skip one, you will miss whole classes of vulnerabilities.
<!-- SECURITY_AUDIT_KERNEL_START v1.0 -->
Axiom 1 — Every fail-open is a DoS pivot. If a dependency fails (Redis, Stripe, JWKS, subscription service), what does the code do? If it degrades to "allow," then degrading the dependency unlocks the gate. The attacker's first move is to break Redis, not to break auth.
Axiom 2 — Duplicate parsers diverge → smuggling. If two layers parse the same data (proxy vs route, middleware vs rate-limiter, URL validator vs fetch), their interpretations will drift over time. The smallest drift becomes a bypass gadget. Single-source-of-truth parsing is a security property.
Axiom 3 — Normalize before validate, always. Whitespace, unicode, URL encoding, trailing slashes, case-folding, symlinks. The canonical form IS the security boundary. Validation that operates on raw input is already bypassed.
Axiom 4 — Self-heal down, never up. Reconciliation loops that raise privilege (re-add admin flag, re-enable expired role, re-link subscription) undo revocations and create un-killable escalation paths. Drift should always decay toward lower privilege.
Axiom 5 — Every error is an oracle. Auth responses must be indistinguishable across account states. "Invalid password" vs "user not found" is a free email enumeration service. A timing difference between "user exists" and "user missing" is the same oracle through a different channel.
Axiom 6 — Presence-only header checks are worthless.
Trusting `X-Admin: true`, `Authorization: anything`, `CF-Access-JWT-Assertion: anything`, or `X-Real-IP` without cryptographic verification proves nothing. Headers are attacker-controlled at L7 unless you prove otherwise.
Axiom 7 — The recovery path is a shadow codebase. Every invariant on the primary path (signature verification, idempotency, authorization) must be re-enforced on the reconciliation cron, the migration runner, the webhook replay, and the database restore. Recovery paths are where audits go to die.
Axiom 8 — Attack surfaces expand faster than defenses. Webhooks, CLIs, import/export, batch jobs, OG-image endpoints, admin APIs, old API versions, debug endpoints, cron secrets -- each was added without a security review. Enumerate ALL surfaces before auditing any.
Axiom 9 — Prices, identities, and entitlements are server-side, period. Any value the client can modify is attacker-controllable. Prices, plan IDs, user IDs, org IDs, roles, feature flags -- all must be derived from server-authoritative state, not request bodies.
Axiom 10 — Multi-tenancy is RLS + belt-and-suspenders app-layer checks. RLS alone is insufficient (service role key bypasses it). App-layer checks alone are insufficient (devs forget). You need both, and you need to verify both at every boundary where data crosses tenant lines.
<!-- SECURITY_AUDIT_KERNEL_END -->
These are composable mental moves that find vulnerabilities. They are not checklists. Apply them to any system, any domain, any stack. Each operator has triggers (when to use it), failure modes (when it misleads you), and a prompt module (copy-paste for sub-agents). See OPERATORS.md for the full card library (17 operators).
Definition: Protection verified on one surface (API) may not exist on another surface (webhook, cron, CLI, admin UI, batch import, old API version).
Triggers:
- `/api/v2/*` -- does `/api/v1/*` still work?

Failure modes:
Prompt module:
[OPERATOR: ⊘ Surface-Transpose]
1) List EVERY way to reach this feature/data: UI, API, webhook, CLI, CSV
import, cron, OG image, debug, admin, old API version, test-only.
2) For each, ask: "What authorization is enforced here?"
3) Find the weakest. That's the attack surface.
Definition: Every dependency (cache, DB, OAuth provider, rate limiter) has a failure mode. Find the code's response and check if it's fail-closed or fail-open.
Triggers:
- `try { ... } catch { return /* something */ }`
- `|| fallback` on a permission or subscription check
- `redis.get(...)` without a fallback defined

Failure modes:
Prompt module:
[OPERATOR: ⟂ Fail-Open Probe]
For each external dependency (Redis, DB, Stripe, PayPal, JWKS, OAuth provider,
subscription service, cache):
1) Find the failure handler
2) Determine: fail-closed (deny) or fail-open (allow)?
3) For fail-opens: what's the attack? (e.g., DoS Redis → unlimited brute force)
4) Report CRITICAL for any fail-open on auth/billing/rate-limit paths
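To make the fail-closed versus fail-open distinction concrete, here is a minimal sketch of a rate-limit check that denies when its backing store is unavailable. The names `CounterStore`, `checkLoginAttempt`, and the reason strings are hypothetical and exist only for illustration; a real store would typically be a Redis client.

```typescript
// Hypothetical types and names, for illustration only.
type Decision = { allowed: boolean; reason: string };

// A counter store that may throw when the backing service (e.g. Redis) is down.
interface CounterStore {
  increment(key: string): Promise<number>;
}

// Fail-closed rate-limit check: any store failure results in a denial.
async function checkLoginAttempt(
  store: CounterStore,
  key: string,
  maxAttempts: number,
): Promise<Decision> {
  try {
    const attempts = await store.increment(key);
    if (attempts > maxAttempts) {
      return { allowed: false, reason: "rate_limited" };
    }
    return { allowed: true, reason: "ok" };
  } catch {
    // Fail closed: if the dependency is unavailable, deny instead of allowing.
    return { allowed: false, reason: "dependency_unavailable" };
  }
}
```

The fail-open variant would return `{ allowed: true }` from the `catch` branch, which is the pattern the probe is meant to flag on auth, billing, and rate-limit paths.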
Definition: Make the implicit assumptions explicit. Each security mechanism depends on invariants. List them. Attack them individually.
Triggers:
Prompt module:
[OPERATOR: ≡ Invariant-Extract]
For this security mechanism, list every invariant it depends on:
- Network: is the service actually unreachable from that network?
- Clients: do they always send expected data?
- Storage: is the DB state always consistent with cache?
- Timing: are all operations strictly ordered?
For each invariant: "What if this were false? Who could make it false?"
Definition: Whenever two layers parse the same input, they will diverge. The smallest divergence becomes a bypass.
Triggers:
Prompt module:
[OPERATOR: ✂ Parser-Diverge]
Find every place where the same input is parsed by two layers:
- Proxy path matcher vs rate-limiter path matcher
- CORS origin validator vs CSRF origin validator
- Config-time URL validator vs runtime fetcher
- Frontend Zod schema vs backend Zod schema
Diff the two parsers. Any edge case they disagree on is a bypass.
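One way to remove divergence is to give every layer the same canonicalization function, so there is only one parser to audit. The sketch below is an assumption-laden illustration: `canonicalPath`, `isAdminPath`, `rateLimitBucket`, and the `/api/admin` prefix are hypothetical, and lowercasing paths only makes sense if the router is case-insensitive. It decodes once and does not handle double-encoded input.

```typescript
// Hypothetical helper: one canonical form shared by every layer that matches paths.
function canonicalPath(rawPath: string): string {
  let path: string;
  try {
    path = decodeURIComponent(rawPath);
  } catch {
    // Malformed percent-encoding: reject instead of guessing.
    throw new Error("invalid path encoding");
  }
  // Assumption: the router treats paths case-insensitively.
  path = path.normalize("NFKC").toLowerCase();
  // Collapse repeated slashes.
  path = path.replace(/\/{2,}/g, "/");
  // Reject dot segments rather than trying to resolve them.
  if (path.split("/").some((segment) => segment === "." || segment === "..")) {
    throw new Error("dot segments are not allowed");
  }
  // Drop a trailing slash, except for the root path.
  if (path.length > 1 && path.endsWith("/")) {
    path = path.slice(0, -1);
  }
  return path;
}

// Both layers call the same function, so they cannot disagree on edge cases.
function isAdminPath(rawPath: string): boolean {
  const path = canonicalPath(rawPath);
  return path === "/api/admin" || path.startsWith("/api/admin/");
}

function rateLimitBucket(rawPath: string): string {
  return `rl:${canonicalPath(rawPath)}`;
}
```

With this shape, `/API//Admin/` and `/api/%61dmin` reach both the authorization check and the rate limiter as the same string, so there is no gap between the two matchers to exploit.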
Definition: Auth starts with a claim (JWT, custom_id, payer_id, email) and propagates to actions (webhook updates subscription for user X). Trace the chain; anywhere the claim can be substituted without re-verification is a hijacking vector.
Triggers:
Prompt module:
[OPERATOR: ⊙ Identity-Chain Trace]
For each webhook/auth flow:
1) What identity claim arrives? (custom_id, sub, email, payer_id)
2) What does the handler use it for? (look up user, update sub)
3) Is the claim cross-verified against other identity fields? (e.g., payer_id
matches stored customerId)
4) Is the claim attacker-controllable? (almost always yes for custom_id)
5) Can an attacker craft a claim pointing to a victim's userId?
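Step 3 of the trace (cross-verifying the claim) might look like the following sketch. All names here are hypothetical: `WebhookClaim`, `StoredUser`, and `resolveWebhookUser` are not taken from any provider SDK, and the sketch assumes the payer ID was linked to the user server-side at checkout time.

```typescript
// Hypothetical shapes, for illustration only.
interface WebhookClaim {
  customId: string; // attacker-controllable value set at checkout
  payerId: string; // payer identity reported by the provider
}

interface StoredUser {
  id: string;
  paypalPayerId: string | null; // assumed to be linked server-side
}

type Resolution =
  | { ok: true; userId: string }
  | { ok: false; reason: "unknown_user" | "payer_not_linked" | "payer_mismatch" };

// Resolve the user a webhook applies to, refusing when the claim cannot be
// cross-verified against server-side state.
function resolveWebhookUser(
  claim: WebhookClaim,
  users: Map<string, StoredUser>,
): Resolution {
  const user = users.get(claim.customId);
  if (!user) return { ok: false, reason: "unknown_user" };
  if (user.paypalPayerId === null) return { ok: false, reason: "payer_not_linked" };
  if (user.paypalPayerId !== claim.payerId) return { ok: false, reason: "payer_mismatch" };
  return { ok: true, userId: user.id };
}
```

A handler built this way ignores a `custom_id` pointing at a victim unless the payer identity on the event also matches what the server already stored for that user.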
Definition: Shift attacker persona. A "bored user" finds different bugs than a "financial fraudster," who in turn finds different bugs than a "nation-state" or a "disgruntled insider."
Triggers:
Prompt module:
[OPERATOR: ⊕ Creative-Transposition]
Re-audit the same code from 5 attacker personas:
1) Bored user — what edge cases break it for fun?
2) Financial fraudster — how do I get free service?
3) Competitor — how do I exfiltrate data or disrupt?
4) Disgruntled ex-employee — what backdoors do I know?
5) Nation-state — what long-term access do I plant?
Each persona sees different vulns. Use all 5.
See OPERATORS.md for 11 more operators including: Recovery-Path Walk (for webhook reconciliation), Normalize-First (for canonical form attacks), Timing-Oracle Hunt (for side channels), Self-Heal-Up Detector (for escalation via reconciliation), Tenant-Leak Probe (for multi-tenancy), Error-Oracle Test (for enumeration via errors), Recovery-Inconsistency (for migration/replay gaps), Mass-Assignment Probe (for DTO discipline), Expansion-Surface Hunt (for serialization attacks), Shadow-Codebase Scan (for deprecated paths), and Chain-Comp (composing operators into hunts).
| Need | Skill |
|------|-------|
| SaaS-specific security (billing, subscriptions, entitlements) | security-audit-for-saas (this) |
| General codebase quality audit (any domain) | codebase-audit |
| Find bugs and fix iteratively | multi-pass-bug-hunting |
| Payment integration reference | stripe-checkout |
| Database/RLS reference | supabase |
Run a security audit of this SaaS application.
Focus areas (in priority order):
1. Payment/billing bypass (can users get service without paying?)
2. Webhook integrity (signature verification, idempotency, race conditions)
3. Auth & authorization (RBAC gaps, TOCTOU in permission checks)
4. Entitlement enforcement (cache staleness, grace period logic)
5. Secrets exposure (env vars, client bundle, health endpoints)
6. Database access control (RLS coverage, missing policies)
7. Web security (redirects, CSRF, SSRF, XSS in user content)
For each finding: file:line, severity, attack vector, and fix.
Use the domain checklists from references/.
Audit payment security only: webhook handlers, checkout flows,
subscription status checks, and entitlement gating.
Top 5 issues, with severity and fix. Under 50 lines.
Work through domains in this order. Each domain has a detailed checklist in references/.
Key question: Can a user access paid features without paying?
| Check | What to grep | Why |
|-------|-------------|-----|
| Webhook signatures verified | constructEvent, verify-webhook-signature | Unsigned webhooks = forged events |
| Prices are server-side | amount, price, unit_amount in route handlers | Client-submitted prices = free service |
| Idempotency via DB constraint | ON CONFLICT, payment_events | Duplicate webhooks = double-credit |
| Checkout uses FOR UPDATE | FOR UPDATE, advisory in checkout routes | Race condition = double subscription |
| PayPal custom_id validated | custom_id, validatePayPalUserId | Attacker-controlled = subscription hijacking |
| Cache invalidated on sub change | cache event types, after() callbacks | Stale cache = entitlement mismatch |
| Seat count enforced | maxSeats, member addition flow | No check = unlimited free seats |
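As an illustration of the first row, the sketch below verifies an HMAC-SHA256 signature over the raw request body using Node's `crypto` module. It is a generic scheme, not any specific provider's format: the function name `verifyWebhookSignature`, the hex encoding, and the secret handling are assumptions. For Stripe or PayPal, the provider's own verification routine should be used instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Generic HMAC-SHA256 webhook check. The signature format (hex digest of the
// raw body) is an assumption and does not match any specific provider.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  // Sign the raw bytes as received; re-serialized JSON may differ.
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const provided = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

The important properties are that verification runs on the unparsed body and that the comparison is constant-time; a handler that parses JSON first and re-serializes it before signing can reject legitimate events or accept altered ones.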
Key question: Can a user escalate privileges or bypass access controls?
- `timingSafeEqual`?

Key question: Does subscription status always reflect reality?
- `currentPeriodEnd` (not stale/undefined)

Key question: Are any secrets accessible outside their intended scope?
- `env.ts` module
- `NEXT_PUBLIC_*` vars in client code
- `.env` files with real secrets in git

Key question: Can users read/write data they shouldn't?
- `using (true)` for authenticated role on sensitive tables

Key question: Are standard web attack vectors mitigated?
- `startsWith("/") && !startsWith("//")` everywhere
- `getSafeRedirectPath()` used at all redirect points
- `rehype-sanitize` allowlist (not blocklist)
- `*` endpoints (public, no credentials)

Key question: Do operational patterns introduce vulnerabilities?
- `exec.Command` with separate args (no `sh -c` concat)

Key question: Can an attacker brute-force, enumerate, or cost-exhaust the system?
- system source exempt (avoid banning Stripe/PayPal IPs on cert rotation)

Key question: Can Tenant A see, modify, or infer Tenant B's data?
- `requireOrgRole()` checks parallel RLS (belt-and-suspenders)

Key question: Is every external boundary cryptographically verified?
- `constructEvent`, raw body preserved
- `verify-webhook-signature`, all 5 headers, 10s timeout
- `getClaims()` not `getSession()` (JWT signature validation)

Key question: Is sensitive data classified, protected, and deletable?
Key question: When (not if) a breach happens, can you detect, contain, and recover?
Key question: Can you answer "who did what, when, from where" for any action?
Key question: If the app uses LLMs, are prompt/tool/output attack vectors covered?
Key question: Does every API endpoint enforce validation, authorization, and shape discipline?
- `as string` casts)

| Level | SaaS-Specific Criteria | Example |
|-------|------------------------|---------|
| Critical | Direct revenue loss or data breach | Billing bypass, subscription hijacking, RLS missing on users table |
| High | Exploitable with effort, auth escalation | TOCTOU in RBAC, missing rate limits on auth, secrets in git |
| Medium | Limited scope, defense-in-depth gap | Overly permissive RLS, stale cache, missing security headers |
| Low | Best practice, not directly exploitable | Idempotency key fallback, unsanitized log field, informational leak |
# SaaS Security Audit: [Project Name]
## Summary
- **Domains audited:** 7/7
- **Critical:** X | **High:** Y | **Medium:** Z | **Low:** W
- **Top risk:** [one-sentence summary of worst finding]
## Critical Findings
### [Title]
- **Location:** `file.ts:42`
- **Attack vector:** [How an attacker exploits this]
- **Impact:** [Revenue loss / data exposure / privilege escalation]
- **Fix:** [Specific code change]
## [High / Medium / Low sections, same format]
## Positive Findings
[Things done well -- helps prioritize effort]
| Don't | Do |
|-------|-----|
| Only check OWASP generics | Start with billing/payment bypass |
| Trust client-submitted prices or plan IDs | Verify all pricing is server-side |
| Assume `as string` is safe for Stripe objects | Use type guards: `typeof x === 'string' ? x : x.id` |
| Skip PayPal custom_id validation | Validate full identity chain: custom_id -> user -> payerId |
| Put auth checks outside transactions | Move permission checks inside FOR UPDATE transactions |
| Use === for secret comparison | Use crypto.timingSafeEqual() everywhere |
| Use in-memory rate limiters in serverless | Use Redis-backed rate limiting (Upstash, etc.) |
| Show key suffixes in health checks | Show prefix only, never suffix |
| Assume RLS migration covers all tables | Script-verify coverage against information_schema.tables |
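Two rows of the table translate directly into small helpers. The sketch below is illustrative only: `Expandable`, `idOf`, and `safeEqual` are hypothetical names, and `Expandable` is a stand-in shape rather than a type from the Stripe SDK.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// An "expandable" field: either a bare ID string or an object carrying that ID.
// This shape is an illustrative assumption, not an SDK type.
type Expandable<T extends { id: string }> = string | T;

// Type guard instead of an `as string` cast: works for both representations.
function idOf<T extends { id: string }>(value: Expandable<T>): string {
  return typeof value === "string" ? value : value.id;
}

// Constant-time string comparison. Hashing both inputs first yields
// equal-length buffers, which timingSafeEqual requires.
function safeEqual(a: string, b: string): boolean {
  const hashA = createHash("sha256").update(a, "utf8").digest();
  const hashB = createHash("sha256").update(b, "utf8").digest();
  return timingSafeEqual(hashA, hashB);
}
```

An `as string` cast on an expanded object would silently yield `[object Object]` when interpolated into a query or log line, while `idOf` returns the ID in both cases. `safeEqual` is meant for cron secrets, API keys, and similar values where `===` would leak timing.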
When the 15-domain sweep comes back clean, the audit is not done. Most real vulnerabilities are invisible to checklists because they are compositions of "individually safe" components. Use these prompts to break out of checklist mode.
See CREATIVITY-TRIGGERS.md for the full catalog of 50+ prompts, including attacker personas, red team scenarios, and the "impossible question" technique (ask questions you believe have no answer, then let the search find the answer).
A real audit is not a linear checklist. It is a spiral: surface enumeration, threat modeling, domain sweep, creativity phase, validation, reporting.
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 1: PREPARE (15 min) │
│ → Threat model (attacker goal, budget, access) │
│ → Surface enumeration (every entry point) │
│ → Trust boundary mapping (every data flow) │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 2: KERNEL CHECK (10 min) │
│ → Apply all 10 axioms to the system │
│ → Any axiom that doesn't hold = investigate immediately │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 3: DOMAIN SWEEP (60-90 min) │
│ → 15 domain checklists, billing first (highest revenue impact) │
│ → Parallel where possible (auth + data + web = independent) │
│ → Document findings as you go (not at the end) │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 4: OPERATOR CHAINS (30-60 min) │
│ → Apply ⊘ Surface-Transpose to each finding │
│ → Apply ⟂ Fail-Open Probe to each dependency │
│ → Apply ⊙ Identity-Chain Trace to each auth flow │
│ → Apply ≡ Invariant-Extract to each security mechanism │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 5: CREATIVITY PHASE (30 min) │
│ → Apply ⊕ Creative-Transposition (5 attacker personas) │
│ → Walk through creativity triggers for unexpected paths │
│ → Red team scenarios from references/ATTACK-SCENARIOS.md │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 6: VALIDATION (15 min) │
│ → For each finding: can you PROVE it's exploitable? │
│ → False positives discarded │
│ → Severity recalibrated against actual impact │
└──────────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────────┐
│ PHASE 7: REPORT (15 min) │
│ → Structured report with file:line, attack, impact, fix │
│ → Create beads for each finding (priority by severity) │
│ → If CRITICAL: recommend immediate pause + escalation │
└──────────────────────────────────────────────────────────────────┘
Total: roughly 3-4 hours (175-235 minutes across the seven phases) for a comprehensive audit of a production SaaS.
This skill is one node in a network. It cross-references and is cross-referenced by other skills. Use together for compound effects.
| Workflow | Skills to Chain |
|----------|----------------|
| Pre-launch audit | this → multi-pass-bug-hunting → release-preparations |
| Post-incident forensics | this (INCIDENT-RESPONSE.md) → codebase-archaeology |
| Security test suite | this (SECURITY-TESTING.md) → testing-fuzzing → testing-metamorphic |
| Triangulated security review | this → code-review-gemini-swarm-with-ntm → multi-model-triangulation |
| Third-party integration audit | this (THIRD-PARTY.md) → stripe-checkout → supabase |
| CI/CD security hardening | this (INFRASTRUCTURE.md) → gh-actions (SECURITY-CORE.md) |
| Secure deployment | this → vercel (SECRETS.md) |
| Vulnerability reporting | this → reporting-sensitive-encrypted-gh-issues |
Cross-skill references (don't duplicate, LINK):
| Topic | File |
|-------|------|
| Payment & billing deep checklist | BILLING.md |
| Auth & authorization checklist | AUTH.md |
| Entitlement enforcement patterns | ENTITLEMENTS.md |
| Secrets management checklist | KEY-MANAGEMENT.md |
| Database access control (RLS) | DATABASE.md |
| Web security checklist | WEB.md |
| Infrastructure security | INFRASTRUCTURE.md |
| Rate limiting & abuse detection | RATE-LIMITING.md |
| Multi-tenant isolation | MULTI-TENANT.md |
| Third-party integration | THIRD-PARTY.md |
| Data security & privacy (GDPR) | DATA-SECURITY.md |
| Incident response & forensics | INCIDENT-RESPONSE.md |
| Audit logging & compliance | AUDIT-LOGGING.md |
| LLM/AI-specific security | LLM-SECURITY.md |
| API security | API-SECURITY.md |

| Topic | File |
|-------|------|
| Triangulated security kernel | KERNEL.md |
| Cognitive operators (17 cards) | OPERATORS.md |
| Creativity triggers (50+ prompts) | CREATIVITY-TRIGGERS.md |
| Attack scenarios & red team | ATTACK-SCENARIOS.md |
| Threat modeling | THREAT-MODELING.md |

| Topic | File |
|-------|------|
| Real-world case studies (anonymized) | CASE-STUDIES.md |
| Real code snippets (positive examples) | COOKBOOK.md |
| Fail-open patterns catalog | FAIL-OPEN-PATTERNS.md |
| Security testing strategy | SECURITY-TESTING.md |
| Grep patterns for quick scanning | GREP-PATTERNS.md |
| Observability for security | OBSERVABILITY.md |
| Prompt archetypes (5 audit modes) | PROMPT-ARCHETYPES.md |
| Script | Purpose |
|--------|---------|
| scripts/rls-coverage.sql | Verify every table has RLS policies |
| scripts/leak-scan.sh | Find hardcoded secrets & env var leaks |
| scripts/webhook-signature-test.sh | Test webhook signature verification |
| scripts/api-auth-mapper.sh | Map every API route to its auth requirement |
| scripts/find-fail-open.sh | Grep for fail-open patterns in codebase |
| scripts/audit-quick.sh | 60-second automated surface audit |
| Topic | File |
|-------|------|
| OWASP SaaS Top 10 (breach-driven) | OWASP-SAAS-TOP-10.md |
| MITRE ATT&CK mapping for SaaS | MITRE-ATTACK-MAPPING.md |
| SOC 2 / GDPR / PCI / HIPAA / ISO 27001 | COMPLIANCE-DEEPDIVE.md |
| 15 famous breach case studies | BREACH-CASE-STUDIES.md |

| Topic | File |
|-------|------|
| 9 threat modeling frameworks | ADVERSARIAL-THINKING.md |
| Field stories from the trenches | FIELD-GUIDE.md |
| Attack scenario genealogy (composing bugs) | ATTACK-SCENARIO-GENEALOGY.md |
| Business logic flaws (40+ patterns) | BUSINESS-LOGIC-FLAWS.md |
| Performance as security (DoS/DoW) | PERFORMANCE-DOS-VECTORS.md |

| Topic | File |
|-------|------|
| Idempotency patterns (distributed locking) | IDEMPOTENCY.md |
| Timing-safe comparisons (3 variants) | TIMING-SAFE.md |
| Crypto fundamentals (dos and don'ts) | CRYPTO-FUNDAMENTALS.md |
| Session management lifecycle | SESSION-MANAGEMENT.md |
| Admin impersonation (audit + cookies) | IMPERSONATION.md |
| CLI auth via RFC 8628 device code | CLI-AUTH.md |
| CSP patterns (per-path, per-env) | CSP-PATTERNS.md |

| Topic | File |
|-------|------|
| Zero trust for SaaS | ZERO-TRUST-SAAS.md |
| Defense in depth (7 layers) | DEFENSE-IN-DEPTH.md |
| Security maturity model (5 levels) | SECURITY-MATURITY.md |
| Week-1 onboarding audit | ONBOARDING-AUDIT.md |
For parallel, focused work, dispatch these subagents (in subagents/):
| Subagent | Use When |
|----------|----------|
| billing-archaeologist | Tracing payment flows, finding divergence |
| rls-auditor | Auditing Supabase RLS coverage |
| entitlement-checker | Finding missing feature gates |
| recovery-path-walker | Auditing crons, migrations, replays |
| webhook-divergence-detective | Comparing provider handlers |
| admin-escalation-mapper | Mapping privilege escalation |
| red-team-agent | Creative vulnerability hunting |
| incident-responder | Active incident triage |
| Topic | File |
|-------|------|
| Honeypots, canaries, deception tech | HONEYPOTS-AND-DECEPTION.md |
| Canary tokens & tripwires (detail) | CANARY-TOKENS.md |
| WebAuthn / Passkeys / FIDO2 | WEBAUTHN-PASSKEYS.md |
| Supply chain (SBOM, SLSA, Sigstore) | SUPPLY-CHAIN-DEEP.md |
| Cloud provider security (AWS/GCP/Vercel) | CLOUD-PROVIDER-SECURITY.md |
| Email security (SPF/DKIM/DMARC) | EMAIL-SECURITY.md |
| DNS security (subdomain takeover, DNSSEC) | DNS-SECURITY.md |
| GraphQL security (depth, complexity, field-auth) | GRAPHQL-SECURITY.md |
| WebSocket security (CSWSH, per-msg authz) | WEBSOCKET-SECURITY.md |
| SCIM provisioning security (enterprise) | SCIM-PROVISIONING.md |
| AI/ML-specific security (beyond prompts) | AI-ML-SECURITY.md |
| Red team playbook (kill chain for SaaS) | RED-TEAM-PLAYBOOK.md |
| Bug bounty program design | BUG-BOUNTY.md |
| Security debt quantification | SECURITY-DEBT.md |
| 25 admin escalation paths | ADMIN-ESCALATION-PATHS.md |

| Topic | File |
|-------|------|
| Security economics (ALE, ROSI, budget defense) | SECURITY-ECONOMICS.md |
| Data exfiltration defense (egress, staging, DLP) | DATA-EXFILTRATION-DEFENSE.md |
| CORS deep-dive (10 real misconfig classes) | CORS-DEEP.md |
| Modern browser security APIs (COOP/COEP/Trusted Types) | BROWSER-SECURITY-APIS.md |
| Post-quantum crypto migration prep | POST-QUANTUM-PREP.md |
| Asset | Purpose |
|-------|---------|
| assets/AUDIT-REPORT-TEMPLATE.md | Final audit report format |
| assets/FINDING-TEMPLATE.md | Structured finding format |
| assets/THREAT-MODEL-CANVAS.md | 30-min threat model sketch |
| assets/OPERATOR-CARD-DECK.md | Printable operator cards |