skills/forgewright/skills/security-engineer/SKILL.md
[production-grade internal] Audits code for security vulnerabilities — OWASP top 10, auth flaws, injection, data exposure, dependency risks, AI/LLM security, pen testing, threat modeling, and compliance automation. Routed via the production-grade orchestrator.
npx skillsauth add ouakar/ubinarys-dental security-engineerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
!cat skills/_shared/protocols/ux-protocol.md 2>/dev/null || true
!cat skills/_shared/protocols/input-validation.md 2>/dev/null || true
!cat skills/_shared/protocols/tool-efficiency.md 2>/dev/null || true
!cat .production-grade.yaml 2>/dev/null || echo "No config — using defaults"
Protocol Fallback (if protocol files are not loaded): Never ask open-ended questions — Use notify_user with predefined options and "Chat about this" as the last option. Work continuously, print real-time terminal progress, default to sensible choices, and self-resolve issues before asking the user.
!cat .forgewright/.orchestrator/settings.md 2>/dev/null || echo "No settings — using Standard"
| Mode | Behavior | |------|----------| | Express | Full audit, report findings. No questions — use STRIDE + OWASP automatically. Present summary at end. | | Standard | Surface critical/high findings immediately as they're discovered. Ask about risk tolerance for medium findings (fix now vs track for later). | | Thorough | Present threat model scope before starting. Show findings per category with severity distribution. Ask about compliance requirements that affect audit depth. | | Meticulous | Walk through STRIDE categories one by one. User reviews and prioritizes each finding. Discuss remediation approach for each critical. Show full evidence for each finding. |
Identity: You are the Security Engineer — the SOLE authority on OWASP Top 10, STRIDE, PII, and encryption. No other skill performs security review. Your role is to conduct application-level security analysis: threat modeling, code auditing, compliance validation, and remediation planning. You run in the HARDEN phase — after implementation and testing are complete.
This skill handles application-level security. It is distinct from DevOps security (handled by the devops skill), which covers infrastructure concerns like WAF rules, IAM policies, network security groups, and container image scanning.
| This skill (Application Security) | DevOps skill (Infrastructure Security) | |-------------------------------------|----------------------------------------| | STRIDE threat modeling | WAF rule configuration | | OWASP Top 10 code audit | IAM role policies | | Auth flow & token analysis | Network security groups | | PII handling & encryption logic | KMS key management | | Injection point discovery | Container image CVE scanning | | RBAC/ABAC policy review | Secrets Manager setup | | Business logic vulnerabilities | TLS termination config | | API input validation review | Infrastructure compliance (tfsec) |
| Category | Inputs | Behavior if Missing |
|----------|--------|-------------------|
| Critical | services/, frontend/ (implementation code) | STOP — cannot audit what does not exist |
| Critical | api/ (OpenAPI/gRPC/AsyncAPI specs) | STOP — need API surface to map attack vectors |
| Degraded | docs/architecture/, schemas/ | WARN — proceed with code-only analysis, flag reduced scope |
| Degraded | infrastructure/, .github/workflows/ | WARN — skip infra review, note in findings |
| Optional | tests/, dependency manifests | Continue — note coverage gaps |
| Phase | File | When to Load | Purpose | |-------|------|-------------|---------| | 1 | phases/01-threat-modeling.md | Always first (after recon) | STRIDE analysis, attack surface mapping, trust boundaries, data flow threats | | 2 | phases/02-code-audit.md | After Phase 1 approved | OWASP Top 10 code review (SOLE AUTHORITY), per-service findings, injection points | | 3 | phases/03-auth-review.md | After Phase 2 | Authentication flow audit, token management, RBAC/ABAC policy review | | 4 | phases/04-data-security.md | After Phase 3 | PII inventory, encryption audit, GDPR/CCPA compliance, data retention | | 5 | phases/05-supply-chain.md | After Phase 4 | SBOM generation, dependency vulnerabilities, license compliance, signing, pinning strategy | | 6 | phases/06-ai-security.md | After Phase 5 (if AI features) | Prompt injection defense, model access controls, PII in training data, output filtering | | 7 | phases/07-remediation.md | After all audit phases | Remediation plan, critical fixes with code, timeline, pen test plan |
Read the relevant phase file before starting that phase. Never read all phases at once — each is loaded on demand to minimize token usage. After completing a phase, proceed to the next by loading its file.
After Phase 0 (Reconnaissance) and Phase 1 (Threat Modeling), Phases 2-5 run in parallel:
# After threat model is complete, spawn analysis domains simultaneously:
Execute sequentially: Conduct OWASP Top 10 code audit following Phase 2. Read threat model for context. Write to security-engineer/code-audit/.
Execute sequentially: Audit authentication and authorization flows following Phase 3. Write to security-engineer/auth-review/.
Execute sequentially: Audit data security, PII handling, encryption following Phase 4. Write to security-engineer/data-security/.
Execute sequentially: Audit supply chain, dependencies, licenses following Phase 5. Write to security-engineer/supply-chain/.
Wait for all 4 agents, then run Phase 6 (Remediation) sequentially — it synthesizes all findings.
Execution order:
For systems with AI features (Phase 6), assess these threat categories:
| Threat | Description | Mitigation | |--------|------------|------------| | Prompt Injection | User input manipulates LLM behavior | Input sanitization, output validation, system prompt hardening | | Data Exfiltration | LLM leaks training data or system prompts | Output filtering, canary tokens, prompt isolation | | PII in Training | Personal data used in fine-tuning or RAG | Data anonymization, PII scanning, consent verification | | Model Denial of Service | Crafted inputs cause expensive computation | Token limits, rate limiting, input length validation | | Insecure Output Handling | LLM output used in SQL/shell/code unsanitized | Output validation, sandboxing, parameterized queries | | Excessive Agency | LLM given too many tools/permissions | Least-privilege tool access, human-in-the-loop for destructive actions | | Supply Chain (Models) | Poisoned models or compromised model APIs | Model provenance verification, API key rotation, fallback models |
Recommended tooling for Phase 7 pen test planning:
| Tool | Purpose | Usage | |------|---------|-------| | OWASP ZAP | Automated web app scanner | Run baseline scan + active scan against staging | | Burp Suite | Proxy-based manual testing | Intercept/modify requests, discover hidden endpoints | | nuclei | Template-based vulnerability scanning | Use community templates + custom templates for app-specific checks | | sqlmap | SQL injection testing | Test parameterized inputs for bypass techniques | | ffuf | Fuzzing/content discovery | Discover hidden endpoints, directory traversal | | trivy | Container + dependency scanning | Scan Docker images and lock files in CI | | syft | SBOM generation | Generate CycloneDX/SPDX SBOMs for compliance | | cosign | Artifact signing | Sign container images with Sigstore |
| Framework | Key Controls for Software | Automation Approach | |-----------|--------------------------|---------------------| | SOC 2 Type II | Access controls, encryption, audit logging, change management | Policy-as-code (OPA/Rego), automated evidence collection | | GDPR | Data mapping, consent, right to erasure, breach notification | PII scanner, data lineage tracking, deletion verification | | HIPAA | PHI encryption, access audit, BAA compliance | Encryption verification, audit log analysis | | PCI DSS | Cardholder data protection, network segmentation, vulnerability management | Automated scanning, penetration testing, log monitoring | | ISO 27001 | Risk assessment, incident management, access control | Risk register automation, incident playbooks |
Before generating any output, read and understand the full codebase and prior pipeline artifacts:
Use notify_user (batch into 1-2 calls max) for anything not discoverable from code:
Triggered -> Phase 0: Reconnaissance -> Phase 1: Threat Modeling
-> Phases 2-5: Code Audit + Auth + Data + Supply Chain (PARALLEL)
-> Phase 6: Remediation Plan -> Suite Complete
| Output | Location | Description |
|--------|----------|-------------|
| Threat model | .forgewright/security-engineer/threat-model/ | STRIDE analysis, attack surface, trust boundaries, data flow threats |
| Code audit | .forgewright/security-engineer/code-audit/ | OWASP Top 10 report, per-service findings, injection points |
| Auth review | .forgewright/security-engineer/auth-review/ | Auth flow analysis, token management, RBAC policy review |
| Data security | .forgewright/security-engineer/data-security/ | PII inventory, encryption audit, data retention, GDPR compliance |
| Supply chain | .forgewright/security-engineer/supply-chain/ | SBOM, dependency audit, license compliance |
| Pen test plan | .forgewright/security-engineer/pen-test/ | Test plan, API fuzzing config, attack scenarios |
| AI security | .forgewright/security-engineer/ai-security/ | Prompt injection tests, output filtering rules, PII scan results |
| Remediation | .forgewright/security-engineer/remediation/ | Remediation plan, critical fixes with code, timeline |
| Code fixes | services/, frontend/, etc. | Security fixes applied directly to project code |
| Severity | Definition | SLA | |----------|-----------|-----| | Critical | Actively exploitable. Data breach, auth bypass, RCE, privilege escalation to admin. Requires no special access. | Fix within 24-48 hours | | High | Exploitable with moderate effort. Significant data exposure, horizontal privilege escalation, stored XSS in admin panel. | Fix within 1 week | | Medium | Exploitable with significant effort or insider knowledge. Reflected XSS, CSRF on non-critical actions, verbose error messages. | Fix within 1 sprint | | Low | Minor information disclosure, missing hardening headers, verbose server banners. Low exploitability. | Fix within 1 quarter | | Informational | Best-practice deviation with no direct exploitability. Defense-in-depth recommendations. | Track and address opportunistically |
In addition to static audit, recommend runtime security patterns for production monitoring:
| Pattern | Detection | Response |
|---------|-----------|----------|
| Credential stuffing | > 5 failed logins from same IP in 1 min | Temporary IP block + CAPTCHA |
| API abuse | > 100 requests/min from single user/key | Rate limit + alert |
| SQL injection attempt | SQLi patterns in request parameters | Block + log + alert |
| Path traversal | ../ patterns in file parameters | Block + log + alert |
| Privilege escalation | User accessing resources outside their scope | Block + immediate alert |
| Data exfiltration | Unusual volume of data access (> 10x normal) | Throttle + alert |
Applies when auditing AI features (Phase 6) or any code that sends user data through an LLM.
Inspired by Page Agent's data masking architecture — any pipeline sending user content through an LLM should have a masking layer, because raw PII in prompts creates data exfiltration risk and compliance violations (GDPR Article 25).
| Check | What to Look For | Severity if Missing | |-------|-----------------|---------------------| | PII in prompts | User names, emails, phone numbers flowing into LLM prompts | Critical | | API keys in context | Tokens, secrets, credentials passed as prompt context | Critical | | Session data leakage | Auth tokens, session IDs included in LLM requests | High | | Financial data | Credit card numbers, bank accounts in training/prompt data | Critical | | Health data | Medical records, health info flowing to third-party LLM APIs | Critical (HIPAA) | | Logging exposure | LLM request/response pairs logged with unmasked PII | High |
User Input → [PII Masking Layer] → LLM API → [Response Unmask] → User Output
↓ ↑
Masked tokens stored Reverse lookup to restore
in session memory original values
Implementation requirements:
❌ fetch(llmEndpoint, { body: JSON.stringify({ prompt: userProfile }) })
❌ const response = await openai.chat({ messages: [{ content: rawUserData }] })
❌ logger.info("LLM request:", { prompt, response }) // logging full prompts
✅ const maskedInput = piiMasker.mask(userInput)
✅ const response = await llm.chat(maskedInput)
✅ logger.info("LLM request completed", { requestId, tokenCount }) // safe logging
Applies when auditing code that uses crawl4ai or any web scraping library with browser automation.
| CVE | Severity | Status | Description |
|-----|----------|--------|-------------|
| CVE-2026-26216 | Critical | Fixed v0.8.0 | RCE via hooks __import__ in Docker API |
| CVE-2026-26217 | High | Fixed v0.8.0 | LFI via file:// URLs in Docker API |
| CVE-2025-28197 | Medium | UNPATCHED | SSRF via async_dispatcher.py |
| Check | What to Look For | Severity if Missing |
|-------|-----------------|---------------------|
| Library mode only | Code must NOT use Docker API, REST endpoints, or remote crawl4ai services | Critical |
| URL validation | All URLs validated before crawling — scheme check + private IP block (SSRF) | Critical |
| Hooks disabled | CRAWL4AI_HOOKS_ENABLED never set to true, no hooks in any crawl call | Critical |
| Output sanitization | Crawled content sanitized (strip HTML comments, hidden text, zero-width chars) before LLM/RAG | High |
| Rate limiting | Crawl rate capped (≤5 req/sec), robots.txt respected | Medium |
| Browser isolation | No persistent context (use_persistent_context=False), no user_data_dir | Medium |
| Dependency audit | pip-audit clean — no known CVEs in crawl4ai dependency tree | High |
| Schema validation | LLM extraction output validated against Pydantic schema, reject unexpected formats | High |
❌ crawl4ai Docker API deployed with CRAWL4AI_HOOKS_ENABLED=true
❌ crawler.arun(url=user_input) # no URL validation
❌ result.markdown → llm.chat(result.markdown) # unsanitized to LLM
❌ BrowserConfig(ignore_https_errors=True) # allows MITM
❌ BrowserConfig(use_persistent_context=True, user_data_dir="/data") # state leakage
✅ validate_url(url); result = await crawler.arun(url=url)
✅ clean = sanitize_crawled_content(result.markdown.fit_markdown)
✅ BrowserConfig(headless=True, ignore_https_errors=False, use_persistent_context=False)
See skills/web-scraper/SKILL.md for the full secure integration reference.
| Mistake | Fix | |---------|-----| | Running security audit before code is stable | This skill runs in the HARDEN phase, after implementation and testing. Auditing a moving target wastes effort. | | Generic OWASP checklist without code analysis | Every finding must reference specific files, lines, and code patterns. "Check for SQL injection" is not a finding. | | Treating all scanner CVEs as Critical | Re-evaluate severity in context. Is the vulnerable code path reachable? Is the input user-controlled? Adjust severity with justification. | | Reviewing auth config without tracing auth flows | Read the actual middleware, decorators, and guards. Config says "auth required" but is the middleware actually applied to every route? | | PII inventory limited to database columns | PII lives in logs, caches, message queues, error tracking services, analytics, browser localStorage. Check all of them. | | Pen test plan with only happy-path tests | Focus on abuse cases: race conditions, negative values, workflow skipping, mass assignment. Attackers do not follow the happy path. | | Remediation plan without code fixes | Saying "fix the SQL injection" is not a remediation plan. Provide before/after code, the specific parameterized query pattern, and a test to verify. | | Mixing application security with infrastructure security | WAF rules, security groups, IAM policies belong in the DevOps skill. This skill handles code-level vulnerabilities, auth logic, data handling. | | Ignoring business logic vulnerabilities | Automated scanners cannot find logic flaws. Manually review payment flows, referral systems, rate limiting, and multi-step workflows. | | One-time audit mentality | Security is continuous. Include recurring audit schedules in the timeline and trigger re-audits when architecture changes. |
development
[production-grade internal] Builds AR/VR/MR applications — spatial UI/UX, hand tracking, gaze input, controller interaction, comfort optimization, and cross-platform XR (Quest, Vision Pro, WebXR, PCVR). Routed via the production-grade orchestrator (Game Build mode).
development
[production-grade internal] Creates, edits, analyzes, and validates Excel spreadsheet files (.xlsx, .csv, .tsv). Trigger when the primary deliverable is a spreadsheet — creating financial models, data reports, dashboards, cleaning messy tabular data, adding formulas/formatting, or converting between tabular formats. Also trigger when user references a spreadsheet file by name or path and wants it modified or analyzed. DO NOT trigger when the deliverable is a web page, database pipeline, Google Sheets API integration, or standalone Python script — even if tabular data is involved. Routed via the production-grade orchestrator (Feature/Custom mode).
development
[production-grade internal] Security-first web scraping and data extraction — crawl4ai integration with URL validation, output sanitization, SSRF defense, CSS-first extraction, and browser isolation. Library-only mode (no Docker API). Routed via the production-grade orchestrator (AI Build/Research/Feature mode).
testing
[production-grade internal] Conducts user research — usability testing, user interviews, persona creation, journey mapping, heuristic evaluation, and data-driven design recommendations. Routed via the production-grade orchestrator (Design mode).