plugins/oh-my-secuaudit/skills/sec-cluster/SKILL.md
Dataflow-based code clustering for security assessments. Groups (Endpoint, Sink) paths by shared review strategy so reviewers sample representative cases instead of exhaustively reviewing every path. Use when scoping manual review on a codebase with 50+ endpoints, repetitive sanitization patterns, or after initial SAST/SCA produces large finding sets that need triage.
Dataflow-based code clustering for security assessments. Groups (Endpoint, Sink) paths by shared review strategy, enabling representative-sample review instead of exhaustive per-path analysis.
A cluster does not guarantee identical results. It only makes it possible to apply the same review strategy to every path it contains.
The operating procedure is therefore to verify clusters while using them, not to trust them blindly.
- sec-audit-static findings (optional, accelerates Phase 1)
- references/clustering_strategy_v4.md for the full strategy.
- templates/ for output format examples.

Determine clustering applicability per the v4 strategy:
Include (requires dataflow analysis):

| Check Item | Clustering Reason |
|---|---|
| XSS | Sink context (HTML, JS, attribute) determines vulnerability |
| Data protection | Masking/encryption/exposure varies by flow |
| SSRF / path traversal / template injection | Input propagation path analysis required |
| Auth/authz (conditional) | Per-endpoint authorization application differs (v4 section 3.3) |
Exclude (static pattern matching sufficient):
Runtime.exec, ProcessBuilder)

Auth/Authz Re-definition (v4 section 3.3):
Auth is not "does a common module exist" but "is it applied per-endpoint." Include in clustering when endpoint-level authorization varies.
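As a rough illustration of the per-endpoint view, the sketch below groups already-enumerated endpoints by the authorization mechanisms found on each one. The `EndpointAuth` record, its fields, and the sample routes are hypothetical and not part of the skill; they only show how variation across endpoints becomes visible.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointAuth:
    """Hypothetical record: one endpoint and the auth checks found on it."""
    route: str
    mechanisms: frozenset  # e.g. {"@PreAuthorize"}; empty means none found


def group_by_auth(endpoints):
    """Group routes by the exact set of auth mechanisms applied to them."""
    groups = defaultdict(list)
    for ep in endpoints:
        groups[ep.mechanisms].append(ep.route)
    return dict(groups)


def auth_varies(endpoints):
    """True when the endpoints do not all share one auth mechanism set."""
    return len(group_by_auth(endpoints)) > 1


# Illustrative data only.
sample = [
    EndpointAuth("/admin/users", frozenset({"@PreAuthorize"})),
    EndpointAuth("/admin/export", frozenset()),  # no check found
    EndpointAuth("/orders", frozenset({"@PreAuthorize"})),
]
```

Here `auth_varies(sample)` is true because one route carries no mechanism, which is the situation where auth/authz would be included in clustering.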
templates/semgrep-rules/ to the target codebase:
- metavariable-regex for domain-specific field names
- templates/sweep.sh (adapt module list):
./sweep.sh # all modules, human output
./sweep.sh --json # JSON output
./sweep.sh <module> # single module
For the auth/authz cluster (typically the largest):
- @RequestMapping/@GetMapping/@PostMapping endpoints
- @PreAuthorize, @Secured, SecurityFilterChain presence
- WebConfig/addInterceptors() auth interceptor registration
- templates/auth_enum.sh as a starting point (adapt grep patterns).

Define clusters using the (Endpoint, Sink) unit. For each cluster, document:
| Element | Description |
|---|---|
| Source | User input / external data entry point |
| Transformation | Processing logic |
| Validation/Sanitization | Filtering, encoding presence |
| Sink | Final output point (DB, HTTP response, file, external call) |
| Context | Auth state, data sensitivity, trust boundary |
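The elements above can be captured in a small record per (Endpoint, Sink) path. The following is a minimal sketch, assuming paths have already been extracted; the class names, field names, grouping key, and sample values are illustrative assumptions, not the format used by the skill's templates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowPath:
    """One (Endpoint, Sink) path; all field names are illustrative."""
    endpoint: str
    source: str
    transformation: str
    sanitization: str  # "" when no filtering or encoding was found
    sink: str
    context: str


@dataclass
class Cluster:
    name: str
    paths: list = field(default_factory=list)


def cluster_paths(paths, key):
    """Group paths whose key matches, i.e. that can share a review strategy."""
    clusters = {}
    for p in paths:
        k = key(p)
        clusters.setdefault(k, Cluster(name=str(k))).paths.append(p)
    return clusters


# Illustrative data only.
paths = [
    FlowPath("/profile", "query param", "none", "html-escape", "HTTP response", "authenticated"),
    FlowPath("/search", "query param", "none", "html-escape", "HTTP response", "authenticated"),
    FlowPath("/report", "form field", "concat", "", "DB", "authenticated"),
]
clusters = cluster_paths(paths, key=lambda p: (p.sink, p.sanitization, p.context))
```

Grouping on sink, sanitization, and context is one possible key; the right key depends on which elements actually drive the review strategy in the target codebase.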
Typical cluster categories:
Adjust cluster definitions to the target codebase. Not all categories apply to every project.
Per v4 section 7.5:
| Stage | Criteria | Sampling |
|---|---|---|
| Stage 1 (initial) | New cluster | 50%+ manual review, measure consistency |
| Stage 2 (stabilization) | Consistency >= 80% | Reduce to 30% sampling |
| Stage 3 (operational) | Miss rate < 5% for 2 consecutive cycles | Representative sample only |
| Re-verification trigger | Major code change, new framework, missed vuln | Reset to Stage 1 |
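The stage transitions can be sketched as a small function. This is an illustration only: it assumes clusters are promoted one stage at a time (the table does not say whether Stage 2 can be skipped), and the function and parameter names are hypothetical.

```python
# Minimum manual-review fraction per stage, taken from the table above.
# Stage 3 has no fixed fraction ("representative sample only"), hence None.
MIN_SAMPLING = {1: 0.50, 2: 0.30, 3: None}


def next_stage(stage, consistency, miss_rates, reverify=False):
    """Return the stage (1, 2, or 3) to use for the next review cycle.

    consistency: intra-cluster consistency of the current cycle, 0..1
    miss_rates:  per-cycle sample miss rates, oldest first, 0..1
    reverify:    True on major code change, new framework, or missed vuln
    """
    if reverify:
        return 1  # re-verification trigger resets the cluster
    if stage == 1 and consistency >= 0.80:
        return 2
    last_two = miss_rates[-2:]
    if stage == 2 and len(last_two) == 2 and all(m < 0.05 for m in last_two):
        return 3
    return stage  # criteria not met: stay where we are
```

For example, a Stage 2 cluster with miss rates of 4% and 3% over the last two cycles moves to Stage 3, while any re-verification trigger sends it back to Stage 1 regardless of its metrics.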
For each sample, fill the review checklist (see templates/REVIEW_CHECKLIST.md.tmpl):
- [X] (vulnerable), [N] (not vulnerable), or [partial]

Produce these artifacts in the target's architecture-review/ or assessment output directory:
CLUSTERS.md — Full cluster inventory with:
semgrep-rules/ — Adapted rules with results/SUMMARY.md
semgrep-rules/results/REVIEW_CHECKLIST.md — Completed review with:
Clustering is ineffective when:

| Condition | Reason |
|---|---|
| Reflection / dynamic dispatch | Static analysis cannot trace actual flow |
| AOP / proxy-based flow | Runtime-determined security processing |
| Framework internal hidden flow | Dataflow breaks inside framework |
| Runtime config-dependent sanitizer | Same code, different behavior by config |
| Template engine internal processing | Cannot trace internal escaping |
Fallback: Tag failed paths in Phase 1, manage them separately from the clusters, and review them manually in this priority order: external input proximity > auth bypass potential > rest.
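The priority order can be applied with a plain stable sort. In this sketch each failed path carries a single tag; the tag names and sample paths are hypothetical, and reducing "external input proximity" to a yes/no tag is a simplifying assumption.

```python
# Lower number = reviewed earlier. Tag names are illustrative.
PRIORITY = {"external_input": 0, "auth_bypass": 1, "other": 2}


def prioritize(failed_paths):
    """Order (path, tag) pairs for manual review; ties keep input order."""
    return sorted(failed_paths, key=lambda item: PRIORITY[item[1]])


# Illustrative data only.
failed = [
    ("ReportService.render", "other"),
    ("AdminController.impersonate", "auth_bypass"),
    ("UploadController.store", "external_input"),
    ("WebhookController.receive", "external_input"),
]
review_queue = prioritize(failed)
```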
| Artifact | Description | Consumed By |
|---|---|---|
| CLUSTERS.md | Cluster definitions, measurements, cross-references | sec-audit-static, security-architecture-review |
| semgrep-rules/*.yaml | Codebase-adapted detection rules | sec-audit-static re-runs |
| semgrep-rules/results/SUMMARY.md | Detection statistics | CLUSTERS.md, architecture review |
| semgrep-rules/results/REVIEW_CHECKLIST.md | Sample review verdicts and consistency | Architecture review, next audit cycle |
Provide:
| Metric | Definition | Formula |
|---|---|---|
| Intra-cluster consistency | Same-verdict rate within cluster | (matching samples) / (reviewed samples) |
| Review efficiency | Time saved vs. exhaustive review | 1 - (clustered time) / (unclustered time) |
| Sample miss rate | Vulnerabilities missed by representative sampling | (mismatched samples) / (additional samples) |
| Reviewer agreement | Cross-reviewer verdict consistency | Cohen's Kappa or simple agreement rate |
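Three of these metrics can be computed directly from review verdicts and timings. The sketch below is illustrative: the function names are hypothetical, and "matching samples" is interpreted here as samples that agree with the cluster's most common verdict, which is an assumption about the intended definition.

```python
from collections import Counter


def consistency(verdicts):
    """Share of reviewed samples that match the most common verdict."""
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / len(verdicts)


def review_efficiency(clustered_time, unclustered_time):
    """1 - (clustered time) / (unclustered time), in the same time unit."""
    return 1 - clustered_time / unclustered_time


def cohen_kappa(reviewer_a, reviewer_b):
    """Cohen's kappa for two reviewers' verdicts on the same samples."""
    n = len(reviewer_a)
    observed = sum(a == b for a, b in zip(reviewer_a, reviewer_b)) / n
    count_a, count_b = Counter(reviewer_a), Counter(reviewer_b)
    labels = set(count_a) | set(count_b)
    # Agreement expected by chance, from each reviewer's label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1:
        return 1.0  # both reviewers used a single identical label
    return (observed - expected) / (1 - expected)
```

With verdicts `["X", "X", "N", "X"]` the consistency is 0.75, and two reviewers giving `["X", "X", "N", "N"]` and `["X", "N", "N", "N"]` reach a kappa of 0.5.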
- references/clustering_strategy_v4.md: Full v4 strategy document
- templates/semgrep-rules/: Starter rule templates (5 categories)
- templates/sweep.sh: Module sweep runner
- templates/auth_enum.sh: Auth mechanism enumeration helper
- templates/CLUSTERS.md.tmpl: Cluster document template
- templates/REVIEW_CHECKLIST.md.tmpl: Review checklist template