.claude/skills/pentest-validation/SKILL.md
Use when validating security findings from SAST/DAST scans, proving exploitability of reported vulnerabilities, eliminating false positives, or running the 4-phase pentest pipeline (recon, analysis, validation, report).
<default_to_action> When validating security findings:
Quality Gates:

| Phase | Agent(s) | Purpose | Parallelism |
|-------|----------|---------|-------------|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |

| Tier | Handler | Cost | Latency | Use When |
|------|---------|------|---------|----------|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (eval, innerHTML, hardcoded creds) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |

| Scenario | Tier | Estimated Cost |
|----------|------|----------------|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |
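
The "Use When" column of the tier table can be read as a small decision rule. The sketch below is one possible reading, not the skill's actual routing logic: the function `selectTier` and the finding fields `patternConclusive`, `hasLiveTarget`, and `needsDataProof` are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: pick the cheapest tier that can settle a finding.
// Field names on `finding` are illustrative assumptions, not the skill's schema.
function selectTier(finding, maxTier) {
  // Tier 1: the code pattern alone is conclusive (eval, innerHTML, hardcoded creds).
  if (finding.patternConclusive) return 1;
  // Tiers 2 and 3 send traffic to a target, so without one we stay at pattern-only.
  if (!finding.hasLiveTarget) return 1;
  // Tier 3 only when a full exploit chain with data proof is wanted.
  const wanted = finding.needsDataProof ? 3 : 2;
  // Never exceed the configured exploitation_tier.
  return Math.min(wanted, maxTier);
}
```

For example, a finding that needs data proof is still capped at Tier 2 when `exploitation_tier` is set to 2.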

```yaml
pentest:
  target_url: https://staging.app.com   # REQUIRED for Tier 2-3
  source_repo: ./src                    # REQUIRED for Tier 1+
  exploitation_tier: 2                  # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                           # Which pipelines to run
    - injection                         # SQL, NoSQL, command injection
    - xss                               # Reflected, stored, DOM XSS
    - auth                              # Auth bypass, session, JWT
    - ssrf                              # URL scheme abuse, metadata
  max_cost_usd: 15                      # Budget cap per run
  timeout_minutes: 30                   # Time cap per run
  require_authorization: true           # MUST confirm target ownership
  no_production: true                   # Block production URLs
  production_patterns:                  # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```
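
To make the `no_production` and `production_patterns` settings concrete, here is a minimal sketch of a target guard. It assumes the patterns are simple globs matched against the URL's hostname, which the configuration above does not state explicitly; `globToRegExp` and `isBlockedTarget` are hypothetical helpers, not part of the skill.

```javascript
// Convert a glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob) {
  // Escape every regex metacharacter except "*", which becomes ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Assumption: patterns are compared against the hostname only, not the full URL.
function isBlockedTarget(targetUrl, productionPatterns) {
  const host = new URL(targetUrl).hostname;
  return productionPatterns.some((pattern) => globToRegExp(pattern).test(host));
}

const patterns = ["*.prod.*", "api.*", "www.*"];
console.log(isBlockedTarget("https://staging.app.com", patterns));
console.log(isBlockedTarget("https://api.example.com/v1", patterns));
```

Under these assumptions the staging URL from the configuration passes, while a host beginning with `api.` is rejected before any payload is sent.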
Every pentest validation run MUST:

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| SQL injection | String concat in query | `' OR '1'='1` response diff | `UNION SELECT` data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| JWT none | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts unblocked | Valid credential discovery |
| IDOR | No authorization check | Access other user data | Full CRUD on foreign resources |

| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|--------|-------------------|-------------------|----------------|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
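
The Tier 1 column in the tables above describes source-level patterns. As a rough illustration of what a pattern-only check might look like, the sketch below scans source text line by line with a few regular expressions. The regexes are deliberately simple heuristics chosen for this example and are not the rules Agent Booster actually uses; `TIER1_PATTERNS` and `scanSource` are hypothetical names.

```javascript
// Illustrative heuristics for a handful of Tier 1 patterns from the tables.
// These regexes are assumptions for demonstration and will miss many real cases.
const TIER1_PATTERNS = [
  { type: "xss", name: "innerHTML assignment", re: /\.innerHTML\s*=(?!=)/ },
  { type: "xss", name: "document.write(location)", re: /document\.write\s*\(\s*(?:window\.)?location/ },
  { type: "injection", name: "exec()/system() call", re: /\b(?:exec|system)\s*\(/ },
  { type: "injection", name: "string concat in query", re: /\b(?:SELECT|INSERT|UPDATE|DELETE)\b[^"'`]*["'`]\s*\+/i },
];

// Return one hit per (line, pattern) match, with 1-based line numbers.
function scanSource(source) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    for (const pattern of TIER1_PATTERNS) {
      if (pattern.re.test(line)) {
        hits.push({ line: index + 1, type: pattern.type, name: pattern.name });
      }
    }
  });
  return hits;
}
```

A hit at this tier is only a candidate: whether it counts as conclusive or is escalated to a payload test is a policy decision the tables leave to the validator.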

```javascript
// Phase 1: Recon (parallel scans)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (parallel review); both agents consume the Phase 1 findings
const phase2Results = await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
]);

// Phase 3: Validation (graduated exploitation)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
const gateResult = await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```

| Status | Meaning | Action |
|--------|---------|--------|
| confirmed-exploitable | Exploitation succeeded with PoC | Report with evidence |
| likely-exploitable | Partial exploitation, defenses detected | Report with caveats |
| not-exploitable | All exploitation attempts failed | Filter from report |
| inconclusive | WAF/defense blocked, unclear if vulnerable | Report for manual review |
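
The status table maps directly onto a routing step. The following is a minimal sketch of that mapping, assuming each finding carries a `status` string with one of the four values above; `applyReportGate` and the bucket names are hypothetical and do not describe how qe-quality-gate is implemented.

```javascript
// Status -> destination bucket, mirroring the Action column of the table.
const STATUS_BUCKETS = new Map([
  ["confirmed-exploitable", "report"],
  ["likely-exploitable", "reportWithCaveats"],
  ["not-exploitable", "filtered"],
  ["inconclusive", "manualReview"],
]);

// Split validated findings into buckets; unknown statuses fail loudly
// rather than being dropped silently.
function applyReportGate(findings) {
  const buckets = { report: [], reportWithCaveats: [], filtered: [], manualReview: [] };
  for (const finding of findings) {
    const bucket = STATUS_BUCKETS.get(finding.status);
    if (bucket === undefined) {
      throw new Error(`Unknown validation status: ${finding.status}`);
    }
    buckets[bucket].push(finding);
  }
  return buckets;
}
```

Throwing on an unrecognized status is a design choice for this sketch: a gate that silently discards findings would undermine the evidence trail the report depends on.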

```
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```

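
As an illustration of how keys in this namespace could be built consistently, the sketch below assembles them from the placeholders shown in the layout. It assumes `/` is the separator and that placeholder values must not contain it; `pentestKeys` and `keySegment` are hypothetical helpers, and the example values are invented.

```javascript
const PENTEST_ROOT = "aqe/pentest";

// Reject empty segments and embedded separators so keys stay well-formed.
function keySegment(value) {
  const text = String(value);
  if (text === "" || text.includes("/")) {
    throw new Error(`Invalid key segment: "${text}"`);
  }
  return text;
}

// One builder per leaf in the namespace layout.
const pentestKeys = {
  exploit: (vulnType, techStack, technique) =>
    `${PENTEST_ROOT}/playbook/exploit/${keySegment(vulnType)}/${keySegment(techStack)}/${keySegment(technique)}`,
  bypass: (defenseType, technique) =>
    `${PENTEST_ROOT}/playbook/bypass/${keySegment(defenseType)}/${keySegment(technique)}`,
  payload: (vulnType, variant) =>
    `${PENTEST_ROOT}/playbook/payload/${keySegment(vulnType)}/${keySegment(variant)}`,
  result: (timestamp) => `${PENTEST_ROOT}/results/validation-${keySegment(timestamp)}`,
  poc: (findingId) => `${PENTEST_ROOT}/poc/${keySegment(findingId)}-poc`,
};

console.log(pentestKeys.exploit("xss", "react", "dom-fragment"));
```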

| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|----------|----------|----------|-----------|-----------|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |
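
Because these figures are estimates, a run still needs a hard stop tied to `max_cost_usd` and `timeout_minutes`. The class below is a minimal sketch of such a guard under the assumption that each exploitation step reports its cost before it runs; `RunBudget` is a hypothetical name, and the injectable `now` clock exists only to make the behavior testable.

```javascript
// Hypothetical budget guard for a single validation run.
class RunBudget {
  constructor({ maxCostUsd, timeoutMinutes, now = Date.now }) {
    this.maxCostUsd = maxCostUsd;
    this.now = now;
    // Deadline is fixed at construction time, in epoch milliseconds.
    this.deadline = now() + timeoutMinutes * 60_000;
    this.spentUsd = 0;
  }

  // Record a charge, throwing instead of exceeding either cap
  // so the caller can stop the pipeline cleanly.
  charge(costUsd) {
    if (this.now() > this.deadline) {
      throw new Error("timeout_minutes exceeded");
    }
    if (this.spentUsd + costUsd > this.maxCostUsd) {
      throw new Error("max_cost_usd exceeded");
    }
    this.spentUsd += costUsd;
  }
}
```

A caller would create one `RunBudget` per run and call `charge` before each Tier 2 or Tier 3 step, treating a thrown error as the signal to stop and report what has been validated so far.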

| Metric | Shannon | AQE Pentest Validation |
|--------|---------|------------------------|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |

| Metric | Target | Measurement |
|--------|--------|-------------|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |

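
The first four targets reduce to simple ratios and thresholds. The sketch below shows one way to compute them; the function names and the shape of the `metrics` object are assumptions made for this example, and the input numbers are illustrative rather than measured results.

```javascript
// Share of scanner findings eliminated by validation (pre/post comparison).
function falsePositiveReduction(scannerFindings, reportedFindings) {
  if (scannerFindings <= 0) throw new Error("scannerFindings must be positive");
  return (scannerFindings - reportedFindings) / scannerFindings;
}

// Share of confirmed findings that manual PoC verification agrees with.
function exploitConfirmationRate(confirmedFindings, manuallyVerified) {
  if (confirmedFindings <= 0) throw new Error("confirmedFindings must be positive");
  return manuallyVerified / confirmedFindings;
}

// Thresholds taken from the Target column of the table.
function meetsTargets(metrics) {
  return (
    metrics.fpReduction > 0.6 &&
    metrics.confirmationRate > 0.8 &&
    metrics.costUsd < 15 &&
    metrics.minutes < 30
  );
}

// Illustrative inputs only: 25 scanner findings, 8 survive validation.
const fpReduction = falsePositiveReduction(25, 8);
console.log(fpReduction > 0.6);
```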
"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill transforms security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Every false positive is eliminated before it reaches the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.