dot_claude/skills/pentest-parallel-prs/SKILL.md
Conduct an attacker-perspective security review of a codebase, then ship the fixes as multiple non-conflicting pull requests in parallel using subagents and git worktrees. Verifies findings to filter false positives, plans fixes via file-occupation analysis to guarantee no merge conflicts, dispatches each fix to its own TDD-driven subagent in an isolated worktree, then pushes and opens one PR per fix. Use this whenever the user asks for a security audit, pentest, vulnerability review, attack surface analysis, or wants to "find vulnerabilities" / "review for security issues" in their own (or otherwise authorized) project, even if they don't say the word "pentest". Also use when the user already has a list of independent fixes (security or otherwise) and wants them shipped as parallel non-conflicting PRs without manually managing worktrees.
You are running an authorized penetration test on a codebase, then shipping the fixes as a fan-out of independent, non-conflicting pull requests. The loop has two halves: find (parallel attacker-framed investigation) and fix (parallel TDD-driven implementation in isolated worktrees). The discipline that makes the whole thing work is generator-evaluator separation in the find half, and file-occupation analysis in the fix half.
externally_connectable to be exploitable, generic "this looks risky" with
no concrete vector.
git log main..<branch>, renames
awkward branches, then pushes and opens PRs. This keeps a human checkpoint
before anything becomes public.
git checkout main, rename, push, done.
Goal: Make sure this is a defensive engagement.
$ARGUMENTS or the prior conversation. If the user
said "skip the extension" or "auth only", honor it. If no scope is given,
default to the full project.
If the request smells offensive (targeting third parties, evading detection, mass exploitation, supply-chain compromise), stop and refuse. The skill exists to harden software, not attack others.
Goal: Build a mental map of trust boundaries and entry points before dispatching anything.
Adapt domains to the actual codebase. A read-only static site has none of these; an LLM-backed chatbot has prompt injection and tool-use abuse.
Goal: Dispatch one focused subagent per domain, in parallel, with attacker framing.
Use the Agent tool with subagent_type: "Explore". Send all dispatches in a
single message so they run concurrently. Each subagent prompt MUST contain:
Example prompt sketch (for an SSRF domain on a Cloudflare Workers app):

You are a security researcher on an authorized pentest of <project>, a
<one-line description>. Focus ONLY on SSRF, URL-fetch attacks, and content
processing vulnerabilities. The owner has authorized this engagement.

Read these files:
- /abs/path/to/articles.ts
- /abs/path/to/url-utils.ts
- /abs/path/to/article-fetcher.ts

This app fetches user-supplied URLs server-side. Hunt concrete bypasses for:
1. SSRF to private IP ranges (10/8, 127/8, 169.254/16, ::1, fc00::/7,
   IPv4-mapped IPv6 like ::ffff:127.0.0.1)
2. SSRF via redirect-following without re-validation
3. SSRF via DNS rebinding or obscure encodings (decimal, hex, 127.1)
4. Protocol smuggling (file:, gopher:, data:)
5. Response-size DoS, slowloris
6. Stored content rendering issues from parsed HTML
7. Prompt injection if fetched content reaches an LLM

For each finding: severity, file:line, exact exploit payload, why the current
code fails, fix. No generic advice. Report in <user's language>, under 1500
words. Skip any category where you find nothing actionable.
Tune the file list and the hunt list to the actual domain.
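To make hunt items 1 and 3 in the prompt sketch concrete, here is a minimal Python sketch of a check that treats an IP literal as internal even when it arrives in a legacy or mapped encoding. It is an illustration under assumptions, not the target app's validator: `parse_ip_literal` and `is_internal_host` are hypothetical names, and hostnames (which need DNS resolution and a re-check on every redirect hop) are deliberately out of scope.

```python
import ipaddress
import socket


def parse_ip_literal(host: str):
    """Return an ipaddress object if `host` is an IP literal, else None.

    Hypothetical helper: accepts dotted IPv4, IPv6 (with or without
    brackets), and the legacy forms understood by inet_aton, such as
    "127.1", "2130706433", and "0x7f000001".
    """
    try:
        addr = ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        try:
            # inet_aton understands short, decimal, and hex IPv4 spellings.
            addr = ipaddress.IPv4Address(socket.inet_aton(host))
        except OSError:
            return None  # not an IP literal; treat as a hostname
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return addr


def is_internal_host(host: str) -> bool:
    """True if `host` is an IP literal in a private, loopback, or link-local range."""
    addr = parse_ip_literal(host)
    if addr is None:
        return False  # hostnames need DNS resolution, which this sketch skips
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

A finding that claims an encoding bypass should name a payload (for example `127.1` or `::ffff:127.0.0.1`) that the current code accepts and a check like this one rejects.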
Goal: Filter false positives before treating anything as actionable.
After all subagents return, read the actual code for every CRITICAL and HIGH finding before believing it. This is not optional. Subagents make systematic errors that look authoritative. Examples seen in real runs:
eq(id) only, no userId",
but the preceding SELECT scopes by (id, userId) and 404s if not found.
Check-then-act with UUIDs is safe today; the finding is at most a
defense-in-depth concern (MEDIUM), not CRITICAL.
_sender parameter is unused, web pages can hijack the
background script", but chrome.runtime.onMessage only delivers from
same-extension contexts unless externally_connectable is set in the
manifest. Check the manifest first.
When you spot a false positive, demote it (often to LOW or drop) and note it in your report. When you spot something the subagent missed while reading the code yourself, promote it. Both directions matter.
Output of this phase: a clean, severity-classified list of findings with PoCs, with clear notes on which subagent claims you rejected and why (transparency builds trust with the user).
Present this list to the user. Pause for direction before fixing — they may want to triage, defer some, or add scope.
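The "check the manifest first" step from the extension example above can be sketched in a few lines of Python. This is a hedged illustration: `web_pages_can_connect` is a hypothetical helper, and it inspects only the `matches` list of `externally_connectable`, which is the key that lets web pages message an extension.

```python
import json


def web_pages_can_connect(manifest_text: str) -> bool:
    """Return True if the manifest lets any web page message the extension.

    Hypothetical helper: checks only externally_connectable.matches.
    Without that list, web pages cannot open a messaging channel.
    """
    manifest = json.loads(manifest_text)
    externally_connectable = manifest.get("externally_connectable") or {}
    return bool(externally_connectable.get("matches"))
```

If this returns False for the project's manifest.json, a "web pages can hijack the background script" finding loses its vector and should be demoted or dropped.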
Goal: Decide which fixes ship now, and how to parallelize them without merge conflicts.
Drop fixes you can't perform with current information. Examples:
Build an occupation table. For each remaining fix, list every file it
touches — source AND test files. Test files matter because two
parallel agents editing the same *.test.ts will conflict just as badly
as editing the same source.
| Task | Source files | Test files |
|---|---|---|
| C-1 | index.ts, auth-dev.ts | auth-dev.test.ts |
| C-2 | url-utils.ts, ai.ts, robots.ts | url-utils.test.ts, ai.test.ts, robots.test.ts |
| H-1 | rate-limit.ts | rate-limit.test.ts |
| M-3 | highlights.ts | highlights.test.ts |
Group tasks that share files. If two fixes both touch ai.ts, they
become one worktree task. Don't try to be clever with diff merging.
Verify zero overlap in the final grouping. If you can't get to zero overlap, run the conflicting groups sequentially (one worktree, then the next), not in parallel.
Show the table to the user and confirm before dispatching N worktrees. This is also a chance for them to drop fixes or change priority.
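The grouping and zero-overlap check can be sketched as a small union-find over the occupation table. This is a minimal Python illustration, not part of the skill: the function names are hypothetical, and task M-9 is an invented extra fix that also touches ai.ts, added only to show two tasks collapsing into one worktree group.

```python
from collections import defaultdict


def group_by_shared_files(occupation: dict[str, set[str]]) -> list[list[str]]:
    """Merge tasks that touch any common file (source or test) into one group."""
    parent = {task: task for task in occupation}

    def find(task: str) -> str:
        while parent[task] != task:
            parent[task] = parent[parent[task]]  # path halving
            task = parent[task]
        return task

    owner: dict[str, str] = {}  # file -> first task seen touching it
    for task, files in occupation.items():
        for path in files:
            if path in owner:
                parent[find(task)] = find(owner[path])
            else:
                owner[path] = task

    groups: dict[str, list[str]] = defaultdict(list)
    for task in occupation:
        groups[find(task)].append(task)
    return sorted(sorted(group) for group in groups.values())


def overlapping_files(groups: list[list[str]], occupation: dict[str, set[str]]) -> set[str]:
    """Return files claimed by more than one group; empty means safe to parallelize."""
    seen: dict[str, int] = {}
    clashes: set[str] = set()
    for index, group in enumerate(groups):
        for task in group:
            for path in occupation[task]:
                if seen.setdefault(path, index) != index:
                    clashes.add(path)
    return clashes


occupation = {
    "C-1": {"index.ts", "auth-dev.ts", "auth-dev.test.ts"},
    "C-2": {"url-utils.ts", "ai.ts", "robots.ts",
            "url-utils.test.ts", "ai.test.ts", "robots.test.ts"},
    "H-1": {"rate-limit.ts", "rate-limit.test.ts"},
    "M-3": {"highlights.ts", "highlights.test.ts"},
    "M-9": {"ai.ts", "ai.test.ts"},  # hypothetical task sharing ai.ts with C-2
}
groups = group_by_shared_files(occupation)
# groups == [["C-1"], ["C-2", "M-9"], ["H-1"], ["M-3"]]
```

Each resulting group maps to one worktree. Running `overlapping_files` on the final grouping is the "verify zero overlap" step: any non-empty result means those groups must run sequentially.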
Goal: One subagent per non-overlapping group, each in its own worktree, each producing one local commit.
Use Agent with isolation: "worktree" and subagent_type: "general-purpose".
Send all dispatches in a single message for true parallelism.
Each subagent prompt MUST include:
pnpm test:api, pnpm typecheck,
pnpm lint). Don't assume the subagent knows.
Worth knowing: isolation: "worktree" automatically creates a worktree under
.claude/worktrees/agent-<hash>/. The worktree persists if changes are made
and is auto-cleaned if not. Worktree paths and branch names come back in the
agent result. Don't try to manage worktrees manually for this phase; the
tool handles it.
Goal: Confirm each subagent left a clean, mainline-ready branch. Recover from any isolation hiccups before pushing.
git worktree list
Each subagent should appear with its worktree path and branch.
for b in <branch1> <branch2> ...; do
  echo "=== $b ==="
  git log --oneline main..$b
done
Each branch should show 1 (or a small number of) commits ahead of main,
all from the subagent's work.
git status in the main repo shows you're on a feature branch, not main. Recovery:
git checkout main # restores main in the main repo; the feature branch is still saved
The subagent's commits remain on the feature branch; nothing is lost.
worktree-agent-abc12345.
Rename to a meaningful slug:
git branch -m worktree-agent-abc12345 fix/<descriptive-slug>
Note: if tag.gpgSign is set globally, branch ops are unaffected, but
tag creation in other workflows may need git -c tag.gpgSign=false.
If a subagent reported a blocker, decide: re-dispatch with a tighter prompt, do that fix yourself, or defer it. Don't push half-done work.
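As a hedged sketch of how the worktree inspection above could be automated, the Python below parses the output of `git worktree list --porcelain` and flags auto-generated branch names that still need renaming. The helper names are hypothetical, and the `worktree-agent-` prefix is taken from the example branch name above rather than from any documented guarantee.

```python
def parse_worktrees(porcelain: str) -> list[dict[str, str]]:
    """Parse `git worktree list --porcelain` output into one dict per worktree.

    Each record is a block of "key value" lines separated by a blank line,
    e.g. "worktree <path>", "HEAD <sha>", "branch refs/heads/<name>".
    """
    entries = []
    for block in porcelain.strip().split("\n\n"):
        entry: dict[str, str] = {}
        for line in block.splitlines():
            key, _, value = line.partition(" ")
            entry[key] = value  # flag lines such as "detached" get an empty value
        if "worktree" in entry:
            entries.append(entry)
    return entries


def branches_needing_rename(entries: list[dict[str, str]],
                            prefix: str = "worktree-agent-") -> list[str]:
    """Return branch names that still carry the assumed auto-generated prefix."""
    names = [e["branch"].removeprefix("refs/heads/") for e in entries if "branch" in e]
    return [name for name in names if name.startswith(prefix)]
```

Feeding it the captured output of the real command gives the list of branches to pass to `git branch -m` before pushing.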
Goal: Publish each branch and open one PR per fix, with a body the user can act on.
Push all branches in parallel (separate Bash tool calls in one message):
git push -u origin <branch1>
Create one PR per branch, also in parallel. Use the project's PR
template if there is one (check .github/PULL_REQUEST_TEMPLATE.md).
Otherwise default to:
## Summary
- <what changed, in 2-4 bullets>
- <why — the exploit this closes, with severity tag for security fixes>
## Test plan
- [x] <project test command> — <count> passing
- [x] <typecheck command>
- [x] <lint command>
- [x] <new test cases added, briefly>
- [ ] <anything that requires manual verification, e.g., production env vars>
Use a HEREDOC for the body to preserve formatting:
gh pr create --base main --head <branch> --title "<title under 70 chars>" --body "$(cat <<'EOF'
## Summary
...
EOF
)"
For security PRs, lead the title with the conventional commit type
(fix(...)/refactor(...)) and put severity in the body, not the title.
Keep titles factual; security details belong in the body where they can
be redacted from the public timeline if needed.
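The title rules above (under 70 characters, conventional commit type first, severity kept out of the title) can be checked mechanically before calling gh. The following Python is a heuristic sketch only: `title_problems` is a hypothetical helper, and the scope pattern and the uppercase severity tags it looks for are assumptions rather than rules stated by the skill.

```python
import re

# Assumed shape: fix(<scope>): <subject> or refactor(<scope>): <subject>
TITLE_RE = re.compile(r"^(fix|refactor)\([a-z0-9-]+\): \S")
# Assumed convention: severity tags are written in uppercase.
SEVERITY_RE = re.compile(r"\b(CRITICAL|HIGH|MEDIUM|LOW)\b")


def title_problems(title: str) -> list[str]:
    """Return a list of rule violations for a proposed PR title (empty if none)."""
    problems = []
    if len(title) >= 70:
        problems.append("title must be under 70 characters")
    if not TITLE_RE.match(title):
        problems.append("title should start with fix(<scope>): or refactor(<scope>):")
    if SEVERITY_RE.search(title):
        problems.append("severity belongs in the body, not the title")
    return problems
```

An empty list means the title can be used as-is; otherwise fix the title and keep the severity tag in the Summary section of the body.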
Recommend a merge order to the user, based on:
Summarize the result:
worktree-agent-abc12345 makes the PR list ugly and breaks conventional
branch naming. Rename in Phase 6.
Match the user's input language for narrative reports and PR descriptions unless they specify otherwise. Code identifiers, commit messages, and PR titles stay in English regardless. SKILL output (this file's contents) is always English; conversational responses follow the user.