Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

wgpsec/judge-pentest

Name: judge-pentest
Author: wgpsec

skills/general/judge-pentest/SKILL.md

Penetration Testing Evaluation Checklist

Web Application Vulnerability Coverage Check

Check each attack surface below for whether it has been tested; mark untested ones as gaps:

Injection

[ ] SQL Injection (login forms, search, API params, cookies)
[ ] XPath Injection
[ ] LDAP Injection
[ ] Command Injection (OS Command Injection)
[ ] SSTI (Server-Side Template Injection)
[ ] XXE (XML External Entity Injection)

Cross-Site

[ ] Reflected XSS (search box, URL params, error pages)
[ ] Stored XSS (comments, feedback, user profiles)
[ ] DOM XSS
[ ] CSRF (transfers, password changes, critical operations)

Authentication & Authorization

[ ] Default/weak credentials
[ ] SQL injection auth bypass
[ ] Brute force protection (account lockout mechanism)
[ ] Username enumeration (error message differences)
[ ] Session management (Session Fixation, Cookie security attributes)
[ ] JWT/Token security (signature verification, algorithm confusion, plaintext encoding)
[ ] Vertical privilege escalation (regular user → admin functions)
[ ] Horizontal privilege escalation / IDOR (accessing other users' resources)

Business Logic

[ ] IDOR — account info viewing
[ ] IDOR — transfer/transaction operations
[ ] IDOR — password change
[ ] Negative/zero amount transactions
[ ] Concurrency/race conditions
[ ] Business flow bypass

Information Disclosure

[ ] Error page info leaks (stack traces, paths)
[ ] API documentation exposure (Swagger, WSDL)
[ ] Backup file disclosure
[ ] Sensitive config exposure
[ ] HTTP response headers (Server version, X-Powered-By)

Server-Side

[ ] SSRF (Server-Side Request Forgery)
[ ] File upload vulnerabilities
[ ] Path traversal / LFI / RFI
[ ] Deserialization vulnerabilities

Configuration

[ ] Directory listing

API-Specific

[ ] REST API auth bypass
[ ] API IDOR
[ ] API parameter tampering
[ ] API rate limiting

Known CVE/CNVD

[ ] Known CVEs for target product/tech stack
[ ] Known CNVDs for target product/tech stack
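The coverage check above can be tracked programmatically. The following is a minimal sketch, not part of the skill itself: the `CHECKLIST` mapping is abridged to three categories, and the name `coverage_report`, the shape of `tested`, and the choice to count individual items rather than whole categories are all assumptions made for illustration.

```python
# Hypothetical sketch: names and data shapes are illustrative, not defined by the skill.
# Abridged copy of the checklist: category heading -> items under it.
CHECKLIST = {
    "Injection": ["SQL Injection", "XPath Injection", "LDAP Injection",
                  "Command Injection", "SSTI", "XXE"],
    "Cross-Site": ["Reflected XSS", "Stored XSS", "DOM XSS", "CSRF"],
    "Configuration": ["Directory listing"],
}


def coverage_report(tested: set[str]) -> dict:
    """Return the tested ratio and the untested items (gaps) per category."""
    total = sum(len(items) for items in CHECKLIST.values())
    gaps = {}
    for category, items in CHECKLIST.items():
        missing = [item for item in items if item not in tested]
        if missing:
            gaps[category] = missing
    untested = sum(len(missing) for missing in gaps.values())
    return {"ratio": (total - untested) / total, "gaps": gaps}


report = coverage_report({"SQL Injection", "Reflected XSS", "CSRF", "Directory listing"})
print(f"{report['ratio']:.0%} covered")
for category, missing in report["gaps"].items():
    print(f"{category}: {', '.join(missing)}")
```

Categories with no untested items are left out of `gaps`, so the result lists only what still needs attention.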

Evaluation Decision Logic

The goal of penetration testing is to discover as many vulnerabilities as possible; do NOT end prematurely.

# In pentest scenarios, complete is advisory only and does NOT trigger early exit.
# The judge's core value is providing precise "what to test next round" feedback.

if tested_categories >= 90% of total and the last two consecutive rounds found no new vulns:
    complete = true
    confidence >= 0.8
else:
    complete = false
    feedback = explicitly list untested attack surfaces with specific testing suggestions
    missing_areas = names of untested categories

Important: Better to run one extra round than to miss one direction. Even if many vulnerabilities have been found, if there are still untested attack surfaces, return complete=false.
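The decision rule in the pseudocode above could be expressed roughly as follows. This is a sketch under stated assumptions: the `Verdict` type, the `judge` function, and the `new_vulns_per_round` list are hypothetical names, coverage is counted per category heading, and the confidence is set to the 0.8 lower bound because the source gives only a minimum.

```python
from dataclasses import dataclass, field
from typing import Optional

# The nine category headings from the checklist.
CATEGORIES = [
    "Injection", "Cross-Site", "Authentication & Authorization",
    "Business Logic", "Information Disclosure", "Server-Side",
    "Configuration", "API-Specific", "Known CVE/CNVD",
]


@dataclass
class Verdict:  # hypothetical result type, not defined by the skill
    complete: bool
    confidence: Optional[float] = None  # only set when complete is True
    missing_areas: list[str] = field(default_factory=list)


def judge(tested: set[str], all_categories: list[str],
          new_vulns_per_round: list[int], threshold: float = 0.9) -> Verdict:
    """Apply the coverage and no-new-findings rule from the pseudocode."""
    missing = [c for c in all_categories if c not in tested]
    coverage = 1 - len(missing) / len(all_categories)
    # Assumption: "two consecutive rounds" means the two most recent rounds.
    quiet = (len(new_vulns_per_round) >= 2
             and all(n == 0 for n in new_vulns_per_round[-2:]))
    if coverage >= threshold and quiet:
        # The source requires confidence >= 0.8; the lower bound is used here.
        return Verdict(complete=True, confidence=0.8, missing_areas=missing)
    return Verdict(complete=False, missing_areas=missing)


verdict = judge(set(CATEGORIES[:-1]), CATEGORIES, new_vulns_per_round=[3, 0, 0])
print(verdict.complete, verdict.missing_areas)
```

With nine categories, leaving a single one untested gives coverage of about 89%, so the 90% threshold already behaves like the stricter "any untested surface means complete=false" reading. Passing `threshold=1.0` makes that reading explicit.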

If a target product or tech stack was identified in this round but search_vulndb was never called, the judge MUST return complete=false and require a search_vulndb(query="product name") call to look up known vulnerabilities.
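The search_vulndb requirement acts as a gate that overrides the coverage rule. A minimal sketch of that gate follows; the function name `vulndb_gate`, its parameters, and the product name in the example are hypothetical, and the sketch only renders the required calls as text rather than invoking the tool.

```python
def vulndb_gate(identified_products: list[str], vulndb_queries: list[str]) -> list[str]:
    """Return the search_vulndb calls to request; an empty list means the gate passes.

    identified_products: products or tech stacks identified this round.
    vulndb_queries: queries already sent to search_vulndb (empty if never called).
    """
    if identified_products and not vulndb_queries:
        return [f'search_vulndb(query="{product}")' for product in identified_products]
    return []


required = vulndb_gate(["Apache Tomcat"], vulndb_queries=[])
if required:
    complete = False  # the gate forces complete=false regardless of coverage
    print("Required before completion:", ", ".join(required))
```

Returning the outstanding calls, rather than a bare boolean, lets the judge paste them directly into its feedback.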

Feedback Template

When complete=false, feedback should include:

1. Completed work (acknowledge positively, avoid repetition)
2. Specific missing directions (do NOT say "keep testing" vaguely; specify concrete endpoints and vulnerability types)
3. Suggested test steps (e.g., "use sqlmap for deep injection testing on the query parameter of /api/search")
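The three parts of the template can be assembled mechanically. The sketch below is one possible layout, not a format prescribed by the skill: `build_feedback` and the tuple shape for missing directions are assumptions, and the completed-work entry is invented for the example.

```python
def build_feedback(completed: list[str], missing: list[tuple[str, str, str]]) -> str:
    """Build feedback text.

    missing holds (endpoint, vulnerability type, suggested step) triples, so
    every missing direction names a concrete endpoint and a concrete test.
    """
    lines = ["Completed work:"]
    lines += [f"  - {item}" for item in completed]
    lines.append("Missing directions:")
    for endpoint, vuln_type, step in missing:
        lines.append(f"  - {vuln_type} on {endpoint}")
        lines.append(f"    Suggested step: {step}")
    return "\n".join(lines)


feedback = build_feedback(
    completed=["Reflected XSS tested on the search box"],
    missing=[("/api/search", "SQL Injection",
              "use sqlmap for deep injection testing on the query parameter")],
)
print(feedback)
```

Requiring an endpoint and a vulnerability type for each entry makes it structurally impossible to emit a vague "keep testing" message.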

Penetration testing evaluation checklist for the decision Agent. Evaluates whether a pentest has sufficiently covered all attack surfaces, determines task completion, and provides specific feedback on uncovered areas.

1,193 stars

testing

Updated Apr 25, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add wgpsec/AboutSecurity judge-pentest

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Status   Scanner          Description                                         %
Clean    Trivy            Container and dependency vulnerability scanner      95%
Clean    Semgrep          Static code analysis for vulnerabilities            95%
Clean    mcp-scan (Snyk)  Model Context Protocol security validation          95%
Skipped  Snyk (dep)       Open source security scanning                       50%
Skipped  Socket.dev       Supply chain security analysis                      50%
Skipped  VirusTotal       Multi-engine malware detection                      50%
Skipped  CrowdStrike      Advanced threat intelligence                        50%
Skipped  OSV-Scanner      Open Source Vulnerability database check            50%
Skipped  OWASP Dep-Check                                                      50%

Last scanned: Apr 25, 2026, 2:44 AM (10.6s, 1 file scanned)

SKILL.md

name: judge-pentest
description: Penetration testing evaluation checklist for the decision Agent. Evaluates whether a pentest has sufficiently covered all attack surfaces, determines task completion, and provides specific feedback on uncovered areas.
tags: judge,evaluation,pentest,coverage,decision
category: general

Related Skills

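wgpsec/azure-pentesting

testing

Verified, Trusted, Community

Overall methodology for penetration testing Azure cloud environments. Use when the target uses Azure/Microsoft 365/Entra ID, when Azure-related assets are discovered (Blob Storage/App Service/Azure VM/Azure Functions), when Azure credentials are obtained (Service Principal/Managed Identity/Access Token), or when an Azure environment needs a security assessment. Provides an end-to-end decision tree from unauthenticated enumeration through Entra ID attacks and service privilege escalation to Cloud-to-OnPrem lateral movement. Covers the attack surface of 35+ Azure services.

1,581 · SKILL.md · Updated Apr 24, 2026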

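wgpsec/mythic-c2

tools

Verified, Trusted, Community

Mythic C2 operating methodology. Use when you need to deploy Mythic, choose a Mythic Agent, install a C2 Profile, configure HTTP/DNS/WebSocket/SMB/TCP communication, generate payloads, manage callback tasks, or use Mythic as a cross-platform C2 framework for authorized red team exercises. Covers mythic-cli installation, Agent/Profile selection, SSL certificate configuration, payload building, and basic OPSEC judgment.

1,345 · SKILL.md · Updated May 22, 2026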

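wgpsec/docker-pentesting

development

Verified, Trusted, Community

Docker security testing and container penetration methodology. Use when you need to assess Docker containers, the Docker Daemon, a Docker Registry, image layers, build artifacts, or container escape risk. Covers container environment identification, privileged container escape, docker.sock/Remote API exploitation, procfs/cgroup/capabilities abuse, Docker user group privilege escalation, runtime/kernel CVEs, Registry enumeration, image-layer secret analysis, and build context leakage. Use this skill whenever a Docker container environment, an exposed Registry, image credentials, or a container misconfiguration is discovered.

1,345 · SKILL.md · Updated May 22, 2026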

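wgpsec/padbuster-padding-oracle

development

Verified, Trusted, Community

Padding Oracle attacks using PadBuster. Use when a web application is found to use CBC-mode encryption and has a Padding Oracle vulnerability. PadBuster can automatically decrypt ciphertext and forge valid ciphertext for arbitrary plaintext, and applies to encrypted Cookie/Token/URL parameters. Use this skill in any scenario involving Padding Oracle attacks, CBC ciphertext decryption, or Cookie forgery.

1,345 · SKILL.md · Updated May 6, 2026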

Install manually

# Clone the repo
git clone https://github.com/wgpsec/AboutSecurity.git

# Make sure the global Claude Code skills folder exists, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r AboutSecurity/skills/general/judge-pentest ~/.claude/skills/

Repository: wgpsec/AboutSecurity (1,193 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT