.claude/skills/ts-ai-pentesting/SKILL.md
Run autonomous AI-driven penetration tests on web applications using tools like Shannon, PentAGI, and similar frameworks. Use when tasks involve setting up automated penetration testing pipelines, combining AI agents with security tools (nmap, subfinder, nuclei, sqlmap), building autonomous exploit chains, generating pentest reports with proof-of-concept exploits, or integrating AI pentesting into CI/CD pipelines. Covers the full pentest lifecycle from reconnaissance to reporting using AI orchestration.
npx skillsauth add eliferjunior/Claude ai-pentesting
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.
AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:
Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
AI decides: which tools to run, in what order, based on findings
Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
AI decides: which findings are likely exploitable
Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
AI decides: exploitation order, payload selection, chaining
Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
AI generates: structured, evidence-based report
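The hand-off from Phase 2 to Phase 3 can be sketched as a small triage step. This is a minimal illustration, not code from Shannon or PentAGI. It assumes scanner output arrives as JSON Lines carrying `template-id`, `matched-at`, and `info.severity` fields (the shape nuclei's JSONL output generally uses; verify against your version). The `triage` function and `SEVERITY_RANK` table are hypothetical names.

```python
import json

# Hypothetical ranking table: lower number = examined first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def triage(jsonl_text: str, max_targets: int = 5) -> list[dict]:
    """Parse JSON Lines scanner output and return the top findings by severity."""
    findings = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip banner text or truncated lines
        if not isinstance(record, dict):
            continue
        info = record.get("info") or {}
        findings.append({
            "id": record.get("template-id", "unknown"),
            "endpoint": record.get("matched-at", ""),
            "severity": str(info.get("severity", "info")).lower(),
        })
    # list.sort() is stable, so findings of equal severity keep scanner order.
    findings.sort(key=lambda f: SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)))
    return findings[:max_targets]
```

A deterministic pre-filter like this keeps the prompt sent to the LLM small; the model can then reason about exploitability for a short, already-ranked list.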
Shannon is an open-source AI pentester that automates the full lifecycle:
# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon
# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo
# Monitor progress
./shannon logs
# View results in Temporal UI
open http://localhost:8233
Shannon's architecture:
For cases where Shannon doesn't fit, build a custom pipeline:
# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools
import json
import subprocess
from urllib.parse import urlparse

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class AIPentester:
    """Autonomous AI penetration tester.

    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """

    def __init__(self, target_url: str, scope: list[str] | None = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}

    def _get_domain(self) -> str:
        """Extract the hostname from the target URL."""
        return urlparse(self.target).hostname or self.target

    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.

        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()
        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)
        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)
        # Phase 4: Generate report
        report = await self._generate_report()
        return report

    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results."""
        # Note: subprocess.run blocks the event loop; the phases run
        # sequentially here, so that is acceptable for a sketch.
        recon = {}
        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        recon['subdomains'] = result.stdout.split()
        # Technology fingerprinting (-q keeps the brief log out of stdout)
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3', '-q'],
            capture_output=True, text=True, timeout=60
        )
        try:
            recon['technologies'] = json.loads(result.stdout) if result.stdout else {}
        except json.JSONDecodeError:
            recon['technologies'] = {}
        # Port scanning (nmap has no JSON output mode; -oX - writes XML to stdout)
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout
        # Nuclei scan for known CVEs
        # (current nuclei releases use -jsonl; older releases used -json)
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'],
            capture_output=True, text=True, timeout=300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.splitlines()
            if line.strip()
        ]
        return recon

    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "You are an expert penetration tester. Analyze the "
                    "reconnaissance data and identify the most promising "
                    "attack vectors. Return a JSON object with a "
                    "'targets' array."},
                {"role": "user", "content":
                    f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                    "Identify attack targets with: endpoint, vulnerability_type, "
                    "technique, priority (1-5), reasoning."}
            ],
            # JSON mode returns an object, hence the 'targets' key above
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content).get("targets", [])

    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
        handlers = {
            'injection': self._test_injection,
            'xss': self._test_xss,
            'ssrf': self._test_ssrf,
            'auth': self._test_auth_bypass,
        }
        for key, handler in handlers.items():
            if key in vuln_type:
                return await handler(target)
        return None

    async def _not_implemented(self, target: dict) -> dict | None:
        """Placeholder: the per-vulnerability handlers are not shown here."""
        return None

    # Replace each alias with a real handler for that vulnerability class.
    _test_injection = _test_xss = _test_ssrf = _test_auth_bypass = _not_implemented

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                    "Generate a professional penetration test report with "
                    "executive summary, findings with CVSS scores, PoC steps, "
                    "and remediation recommendations."},
                {"role": "user", "content":
                    f"Target: {self.target}\n"
                    f"Findings: {json.dumps(self.findings, indent=2)}\n"
                    f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
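The pipeline above stores a `scope` list in `AIPentester.__init__` but never consults it, so subdomains discovered during recon could lead the agent to hosts nobody authorized. A scope guard might look like the sketch below. The `in_scope` helper is a hypothetical name, and the matching rule (exact host or subdomain of a listed host) is an assumption; adapt it to the rules of engagement you actually have.

```python
from urllib.parse import urlparse


def in_scope(url: str, scope: list[str]) -> bool:
    """Return True if url's host equals, or is a subdomain of, a host in scope."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    for entry in scope:
        # Accept scope entries written as full URLs or as bare hostnames.
        parsed = urlparse(entry if "://" in entry else f"//{entry}")
        allowed = (parsed.hostname or "").lower()
        # The leading dot prevents "evilexample.com" from matching "example.com".
        if allowed and (host == allowed or host.endswith("." + allowed)):
            return True
    return False
```

One way to use it is to filter recon output before any later phase runs, for example `[s for s in recon["subdomains"] if in_scope(f"https://{s}", self.scope)]`, and to re-check every endpoint the LLM proposes before a tool touches it.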
Run AI pentests on every push to main and on a weekly schedule:
# .github/workflows/pentest.yml
name: AI Penetration Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1' # Weekly Monday 2 AM
jobs:
  pentest:
    runs-on: ubuntu-latest
    services:
      app:
        image: your-app:${{ github.sha }}
        ports:
          - 8080:8080
    steps:
      - uses: actions/checkout@v4
      - name: Run Shannon Pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git
          cd shannon
          ./shannon start \
            URL=http://localhost:8080 \
            REPO=../ \
            MAX_CONCURRENT=3
          # Wait for completion and extract report
          ./shannon wait
          cp workspace/report.md $GITHUB_WORKSPACE/pentest-report.md
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: pentest-report
          path: pentest-report.md
      - name: Fail on Critical Findings
        run: |
          if grep -q "CRITICAL" pentest-report.md; then
            echo "::error::Critical vulnerabilities found!"
            exit 1
          fi
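The `grep` gate in the last step only detects the word CRITICAL. If the build should also fail on high-severity findings, the check could be moved into a small script such as the sketch below. It assumes the report labels severities with the uppercase words CRITICAL, HIGH, MEDIUM, and LOW, which is a guess about the report format rather than documented Shannon behavior; `count_severities`, `should_fail`, and `main` are hypothetical names.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed severity labels; adjust to whatever the report actually emits.
SEVERITY_PATTERN = re.compile(r"\b(CRITICAL|HIGH|MEDIUM|LOW)\b")


def count_severities(report_text: str) -> Counter:
    """Count uppercase severity labels appearing in the report text."""
    return Counter(SEVERITY_PATTERN.findall(report_text))


def should_fail(report_text: str, blocking: tuple[str, ...] = ("CRITICAL", "HIGH")) -> bool:
    """Return True if any blocking severity label appears at least once."""
    counts = count_severities(report_text)
    return any(counts[level] > 0 for level in blocking)


def main(argv: list[str]) -> int:
    """Return a process exit code: 1 if blocking findings exist, else 0."""
    path = Path(argv[1]) if len(argv) > 1 else Path("pentest-report.md")
    text = path.read_text(encoding="utf-8")
    for level, count in sorted(count_severities(text).items()):
        print(f"{level}: {count}")
    return 1 if should_fail(text) else 0
```

To use it as a workflow step, call `sys.exit(main(sys.argv))` at the bottom of the script. Like the `grep` version, a plain text match also fires on sentences such as "No CRITICAL findings", so a structured (JSON) report is a more reliable input if the tool can produce one.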
A professional AI-generated pentest report should include: executive summary (scope, duration, methodology, overall risk, findings count by severity), individual findings (each with CVSS score, affected endpoint/parameter, evidence with reproducible curl commands, impact description, and specific remediation guidance), and a remediation priority list ordered by severity with recommended fix timelines.
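The report structure described above can be made concrete with a small data model. The sketch below is illustrative only: the `Finding` fields and the `remediation_order` and `counts_by_severity` helpers are hypothetical names. The score bands in `cvss_band` follow the CVSS v3.x qualitative severity rating scale.

```python
from collections import Counter
from dataclasses import dataclass


def cvss_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    title: str
    endpoint: str
    cvss: float
    evidence: str      # e.g. a reproducible curl command
    remediation: str

    @property
    def severity(self) -> str:
        return cvss_band(self.cvss)


def remediation_order(findings: list[Finding]) -> list[Finding]:
    """Order findings for the remediation list, highest CVSS score first."""
    return sorted(findings, key=lambda f: f.cvss, reverse=True)


def counts_by_severity(findings: list[Finding]) -> Counter:
    """Findings count by severity, as used in the executive summary."""
    return Counter(f.severity for f in findings)
```

Asking the LLM to fill a fixed structure like this, instead of free-form prose, makes it easier to validate that every finding carries evidence and a score before the report is published.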
Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.
Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.
Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.