AI Pentesting

Overview

Use AI agents to autonomously conduct penetration tests on web applications. Combine LLM reasoning with security tools (nmap, subfinder, nuclei, sqlmap, browser automation) to find and prove vulnerabilities with minimal human intervention.

Instructions

Methodology

AI pentesting follows the same phases as human pentesting, but the AI orchestrates each phase autonomously:

Phase 1: RECONNAISSANCE
├── Subdomain enumeration (subfinder)
├── Technology fingerprinting (whatweb, wappalyzer)
├── Port scanning (nmap)
├── API schema discovery (crawling, OpenAPI/GraphQL introspection)
└── Source code analysis (if white-box)
    AI decides: which tools to run, in what order, based on findings

Phase 2: VULNERABILITY ANALYSIS
├── Known CVE scanning (nuclei)
├── Web vulnerability scanning (OWASP ZAP, nikto)
├── API fuzzing (schemathesis)
├── Code-level vulnerability hunting (semgrep, CodeQL)
└── Data flow analysis (input → dangerous function)
    AI decides: which findings are likely exploitable

Phase 3: EXPLOITATION
├── SQL injection (sqlmap, manual payloads)
├── XSS (reflected, stored, DOM)
├── SSRF (internal access, cloud metadata)
├── Authentication bypass (broken auth, privilege escalation)
├── Business logic flaws (price manipulation, race conditions)
└── Browser-based exploitation (Playwright/Puppeteer)
    AI decides: exploitation order, payload selection, chaining

Phase 4: REPORTING
├── Proof-of-concept for each finding
├── Reproducible steps (curl commands, screenshots)
├── Severity rating (CVSS score)
├── Remediation guidance
└── Executive summary
    AI generates: structured, evidence-based report
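
The "AI decides" steps above come down to choosing the next tools from what earlier phases found. A minimal rule-based sketch of that decision, standing in for the LLM, could look like the following. The function name `plan_vulnerability_analysis` and the recon keys (`open_ports`, `has_openapi_spec`, `source_available`) are hypothetical, chosen only for illustration.

```python
def plan_vulnerability_analysis(recon: dict) -> list[str]:
    """Pick Phase 2 tools from Phase 1 findings (rule-based stand-in for the LLM)."""
    tools = ["nuclei"]  # known-CVE scanning applies to every target
    # An open web port suggests running a web vulnerability scanner
    if {80, 443, 8080} & set(recon.get("open_ports", [])):
        tools.append("zap")
    # API fuzzing needs a schema to drive it
    if recon.get("has_openapi_spec"):
        tools.append("schemathesis")
    # Static analysis only makes sense in a white-box test
    if recon.get("source_available"):
        tools.append("semgrep")
    return tools


if __name__ == "__main__":
    print(plan_vulnerability_analysis(
        {"open_ports": [22, 443], "has_openapi_spec": True, "source_available": False}
    ))  # ['nuclei', 'zap', 'schemathesis']
```

In an agent-driven run, the LLM replaces these fixed rules, but the input (recon data) and output (an ordered tool list) keep the same shape.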

Setting Up Shannon

Shannon is an open-source AI pentester that automates the full lifecycle:

# Clone and set up Shannon
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

# Configure credentials
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Run a pentest against your application
# Requires: Docker, target URL, source code repo
./shannon start URL=https://your-app.com REPO=./your-repo

# Monitor progress
./shannon logs

# View results in the Temporal UI (open is macOS-only; use xdg-open on Linux)
open http://localhost:8233

Shannon's architecture:

Reconnaissance agent: Maps attack surface using nmap, subfinder, whatweb
Vulnerability agents: Specialized per OWASP category (injection, XSS, SSRF, auth bypass)
Exploitation agent: Uses browser automation to prove vulnerabilities with real exploits
Reporting agent: Generates findings with copy-paste PoC commands

Building a Custom AI Pentest Pipeline

For cases where Shannon doesn't fit, build a custom pipeline:

# ai_pentester.py
# Custom AI pentesting pipeline using LLM + security tools

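import subprocess
import json
from urllib.parse import urlparse

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

class AIPentester:
    """Autonomous AI penetration tester.
    
    Orchestrates security tools using LLM reasoning
    to find and prove vulnerabilities.
    """
    
    def __init__(self, target_url: str, scope: list[str] | None = None):
        self.target = target_url
        self.scope = scope or [target_url]
        self.findings = []
        self.recon_data = {}
    
    def _get_domain(self) -> str:
        """Return the hostname of the target URL (used by subfinder and nmap)."""
        return urlparse(self.target).hostname or self.target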
    
    async def run_pentest(self) -> dict:
        """Execute full penetration test lifecycle.
        
        Returns:
            Dict with findings, evidence, and recommendations
        """
        # Phase 1: Recon
        self.recon_data = await self._recon()
        
        # Phase 2: AI-guided vulnerability analysis
        targets = await self._analyze_attack_surface(self.recon_data)
        
        # Phase 3: AI-guided exploitation
        for target in targets:
            finding = await self._exploit(target)
            if finding:
                self.findings.append(finding)
        
        # Phase 4: Generate report
        report = await self._generate_report()
        return report
    
    async def _recon(self) -> dict:
        """Run reconnaissance tools and aggregate results.
        
        Note: subprocess.run blocks the event loop, which is acceptable
        here because the tools run one after another.
        """
        recon = {}
        
        # Subdomain enumeration
        result = subprocess.run(
            ['subfinder', '-d', self._get_domain(), '-silent'],
            capture_output=True, text=True, timeout=120
        )
        # Filter blanks so empty output gives [] rather than ['']
        recon['subdomains'] = [s for s in result.stdout.splitlines() if s.strip()]
        
        # Technology fingerprinting
        result = subprocess.run(
            ['whatweb', self.target, '--log-json=/dev/stdout', '-a', '3'],
            capture_output=True, text=True, timeout=60
        )
        try:
            recon['technologies'] = json.loads(result.stdout) if result.stdout else {}
        except json.JSONDecodeError:
            # Keep the raw text if stdout is not clean JSON
            recon['technologies'] = {'raw': result.stdout}
        
        # Port scanning (nmap has no JSON output mode; -oX - writes XML to stdout)
        result = subprocess.run(
            ['nmap', '-sV', '--top-ports', '1000', '-oX', '-', self._get_domain()],
            capture_output=True, text=True, timeout=300
        )
        recon['ports'] = result.stdout
        
        # Nuclei scan for known CVEs
        # (-jsonl emits one JSON object per line in recent nuclei releases;
        #  older releases used -json)
        result = subprocess.run(
            ['nuclei', '-u', self.target, '-severity', 'critical,high',
             '-jsonl', '-silent'],
            capture_output=True, text=True, timeout=300
        )
        recon['known_vulns'] = [
            json.loads(line) for line in result.stdout.splitlines()
            if line.strip()
        ]
        
        return recon
    
    async def _analyze_attack_surface(self, recon: dict) -> list:
        """Use AI to analyze recon data and prioritize attack targets."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "You are an expert penetration tester. Analyze the "
                 "reconnaissance data and identify the most promising "
                 "attack vectors. Return a JSON object with a top-level "
                 "'targets' array."},
                {"role": "user", "content":
                 f"Recon data:\n{json.dumps(recon, indent=2)}\n\n"
                 "Identify attack targets with: endpoint, vulnerability_type, "
                 "technique, priority (1-5), reasoning."}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content).get("targets", [])

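    async def _exploit(self, target: dict) -> dict | None:
        """Attempt to exploit an identified vulnerability."""
        vuln_type = target.get('vulnerability_type', '').lower()
        # Map keywords to handlers. The _test_* methods are not shown here
        # and must be implemented before this dispatch will run.
        handlers = {
            'injection': self._test_injection,
            'xss': self._test_xss,
            'ssrf': self._test_ssrf,
            'auth': self._test_auth_bypass,
        }
        # Substring match, so "SQL injection" dispatches to the injection handler
        for key, handler in handlers.items():
            if key in vuln_type:
                return await handler(target)
        return None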

    async def _generate_report(self) -> dict:
        """Generate a structured penetration test report."""
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content":
                 "Generate a professional penetration test report with "
                 "executive summary, findings with CVSS scores, PoC steps, "
                 "and remediation recommendations."},
                {"role": "user", "content":
                 f"Target: {self.target}\n"
                 f"Findings: {json.dumps(self.findings, indent=2)}\n"
                 f"Recon data: {json.dumps(self.recon_data, indent=2)}"}
            ]
        )
        return {
            "target": self.target,
            "findings_count": len(self.findings),
            "findings": self.findings,
            "report": response.choices[0].message.content
        }
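
The class above stores `scope` but never checks it before acting on a target. A minimal sketch of a scope guard follows. It assumes scope entries are full URLs or bare hostnames and that subdomains of an entry count as in scope, which is a policy choice rather than anything the pipeline defines. The function name `in_scope` is hypothetical.

```python
from urllib.parse import urlparse


def in_scope(url: str, scope: list[str]) -> bool:
    """Return True if url's host equals, or is a subdomain of, a scope entry's host."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    for entry in scope:
        # Scope entries may be full URLs or bare hostnames (assumption)
        allowed = (urlparse(entry).hostname or entry).lower()
        # The leading dot prevents "evil-example.com" from matching "example.com"
        if host == allowed or host.endswith("." + allowed):
            return True
    return False
```

A check like `in_scope(target["endpoint"], self.scope)` at the top of `_exploit` would skip any endpoint the LLM proposes outside the authorized scope.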

CI/CD Integration

Run AI pentests on every push to main and on a weekly schedule, against an instance of the app started as a service container:

# .github/workflows/pentest.yml
name: AI Penetration Test
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 2 * * 1'  # Weekly, Mondays at 02:00 UTC

jobs:
  pentest:
    runs-on: ubuntu-latest
    services:
      app:
        image: your-app:${{ github.sha }}
        ports:
          - 8080:8080
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Shannon Pentest
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git clone https://github.com/KeygraphHQ/shannon.git
          cd shannon
          ./shannon start \
            URL=http://localhost:8080 \
            REPO=../ \
            MAX_CONCURRENT=3
          
          # Wait for completion and extract report
          ./shannon wait
          cp workspace/report.md "$GITHUB_WORKSPACE/pentest-report.md"
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: pentest-report
          path: pentest-report.md
      
      - name: Fail on Critical Findings
        run: |
          if grep -q "CRITICAL" pentest-report.md; then
            echo "::error::Critical vulnerabilities found!"
            exit 1
          fi
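
The grep step above matches the literal word CRITICAL anywhere in the report, so it misses other casings and ignores high-severity findings. A hedged sketch of a stricter gate follows. It assumes each finding in the report carries a label such as "Severity: Critical", which is a guess about the report format and not something Shannon documents here. The names `count_severities` and `should_fail` are hypothetical.

```python
import re
from collections import Counter

# Matches "Severity: Critical", "**Severity:** high", and similar labels (assumed format)
SEVERITY_RE = re.compile(r"\bseverity\b\W*(critical|high|medium|low)\b", re.IGNORECASE)


def count_severities(report_text: str) -> Counter:
    """Count findings per severity level based on their severity labels."""
    return Counter(m.group(1).lower() for m in SEVERITY_RE.finditer(report_text))


def should_fail(counts: Counter, blocking: tuple[str, ...] = ("critical", "high")) -> bool:
    """Return True when any blocking severity level has at least one finding."""
    return any(counts[level] > 0 for level in blocking)
```

A thin command-line wrapper that reads pentest-report.md and exits non-zero when `should_fail` returns True could replace the grep step.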

Report Structure

A professional AI-generated pentest report should include:

Executive summary: scope, duration, methodology, overall risk, and findings count by severity
Individual findings: each with a CVSS score, affected endpoint or parameter, evidence with reproducible curl commands, impact description, and specific remediation guidance
Remediation priority list: ordered by severity, with recommended fix timelines
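
The report elements above can be sketched as a small data model. The field names and the `summarize` helper are hypothetical. The score-to-rating thresholds follow the CVSS v3.x qualitative severity scale.

```python
from collections import Counter
from dataclasses import dataclass


def cvss_rating(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Finding:
    title: str
    endpoint: str      # affected endpoint or parameter
    cvss: float
    evidence: str      # e.g. a reproducible curl command
    impact: str
    remediation: str

    @property
    def severity(self) -> str:
        return cvss_rating(self.cvss)


def summarize(findings: list[Finding]) -> dict:
    """Build the severity counts and the remediation priority order."""
    ordered = sorted(findings, key=lambda f: f.cvss, reverse=True)
    return {
        "counts_by_severity": dict(Counter(f.severity for f in findings)),
        "remediation_priority": [f.title for f in ordered],
    }
```

Keeping findings structured like this lets the LLM write the narrative while the counts and ordering stay deterministic.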

Examples

Run an autonomous pentest on a web application

Set up Shannon to run a full penetration test on our staging environment at https://staging.ourapp.com. The source code is in the current repository. Configure it to test for: SQL injection, XSS, SSRF, and broken authentication. Run with maximum concurrency and generate a report with reproducible proof-of-concept exploits for every finding. Flag any critical vulnerabilities that need immediate attention.

Build a custom AI pentest pipeline

Build a custom AI pentesting pipeline that combines subfinder (subdomain discovery), whatweb (tech fingerprinting), nuclei (CVE scanning), and schemathesis (API fuzzing) orchestrated by an LLM agent. The LLM should analyze results from each tool, decide what to test next, and generate exploitation payloads. Target: our API at api.example.com with the OpenAPI spec at /docs/openapi.json. Produce a structured findings report.

Integrate AI pentesting into CI/CD

Add automated penetration testing to our GitHub Actions pipeline. It should run on every push to main and weekly on a schedule. The app runs in Docker (docker-compose up), exposed at localhost:8080. Use Shannon for the pentest, upload the report as an artifact, and fail the build if any critical or high severity vulnerabilities are found. Include Slack notification for findings.

Guidelines

Only run penetration tests against systems you have explicit written authorization to test — unauthorized testing is illegal
AI pentesters can cause real damage (data modification, service disruption) — always test against staging environments, never production
Review AI-generated exploitation attempts before running them — LLMs can hallucinate or generate overly aggressive payloads
Treat pentest reports as confidential — they contain vulnerability details and proof-of-concept exploits
Set time limits and scope boundaries for autonomous testing to prevent runaway scans
Validate AI findings manually — false positives in automated reports erode trust with stakeholders
Store API keys and credentials used for pentesting securely — never hardcode them in CI configurations
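
The time-limit guideline above can be enforced with one wall-clock budget shared by every tool in a run. This is a minimal sketch under that assumption. The class name `ScanBudget` and the 300-second per-tool cap are illustrative choices, not part of Shannon or the pipeline above.

```python
import subprocess
import sys
import time


class ScanBudget:
    """Tracks a wall-clock deadline shared by every tool in a run."""

    def __init__(self, total_seconds: float):
        self.deadline = time.monotonic() + total_seconds

    def remaining(self) -> float:
        """Seconds left before the deadline, never negative."""
        return max(0.0, self.deadline - time.monotonic())

    def run(self, cmd: list[str], per_tool_cap: float = 300.0):
        """Run cmd with the smaller of the per-tool cap and the remaining budget.

        Returns the CompletedProcess, or None if the budget is exhausted
        or the tool timed out.
        """
        timeout = min(per_tool_cap, self.remaining())
        if timeout <= 0:
            return None
        try:
            return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return None


if __name__ == "__main__":
    budget = ScanBudget(total_seconds=30)
    # A harmless stand-in command; a real run would pass a scanner invocation
    result = budget.run([sys.executable, "-c", "print('ok')"])
    print(result.stdout.strip() if result else "skipped")
```

Once the budget runs out, every later `run` call returns None, so a runaway scan stops instead of starting new tools.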
