skills/llm-security/SKILL.md
LLM and AI application security testing skill for prompt injection (direct, indirect, multimodal), system-prompt extraction, RAG poisoning, memory poisoning, MCP server injection, skill-file injection, agentic tool misuse, computer-use UI injection, and excessive agency. Authorization required — this skill tests AI systems you are explicitly permitted to assess. Triggers on requests to test LLM / AI-agent / RAG / MCP / computer-use security, perform prompt injection, extract system prompts, poison RAG or memory, audit agent tool use, or evaluate AI guardrails.
npx skillsauth add hardw00t/ai-security-arsenal llm-security

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Thin router skill for security testing of LLM applications and AI agents. Covers the OWASP LLM Top 10 (2025) with a 2026-grade threat model for frontier-model agentic systems: indirect injection, multimodal injection, MCP supply chain, memory poisoning, skill-file injection, computer-use UI injection, and agentic tool misuse.
Defensive / educational framing. Every workflow here assumes written authorization to test the target. Canary strings, throwaway accounts, and controlled endpoints are preferred over real-data exploitation at every step.
"test this LLM for prompt injection", "jailbreak this model" (authorized), "test AI guardrails", "assess RAG security", "poison this RAG corpus", "test MCP server injection", "red-team this agent", "extract system prompt", "test agent tool misuse", "test computer use UI injection", "audit LLM application security", "test multimodal injection", "test memory poisoning", "audit CLAUDE.md for injection".
api-security, sast-orchestration, cloud-security / iac-security, web-security.

Many engagements need multiple skills; call them in parallel when scopes don't overlap.
Is the target an agent with tools? ─ yes ─▶ excessive_agency_testing.md
│ └─▶ agentic_tool_misuse.md
no
▼
Does it ingest external content (RAG/web/email)? ─ yes ─▶ indirect_injection_testing.md
│ └─▶ rag_poisoning.md (if RAG)
no
▼
Multimodal input accepted? ─ yes ─▶ payloads/multimodal_injection.md
│ └─▶ computer_use_abuse.md (if screen-controller)
no
▼
MCP servers attached? ─ yes ─▶ mcp_server_injection.md
no
▼
Persistent memory / cross-session state? ─ yes ─▶ memory_poisoning.md
no
▼
Project loads CLAUDE.md / skills / rules? ─ yes ─▶ skill_file_injection.md
no
▼
Always run last: direct_injection_testing.md + system_prompt_extraction.md
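The decision tree above can be sketched as a small routing function. This is only an illustration, not part of the skill: the `Target` dataclass, its field names, and `select_workflows` are hypothetical, and the file names are copied from the tree as written.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical description of the system under test (one flag per tree question)."""
    has_tools: bool = False
    ingests_external_content: bool = False
    uses_rag: bool = False
    multimodal: bool = False
    screen_controller: bool = False
    mcp_servers: bool = False
    persistent_memory: bool = False
    loads_skill_files: bool = False


def select_workflows(t: Target) -> list[str]:
    """Walk the tree top to bottom and collect every matching workflow."""
    selected: list[str] = []
    if t.has_tools:
        selected += ["excessive_agency_testing.md", "agentic_tool_misuse.md"]
    if t.ingests_external_content:
        selected.append("indirect_injection_testing.md")
        if t.uses_rag:
            selected.append("rag_poisoning.md")
    if t.multimodal:
        selected.append("payloads/multimodal_injection.md")
        if t.screen_controller:
            selected.append("computer_use_abuse.md")
    if t.mcp_servers:
        selected.append("mcp_server_injection.md")
    if t.persistent_memory:
        selected.append("memory_poisoning.md")
    if t.loads_skill_files:
        selected.append("skill_file_injection.md")
    # The tree says these two always run last, whatever the answers above.
    selected += ["direct_injection_testing.md", "system_prompt_extraction.md"]
    return selected
```

A "yes" answer does not stop the walk: every question is asked, so a tool-using RAG agent collects both the agency workflows and the indirect-injection ones.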
Parallelizable (fire concurrently, per rate limits):
agentic_tool_misuse.md

Sequential (must observe one at a time):
Two clean partitions — pick whichever matches the engagement:
By OWASP category (one sub-agent each) for comprehensive coverage:
By attack surface for deep-dive on one class:
Parent agent aggregates findings (schemas/finding.json), de-dupes, and
cross-references overlapping findings (e.g. an MCP injection that enables
tool misuse).
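A minimal sketch of the aggregation step, under stated assumptions: the de-duplication key (attack class, target agent, normalized title) and the rule "same target, different attack class means related" are illustrative choices, not something schemas/finding.json prescribes. The function names are hypothetical.

```python
from collections import defaultdict


def dedupe(findings):
    """Keep the first finding for each (attack_class, target_agent, title) key.

    The key is an assumption for illustration; a real engagement may need
    a stricter or looser notion of "same finding".
    """
    seen = {}
    for f in findings:
        key = (f["attack_class"], f.get("target_agent"), f["title"].strip().lower())
        seen.setdefault(key, f)
    return list(seen.values())


def cross_reference(findings):
    """Map each finding id to ids of findings on the same target with a
    different attack class (e.g. an MCP injection and the tool misuse it enables)."""
    by_target = defaultdict(list)
    for f in findings:
        by_target[f.get("target_agent")].append(f)

    related = {}
    for group in by_target.values():
        for f in group:
            others = [g["id"] for g in group if g["attack_class"] != f["attack_class"]]
            if others:
                related[f["id"]] = others
    return related
```

The cross-reference map is returned separately instead of being written into the finding records, so the records stay within whatever fields the schema allows.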
Use extended thinking for:
Minimal thinking for:
Multimodal payloads are described in payloads/multimodal_injection.md; the computer-use workflow is workflows/computer_use_abuse.md. Capture a screenshot (evidence.screenshot) for every multimodal finding: visual proof is essential.

Frontier models are also useful as testing tools: use a separate vision-capable model to generate candidate adversarial images and to judge whether OCR extraction succeeded.
All findings use schemas/finding.json. Required fields:
id, title, severity, attack_class, evidence, reproduction,
remediation. Skill-specific fields include attack_class,
target_model, target_agent, payload (with modality and delivery
vector), success_indicator, owasp_llm_id, defense_bypassed.
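A quick pre-submission check for the required fields might look like the sketch below. It only tests that each field is present and non-empty; it does not validate against schemas/finding.json itself, and the helper name `missing_required` is hypothetical.

```python
# Required field names as listed above.
REQUIRED_FIELDS = (
    "id", "title", "severity", "attack_class",
    "evidence", "reproduction", "remediation",
)


def missing_required(finding: dict) -> list[str]:
    """Return the required fields that are absent or empty in a finding record."""
    empty_values = (None, "", [], {})
    return [name for name in REQUIRED_FIELDS if finding.get(name) in empty_values]
```

An empty return value means the record carries every required field; anything else is the list of names still to fill in before the parent agent aggregates it.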
| Workflow | When |
|---|---|
| workflows/direct_injection_testing.md | Text prompts directly in user channel |
| workflows/indirect_injection_testing.md | Content arrives via retrieval / tools / email |
| workflows/system_prompt_extraction.md | Recover system prompt / tool schemas |
| workflows/rag_poisoning.md | RAG corpus + retrieval-layer attacks |
| workflows/agentic_tool_misuse.md | Coerce agent to misuse file/http/shell tools |
| workflows/memory_poisoning.md | Persistent cross-session memory attacks |
| workflows/mcp_server_injection.md | Malicious MCP server → host agent |
| workflows/skill_file_injection.md | CLAUDE.md / .cursor/rules / SKILL.md as vector |
| workflows/computer_use_abuse.md | Screenshot/UI-based injection for computer-use agents |
| workflows/excessive_agency_testing.md | Blast-radius assessment (OWASP LLM06) |
| File | Contents |
|---|---|
| payloads/injection_2026.txt | Modern direct/indirect injection patterns (trust-boundary, authority spoof, tool-result spoof, CoT injection) |
| payloads/system_prompt_extraction.txt | Full-dump + partial-leak + tool-schema extraction |
| payloads/encoding_obfuscation.txt | Base64, ROT, hex, unicode homoglyph, zero-width, emoji smuggle, tag-char |
| payloads/multimodal_injection.md | Image / audio / video / screenshot payload descriptions |
| payloads/legacy_jailbreaks.txt | DAN / STAN / DUDE / roleplay — regression only |
| File | Contents |
|---|---|
| references/owasp_llm_top10_2025.md | OWASP LLM Top 10 table + 2026 coverage checklist |
| references/defense_patterns_2026.md | Constitutional, classifiers, spotlighting, HITL, allowlisting — with known bypass hints |
| references/threat_model_agents.md | Actors, assets, surfaces, T1-T10 scenarios for agentic systems |
| references/bounty_patterns_2024_2026.md | Post-2023 public bug-bounty TTPs (RAG poisoning, CVE-2025-53773 tool-chain RCE, multimodal injection, adaptive defense evasion) |
| File | Contents |
|---|---|
| examples/indirect_injection_doc.md | Ready-to-deploy injection doc for RAG / shared drive |
| examples/malicious_mcp_response.json | Malicious MCP tool-response body |
| examples/poisoned_rag_chunk.md | Retrieval-optimized poisoning chunk |
| Tool | Purpose | Install |
|---|---|---|
| promptfoo | Automated prompt-injection sweeps and eval | npm i -g promptfoo |
| garak | LLM vulnerability scanner (NVIDIA) | pip install garak |
| giskard | LLM testing & evaluation | pip install giskard |
| pyrit | Microsoft's AI red-team toolkit | pip install pyrit |
| @modelcontextprotocol/sdk | Build controlled test MCP servers | npm i @modelcontextprotocol/sdk |
| custom HTTP server | Attacker-endpoint for exfil signal | any language |
| anthropic, openai, google-genai SDKs | Drive target APIs | per-SDK |
Use your own logging endpoint for exfil-signal tests so you can unambiguously confirm tool invocation.
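One way to stand up such a logging endpoint is a few lines of Python standard library. This is a sketch for a controlled lab: `CanaryHandler`, `HITS`, and `start_canary_server` are hypothetical names, hits are kept in memory only, and there is no TLS or authentication.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HITS = []  # in-memory record of every request the endpoint receives


class CanaryHandler(BaseHTTPRequestHandler):
    def _record(self):
        # Read any request body so POSTed exfil data is captured too.
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length).decode("utf-8", "replace") if length else ""
        HITS.append({
            "method": self.command,
            "path": self.path,
            "body": body,
            "user_agent": self.headers.get("User-Agent", ""),
        })
        self.send_response(204)  # empty reply: nothing useful goes back to the agent
        self.end_headers()

    do_GET = _record
    do_POST = _record

    def log_message(self, fmt, *args):
        pass  # silence default stderr logging; HITS is the record


def start_canary_server(host="127.0.0.1", port=0):
    """Start the endpoint on a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer((host, port), CanaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Embedding a unique canary string in the URL path of each payload (for example `/c/<canary>`) lets a hit in `HITS` be matched to the exact test case that triggered the tool call.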
Every engagement MUST have:
Populate authorization.scope_document and authorization.contact on
every finding record.
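Stamping the authorization fields can be done in one pass before findings are written out. A minimal sketch, assuming findings are plain dicts with a nested `authorization` object; the function name `stamp_authorization` is hypothetical.

```python
import copy


def stamp_authorization(findings, scope_document, contact):
    """Return copies of the findings with authorization.scope_document and
    authorization.contact set; refuse to run if either value is missing."""
    if not scope_document or not contact:
        raise ValueError("scope_document and contact are both required")
    stamped = []
    for finding in findings:
        record = copy.deepcopy(finding)  # leave the caller's records untouched
        record.setdefault("authorization", {}).update(
            {"scope_document": scope_document, "contact": contact}
        )
        stamped.append(record)
    return stamped
```

Raising on a missing value makes an unauthorized run fail loudly instead of producing findings with blank authorization fields.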
2026-04. Minimum tool versions tested: