skills/AI/AI-Risk-Frameworks/SKILL.md
How to assess and document AI security risks using industry frameworks. Use this skill whenever the user mentions AI security, ML vulnerabilities, model risks, LLM security, adversarial attacks, data poisoning, prompt injection, or needs to evaluate AI system safety. Trigger for any request about AI threat modeling, security audits, risk documentation, or compliance with AI security standards.
Install this skill globally with one command: `npx skillsauth add abelrguezr/hacktricks-skills ai-risk-assessment`. Works with Claude Code, Cursor, and Windsurf.
This skill helps you assess and document security risks in AI/ML systems using industry-standard frameworks: OWASP Top 10 ML, Google SAIF, MITRE ATLAS, and LLMJacking patterns.
Use this skill when:
| # | Vulnerability | What It Is | Example |
|---|---------------|------------|---------|
| 1 | Input Manipulation | Tiny changes to input data fool the model | Paint specks on stop sign → speed limit sign |
| 2 | Data Poisoning | Training data polluted with bad samples | Malware labeled as benign in antivirus training |
| 3 | Model Inversion | Reconstruct sensitive inputs from outputs | Rebuild patient MRI from cancer model predictions |
| 4 | Membership Inference | Detect if specific record was in training | Confirm bank transaction in fraud model training data |
| 5 | Model Theft | Clone model behavior via repeated queries | Harvest Q&A pairs to build equivalent local model |
| 6 | AI Supply-Chain | Compromise ML pipeline components | Poisoned dependency installs backdoored model |
| 7 | Transfer Learning Attack | Malicious logic survives fine-tuning | Vision backbone with hidden trigger persists after adaptation |
| 8 | Model Skewing | Biased data shifts outputs to attacker's agenda | Spam emails labeled as ham to bypass filter |
| 9 | Output Integrity | Alter predictions in transit | Flip "malicious" verdict to "benign" before quarantine |
| 10 | Model Poisoning | Direct changes to model parameters | Tweak fraud detection weights to approve certain cards |
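To make row 1 (Input Manipulation) concrete, here is a minimal sketch on a toy linear classifier. The weights, bias, and input are made-up values, and the perturbation is a simple sign-of-weight step (in the spirit of FGSM), not a method taken from any of the frameworks above. It only illustrates how a small, bounded change per feature can flip a prediction.

```python
# Toy illustration of input manipulation against a linear classifier.
# All numbers below are hypothetical; no real model is involved.

def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(value):
    return (value > 0) - (value < 0)


def perturb(weights, x, epsilon, current_label):
    """Move each feature by at most epsilon in the direction that
    pushes the score away from the currently predicted class."""
    direction = -1 if current_label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


weights = [1.0, -2.0, 0.5]   # hypothetical model parameters
bias = 0.1
x = [0.2, 0.1, 0.3]          # hypothetical benign input

original_label = predict(weights, bias, x)
x_adv = perturb(weights, x, epsilon=0.1, current_label=original_label)
adversarial_label = predict(weights, bias, x_adv)

print("original:", original_label, "adversarial:", adversarial_label)
```

In an assessment, a result like this would go into the Evidence field of the risk entry, together with the perturbation budget that was needed to flip the output.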
| Risk | Description |
|------|-------------|
| Data Poisoning | Malicious actors alter training/tuning data to degrade accuracy or implant backdoors |
| Unauthorized Training Data | Ingesting copyrighted, sensitive, or unpermitted datasets creates legal/ethical liabilities |
| Model Source Tampering | Supply-chain manipulation embeds hidden logic that persists after retraining |
| Excessive Data Handling | Weak retention controls store more personal data than necessary |
| Model Exfiltration | Attackers steal model files/weights, causing IP loss |
| Model Deployment Tampering | Adversaries modify model artifacts so running model differs from vetted version |
| Denial of ML Service | Flooding APIs or "sponge" inputs exhaust compute and knock model offline |
| Model Reverse Engineering | Harvesting input-output pairs to clone or distil the model |
| Insecure Integrated Component | Vulnerable plugins/agents let attackers inject code or escalate privileges |
| Prompt Injection | Crafting prompts to override system intent and perform unintended commands |
| Model Evasion | Designed inputs trigger mis-classification, hallucination, or disallowed content |
| Sensitive Data Disclosure | Model reveals private/confidential information from training or user context |
| Inferred Sensitive Data | Model deduces personal attributes never provided, creating privacy harms |
| Insecure Model Output | Unsanitized responses pass harmful code, misinformation, or inappropriate content |
| Rogue Actions | Autonomous agents execute unintended real-world operations without oversight |
The MITRE ATLAS Matrix provides a comprehensive framework for understanding AI attack techniques and tactics. It covers:
Reference: https://atlas.mitre.org/matrices/ATLAS
What it is: Attackers steal active session tokens or cloud API credentials and invoke paid, cloud-hosted LLMs without authorization. Access is resold via reverse proxies.
Consequences:
TTPs (Tactics, Techniques, Procedures):
Mitigations:
Determine what kind of AI system you're assessing:
Use the appropriate framework(s) based on system type:
| System Type | Primary Framework | Secondary Frameworks |
|-------------|-------------------|----------------------|
| ML Model | OWASP Top 10 ML | Google SAIF |
| LLM/GenAI | Google SAIF | OWASP Top 10 ML |
| AI Pipeline | MITRE ATLAS | OWASP Top 10 ML, Google SAIF |
| AI Agent | Google SAIF | MITRE ATLAS |
| Cloud LLM Access | LLMJacking patterns | Google SAIF |
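The mapping above can be encoded as a small lookup, sketched here. The dictionary mirrors the table; the function name `select_frameworks` and the rule for combining several system types (primaries first, then secondaries, without duplicates) are assumptions made for illustration and are not part of the skill.

```python
# Framework selection by system type, copied from the table above.
# Each value is (primary framework, [secondary frameworks]).
FRAMEWORKS = {
    "ML Model": ("OWASP Top 10 ML", ["Google SAIF"]),
    "LLM/GenAI": ("Google SAIF", ["OWASP Top 10 ML"]),
    "AI Pipeline": ("MITRE ATLAS", ["OWASP Top 10 ML", "Google SAIF"]),
    "AI Agent": ("Google SAIF", ["MITRE ATLAS"]),
    "Cloud LLM Access": ("LLMJacking patterns", ["Google SAIF"]),
}


def select_frameworks(system_types):
    """Return frameworks for one or more system types.

    Hypothetical combination rule: all primary frameworks first,
    then secondary ones, each listed once in first-seen order.
    """
    ordered = []
    for system_type in system_types:
        primary, _ = FRAMEWORKS[system_type]
        if primary not in ordered:
            ordered.append(primary)
    for system_type in system_types:
        _, secondaries = FRAMEWORKS[system_type]
        for framework in secondaries:
            if framework not in ordered:
                ordered.append(framework)
    return ordered


# Example: an agent that calls a cloud-hosted LLM.
selected = select_frameworks(["AI Agent", "Cloud LLM Access"])
print(selected)
```

An unknown system type raises `KeyError`, which is deliberate in this sketch: a system that does not fit the table should be classified by hand rather than silently skipped.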
For each relevant risk category:
Use this structure for risk documentation:
## Risk: [Risk Name]
**Framework:** [OWASP/SAIF/ATLAS/LLMJacking]
**Description:** [What the risk is]
**Applicability:** [Why it applies to this system]
**Likelihood:** [Low/Medium/High]
**Impact:** [Low/Medium/High]
**Evidence:** [Specific observations, test results, or analysis]
**Mitigation:** [Recommended controls]
**Status:** [Open/Mitigated/Accepted]
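If many findings need to be written up, the structure above can be generated from data. The sketch below is one possible way to do that. The class name `RiskEntry`, its field names, and the validation rules are hypothetical; only the rendered field labels and the allowed values (Low/Medium/High, Open/Mitigated/Accepted) come from the template. The sample finding is invented for the example.

```python
from dataclasses import dataclass

LEVELS = ("Low", "Medium", "High")
STATUSES = ("Open", "Mitigated", "Accepted")


@dataclass
class RiskEntry:
    """One documented risk, rendered in the template's field order."""
    name: str
    framework: str
    description: str
    applicability: str
    likelihood: str
    impact: str
    evidence: str
    mitigation: str
    status: str = "Open"

    def __post_init__(self):
        # Reject values the template does not allow.
        for label, value in (("likelihood", self.likelihood), ("impact", self.impact)):
            if value not in LEVELS:
                raise ValueError(f"{label} must be one of {LEVELS}, got {value!r}")
        if self.status not in STATUSES:
            raise ValueError(f"status must be one of {STATUSES}, got {self.status!r}")

    def to_markdown(self):
        return "\n".join([
            f"## Risk: {self.name}",
            f"**Framework:** {self.framework}",
            f"**Description:** {self.description}",
            f"**Applicability:** {self.applicability}",
            f"**Likelihood:** {self.likelihood}",
            f"**Impact:** {self.impact}",
            f"**Evidence:** {self.evidence}",
            f"**Mitigation:** {self.mitigation}",
            f"**Status:** {self.status}",
        ])


# Invented sample finding, for illustration only.
entry = RiskEntry(
    name="Prompt Injection",
    framework="SAIF",
    description="Crafted prompts override system intent.",
    applicability="The assistant passes untrusted web content to the model.",
    likelihood="High",
    impact="Medium",
    evidence="A test page with embedded instructions changed the model's reply.",
    mitigation="Separate untrusted content from instructions; filter outputs.",
)
report = entry.to_markdown()
print(report)
```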
Use this checklist for rapid risk identification:
After completing your assessment: