security/ai/ai-security-hardening/SKILL.md
Harden AI/LLM deployments against prompt injection, data exfiltration, model theft, and supply chain attacks. Covers input validation, output filtering, access control, model API security, and compliance controls for production AI systems.
npx skillsauth add bagelhole/devops-security-agent-skills ai-security-hardening

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Secure LLM and AI systems against prompt injection, jailbreaks, data leakage, and supply chain threats in production environments.
Use this skill when:
| Threat | Risk | Control |
|--------|------|---------|
| Prompt injection | System prompt override | Input sanitization, separate context |
| Data exfiltration | PII in model outputs | Output filtering, DLP scanning |
| Jailbreaking | Policy bypass | Content moderation, guardrails |
| Model theft | Weight extraction via API | Rate limiting, access controls |
| Training data poisoning | Backdoored fine-tuned model | Dataset validation, provenance |
| Supply chain attack | Malicious model weights | Signature verification, scanning |
| Insecure output | XSS/SQLi from LLM response | Output encoding, parameterized queries |
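The "Insecure output" row (output encoding and parameterized queries) can be sketched as follows. This is a minimal illustration, assuming the LLM response is rendered into HTML and stored through Python's built-in `sqlite3` module. The function names `render_llm_output` and `store_llm_output` and the `responses` table are hypothetical and not part of this skill.

```python
import html
import sqlite3


def render_llm_output(text: str) -> str:
    """HTML-escape model output so markup in it is displayed, not executed."""
    return f"<p>{html.escape(text)}</p>"


def store_llm_output(conn: sqlite3.Connection, session_id: str, text: str) -> None:
    """Insert model output with bound parameters instead of string formatting."""
    # The `responses` table is a hypothetical schema used only for this sketch.
    conn.execute(
        "INSERT INTO responses (session_id, content) VALUES (?, ?)",
        (session_id, text),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE responses (session_id TEXT, content TEXT)")
    hostile = "<script>alert(1)</script>'); DROP TABLE responses;--"
    store_llm_output(conn, "s1", hostile)
    print(render_llm_output(hostile))
```

Treating the model response as untrusted data at the point of use (the HTML template, the SQL call) tends to be more robust than trying to detect hostile strings in the response itself.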
import re
from typing import Optional

INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions",
    r"you\s+are\s+now\s+",
    r"new\s+instructions?:",
    r"system\s+prompt",
    r"forget\s+everything",
    r"act\s+as\s+",
    r"jailbreak",
    r"dan\s+mode",
    r"<\s*system\s*>",
    r"\[INST\]",
]


def detect_prompt_injection(user_input: str) -> tuple[bool, Optional[str]]:
    """Return (is_suspicious, matched_pattern)."""
    normalized = user_input.lower().strip()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, normalized, re.IGNORECASE):
            return True, pattern
    return False, None


def sanitize_user_input(user_input: str, max_length: int = 4000) -> str:
    """Sanitize input before passing to LLM."""
    # Truncate
    user_input = user_input[:max_length]
    # Remove null bytes and control characters
    user_input = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]', '', user_input)
    # Check for injection
    suspicious, pattern = detect_prompt_injection(user_input)
    if suspicious:
        raise ValueError(f"Potential prompt injection detected: {pattern}")
    return user_input
# NeMo Guardrails: route every LLM call through the configured rails (Python)
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("./guardrails-config")
rails = LLMRails(config)


async def safe_llm_call(user_message: str) -> str:
    response = await rails.generate_async(
        messages=[{"role": "user", "content": user_message}]
    )
    return response["content"]
# guardrails-config/config.yml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  input:
    flows:
      - check jailbreak
      - check sensitive data
  output:
    flows:
      - check output for PII
      - check output for harmful content
import re
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

PII_ENTITIES = ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD",
                "US_SSN", "IBAN_CODE", "IP_ADDRESS", "LOCATION"]


def scrub_pii_from_output(text: str) -> str:
    """Remove PII from LLM output before returning to user."""
    results = analyzer.analyze(text=text, entities=PII_ENTITIES, language="en")
    if not results:
        return text
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text


def validate_output_safety(output: str) -> bool:
    """Return False if the output contains common XSS, SQLi, or injection payloads."""
    dangerous_patterns = [
        r"<\s*script\b",              # XSS (also matches <script src=...>)
        r"javascript:",               # XSS
        r";\s*(DROP|DELETE|INSERT)",  # SQLi
        r"\$\{.*\}",                  # template injection
        r"`.*`",                      # command injection in some contexts
    ]
    for pattern in dangerous_patterns:
        if re.search(pattern, output, re.IGNORECASE):
            return False
    return True
import os
import time
from collections import defaultdict

import jwt
from fastapi import FastAPI, HTTPException, Depends, Request
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

app = FastAPI()
security = HTTPBearer()

# Assumption: the signing secret is supplied via an environment variable
# (the variable name is illustrative; use a vault or secret manager in production)
SECRET_KEY = os.environ["JWT_SECRET_KEY"]

# Rate limiting (per API key); in-memory, so counts are per process
request_counts = defaultdict(list)


def rate_limit(api_key: str, max_requests: int = 100, window_seconds: int = 60):
    now = time.time()
    requests = request_counts[api_key]
    # Remove old requests outside window
    request_counts[api_key] = [t for t in requests if now - t < window_seconds]
    if len(request_counts[api_key]) >= max_requests:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    request_counts[api_key].append(now)


async def verify_token(
    credentials: HTTPAuthorizationCredentials = Depends(security)
) -> dict:
    try:
        payload = jwt.decode(credentials.credentials, SECRET_KEY, algorithms=["HS256"])
        rate_limit(payload["sub"])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")


@app.post("/v1/chat/completions")
async def chat(request: Request, token: dict = Depends(verify_token)):
    body = await request.json()
    # Input validation (fall back to an empty message if the list is missing or empty)
    messages = body.get("messages") or [{}]
    user_msg = messages[-1].get("content", "")
    try:
        # sanitize_user_input is defined in the input validation code above
        safe_input = sanitize_user_input(user_msg)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    # Call LLM and scrub output; call_llm is the application's own client wrapper (not shown)
    response = await call_llm(safe_input, token["scope"])
    response["choices"][0]["message"]["content"] = scrub_pii_from_output(
        response["choices"][0]["message"]["content"]
    )
    return response
# Verify model weights with SHA-256 hash before loading
MODEL_DIR="./models/llama-3.1-8b"
EXPECTED_HASH="sha256:abc123..."

# Generate hash of downloaded model
# (keep only the digest field of the sha256sum output and add the "sha256:" prefix,
#  otherwise the comparison below can never match)
actual_hash="sha256:$(find "$MODEL_DIR" -name "*.safetensors" -print0 | sort -z | xargs -0 sha256sum | sha256sum | awk '{print $1}')"
echo "Model hash: $actual_hash"

# Compare (automate in CI/CD)
if [ "$actual_hash" != "$EXPECTED_HASH" ]; then
  echo "ERROR: Model hash mismatch, possible tampering!"
  exit 1
fi

# Scan model files for embedded malware (ModelScan)
pip install modelscan
modelscan scan -p "$MODEL_DIR"
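The same integrity check can be run from Python, for example inside the service that loads the model. The sketch below uses only the standard library. The helper names `hash_model_dir` and `verify_model_dir` are hypothetical, and the digest it produces is computed differently from the shell pipeline above, so the expected value has to be generated with this same function.

```python
import hashlib
import hmac
from pathlib import Path


def hash_model_dir(model_dir: str, pattern: str = "*.safetensors") -> str:
    """Return one SHA-256 hex digest over all matching files, in sorted order."""
    digest = hashlib.sha256()
    root = Path(model_dir)
    for path in sorted(root.rglob(pattern)):
        # Mix in the relative file name so renamed or swapped files change the digest.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large weight files are not loaded into memory.
            for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir: str, expected_hex: str) -> None:
    """Raise RuntimeError if the directory digest differs from the expected one."""
    actual = hash_model_dir(model_dir)
    if not hmac.compare_digest(actual, expected_hex):
        raise RuntimeError(f"Model hash mismatch: expected {expected_hex}, got {actual}")
```

Calling `verify_model_dir` before the weights are loaded makes the service refuse to start on a mismatch instead of serving a possibly tampered model.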
# Kubernetes NetworkPolicy: isolate LLM API
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-api-isolation
  namespace: ai-services
spec:
  podSelector:
    matchLabels:
      app: vllm
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: backend  # only backend can call LLM
      ports:
        - protocol: TCP
          port: 8000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: monitoring  # metrics only
      ports:
        - protocol: TCP
          port: 9090
  # Block egress to internet to prevent data exfiltration
  # (allow only internal cluster traffic)
import structlog
from datetime import datetime, timezone

audit_log = structlog.get_logger("ai.audit")


def log_llm_interaction(
    user_id: str,
    session_id: str,
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    was_filtered: bool,
    injection_detected: bool,
):
    audit_log.info(
        "llm_interaction",
        timestamp=datetime.now(timezone.utc).isoformat(),
        user_id=user_id,
        session_id=session_id,
        model=model,
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens,
        was_filtered=was_filtered,
        injection_detected=injection_detected,
        # DO NOT log prompt/completion content: PII risk
    )
| Issue | Cause | Fix |
|-------|-------|-----|
| False positive injection blocks | Overly broad regex | Tune patterns; use ML-based classifier for high-traffic |
| PII in model outputs | Model trained on PII data | Add Presidio scrubbing to output layer |
| API key leakage | Keys in logs or responses | Mask keys in logging; use vault for key storage |
| Model weight tampering | Unverified downloads | Always verify SHA-256; use modelscan |
| Rate limit bypass | Per-IP not per-user | Rate limit on authenticated user ID, not IP |
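The "Mask keys in logging" fix from the table can be sketched with a standard-library logging filter. The key formats matched here (an `sk-` prefix and `Bearer` tokens) are assumptions chosen for illustration, and `SecretMaskingFilter` is a hypothetical name. Adjust the pattern to whatever key formats your providers actually issue.

```python
import logging
import re

# Assumed secret formats: "sk-" prefixed API keys and HTTP Bearer tokens.
_SECRET_RE = re.compile(r"(sk-[A-Za-z0-9_-]{8,}|Bearer\s+[A-Za-z0-9._-]{8,})")


def mask_secrets(text: str) -> str:
    """Replace anything that looks like an API key or bearer token."""
    return _SECRET_RE.sub("[REDACTED]", text)


class SecretMaskingFilter(logging.Filter):
    """Logging filter that masks secrets in the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so secrets passed as %-style arguments are masked too.
        record.msg = mask_secrets(record.getMessage())
        record.args = ()
        return True
```

Attach the filter to each handler that writes logs, for example `handler.addFilter(SecretMaskingFilter())`, so that records propagated from child loggers are masked as well.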
Run modelscan on any model downloaded from the internet before serving.