skills/skillxiv-v0.0.2-claude-opus-4.6/agent-skills-security-analysis/SKILL.md
Empirically analyzes 31,132 agent skills to identify 14 distinct vulnerability patterns, finding 26.1% contain security flaws including data exfiltration, privilege escalation, and malicious intent risks that require mandatory vetting.
npx skillsauth add ADu2021/skillXiv agent-skills-security-analysis
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Conduct security analysis of modular agent skills before deployment. Use multi-stage detection combining static analysis and LLM-based semantic classification to identify vulnerabilities such as data exfiltration, privilege escalation, prompt injection, and supply chain risks.
Implement a multi-stage detection pipeline that combines static and semantic analysis.
# Multi-stage detection framework
class SkillScan:
    def __init__(self):
        self.static_analyzer = StaticAnalyzer()
        self.semantic_classifier = SemanticClassifier()

    def scan_skill(self, skill_code, skill_metadata):
        """Comprehensive vulnerability detection"""
        results = {
            "static_findings": self.static_analyzer.analyze(skill_code),
            "semantic_findings": self.semantic_classifier.classify(skill_code),
            "metadata_issues": self.check_metadata(skill_metadata),
            "vulnerability_patterns": []
        }
        # Consolidate findings
        results["vulnerability_patterns"] = self.consolidate_findings(results)
        results["risk_score"] = self.compute_risk_score(results)
        return results

    def consolidate_findings(self, findings):
        """Identify distinct vulnerability patterns"""
        patterns = {self.classify_pattern(f) for f in findings["static_findings"]}
        # Sort so the output order is deterministic
        return sorted(patterns)

    def classify_pattern(self, finding):
        # Assumption: the pattern label is the static finding's "type" field
        return finding["type"]

    def check_metadata(self, skill_metadata):
        # Assumption: placeholder; the metadata checks are not specified here
        return []

    def compute_risk_score(self, results):
        # Assumption: delegate to RiskScorer (defined below) on static findings
        return RiskScorer().compute_score(results["static_findings"])
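The `consolidate_findings` method above reads only the static findings. The following is a minimal sketch of merging both stages, assuming the semantic results use the `{"detected": ..., "confidence": ...}` shape that `SemanticClassifier.classify` produces. The function name `merge_stage_findings` and the `min_confidence` threshold are hypothetical and do not come from the original skill.

```python
def merge_stage_findings(static_findings, semantic_findings, min_confidence=0.5):
    """Return the sorted set of pattern names reported by either stage.

    min_confidence is an illustrative threshold, not a value from the source.
    """
    # Static stage: one pattern name per finding type, duplicates collapsed.
    patterns = {finding["type"] for finding in static_findings}
    # Semantic stage: keep only confident "yes" verdicts.
    for category, verdict in (semantic_findings or {}).items():
        if verdict["detected"] == "yes" and verdict["confidence"] >= min_confidence:
            patterns.add(category)
    return sorted(patterns)


static = [
    {"type": "network_calls", "severity": "medium"},
    {"type": "network_calls", "severity": "medium"},
]
semantic = {
    "data_exfiltration": {"detected": "yes", "confidence": 0.8},
    "prompt_injection": {"detected": "unclear", "confidence": 0.4},
}
merged = merge_stage_findings(static, semantic)
print(merged)
```

Treating "unclear" verdicts as non-findings is a design choice in this sketch; a stricter pipeline might route them to manual review instead.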
Detect vulnerabilities through code pattern matching.
# Static analysis patterns
import re

class StaticAnalyzer:
    VULNERABILITY_PATTERNS = {
        "credential_exposure": [
            r"api[_]?key\s*=",
            r"password\s*=",
            r"token\s*="
        ],
        "file_access_risk": [
            r"open\s*\(",
            r"read\s*file",
            r"write\s*file",
            r"os\.remove"
        ],
        "network_calls": [
            r"requests\.",
            r"urllib",
            r"socket\.",
            r"send.*http"
        ],
        "process_execution": [
            r"subprocess\.",
            r"os\.system",
            r"popen"
        ]
    }

    def analyze(self, code):
        """Find suspicious patterns in code"""
        findings = []
        for pattern_type, patterns in self.VULNERABILITY_PATTERNS.items():
            for pattern in patterns:
                # Case-insensitive search; one finding per pattern that matches
                matches = re.findall(pattern, code, re.IGNORECASE)
                if matches:
                    findings.append({
                        "type": pattern_type,
                        "pattern": pattern,
                        "match_count": len(matches),
                        "severity": self.estimate_severity(pattern_type)
                    })
        return findings

    def estimate_severity(self, pattern_type):
        """Assess severity of vulnerability pattern"""
        severity_map = {
            "credential_exposure": "critical",
            "process_execution": "high",
            "network_calls": "medium",
            "file_access_risk": "medium"
        }
        return severity_map.get(pattern_type, "low")
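`analyze` reports how many times each pattern matched but not where. A reviewer usually needs the location, so the sketch below scans line by line and returns line numbers. It reuses two of the patterns listed above; the names `PATTERNS` and `locate_matches` are hypothetical, and scanning per line is an assumption that would miss any pattern spanning several lines.

```python
import re

# Two of the patterns from the table above, precompiled once.
PATTERNS = {
    "credential_exposure": re.compile(r"api[_]?key\s*=", re.IGNORECASE),
    "process_execution": re.compile(r"os\.system", re.IGNORECASE),
}


def locate_matches(code):
    """Return (line_number, pattern_type, line_text) for each matching line."""
    hits = []
    for line_number, line in enumerate(code.splitlines(), start=1):
        for pattern_type, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, pattern_type, line.strip()))
    return hits


sample = 'import os\nAPI_KEY = "abc123"\nos.system("ls")\n'
hits = locate_matches(sample)
for hit in hits:
    print(hit)
```

Pattern matching of this kind flags any assignment to a name such as `API_KEY`, including ones that read from an environment variable, so the hits are candidates for review and not confirmed flaws.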
Use language models to understand vulnerability intent.
# Semantic vulnerability classification
class SemanticClassifier:
    def classify(self, code):
        """Identify vulnerability intent through semantic analysis"""
        # Classify malicious intent categories
        intent_categories = [
            "data_exfiltration",
            "privilege_escalation",
            "prompt_injection",
            "supply_chain_risk",
            "benign_risky_pattern"
        ]
        classifications = {}
        for category in intent_categories:
            prompt = (
                f"Analyze this code for {category} intent:\n"
                f"{code}\n"
                f"Is there evidence of {category}? (yes/no/unclear)\n"
                "Confidence: 0-1\n"
            )
            result = self.llm_classify(prompt)
            classifications[category] = {
                "detected": result["answer"],
                "confidence": result["confidence"]
            }
        return classifications

    def llm_classify(self, prompt):
        """LLM-based semantic analysis"""
        # Would use an actual LLM API. Until one is wired in, return a
        # placeholder verdict (an assumption) so classify() does not fail on None.
        return {"answer": "unclear", "confidence": 0.0}
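`classify` expects `llm_classify` to return a dictionary with `answer` and `confidence` keys, while the prompt asks the model for free text. One possible bridge is sketched below, assuming the model replies with the verdict first and a line such as `Confidence: 0.85`. The function name `parse_llm_reply` and the reply format are assumptions, not part of the original skill.

```python
import re


def parse_llm_reply(reply):
    """Parse a free-text reply into {"answer": ..., "confidence": ...}.

    Falls back to "unclear" and 0.0 when the reply does not follow the
    assumed format, so a malformed reply never counts as a detection.
    """
    # Take the first standalone yes/no/unclear token as the verdict.
    answer_match = re.search(r"\b(yes|no|unclear)\b", reply, re.IGNORECASE)
    confidence_match = re.search(
        r"confidence\s*[:=]\s*([01](?:\.\d+)?)", reply, re.IGNORECASE
    )
    answer = answer_match.group(1).lower() if answer_match else "unclear"
    confidence = float(confidence_match.group(1)) if confidence_match else 0.0
    # Clamp to the 0-1 range requested in the prompt.
    return {"answer": answer, "confidence": min(1.0, max(0.0, confidence))}


parsed = parse_llm_reply("Yes\nConfidence: 0.85")
print(parsed)
```

Because the parser takes the first verdict token it finds, a reply that opens with explanatory text containing "no" would be misread; asking the model for structured output would be more robust than this sketch.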
Organize vulnerabilities into actionable categories.
# Vulnerability categories with risk levels
class VulnerabilityCategory:
    CATEGORIES = {
        "data_exfiltration": {
            "description": "Attempt to send data outside system",
            "prevalence": 0.133,  # 13.3% of vulnerable skills
            "examples": ["send_logs", "collect_files", "api_exfil"]
        },
        "privilege_escalation": {
            "description": "Attempt to gain higher permissions",
            "prevalence": 0.118,  # 11.8%
            "examples": ["sudo_access", "admin_check", "permission_bypass"]
        },
        "malicious_intent": {
            "description": "Clear evidence of deliberate harm",
            "prevalence": 0.052,  # 5.2%
            "examples": ["backdoor", "ransomware_pattern", "botnet"]
        },
        "prompt_injection": {
            "description": "Vulnerability to LLM prompt injection",
            "prevalence": 0.045,
            "examples": ["unescaped_input", "eval_user_input"]
        },
        "supply_chain": {
            "description": "Risk to dependency management",
            "prevalence": 0.035,
            "examples": ["typosquatting", "dependency_confusion"]
        }
    }

    @staticmethod
    def get_risk_level(vulnerability):
        """Map vulnerability to risk level"""
        if vulnerability in ("data_exfiltration", "privilege_escalation", "malicious_intent"):
            return "high"
        if vulnerability == "prompt_injection":
            return "medium"
        return "low"
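A single skill often triggers several categories at once. The sketch below reduces them to one overall level by taking the most severe. It restates the mapping from `get_risk_level` above so the block is self-contained; `RISK_ORDER`, `overall_risk_level`, and the `"none"` label for an empty list are hypothetical.

```python
# Ordering used to compare levels; higher means more severe.
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def get_risk_level(vulnerability):
    """Same mapping as VulnerabilityCategory.get_risk_level, restated here."""
    if vulnerability in ("data_exfiltration", "privilege_escalation", "malicious_intent"):
        return "high"
    if vulnerability == "prompt_injection":
        return "medium"
    return "low"


def overall_risk_level(detected):
    """Return the most severe level among the detected categories."""
    if not detected:
        return "none"  # illustrative label for "nothing detected"
    levels = [get_risk_level(v) for v in detected]
    return max(levels, key=RISK_ORDER.__getitem__)


print(overall_risk_level(["supply_chain", "prompt_injection"]))
```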
Assign quantitative risk scores to skills.
# Risk scoring
class RiskScorer:
    def compute_score(self, findings):
        """Compute overall risk score 0-1"""
        if not findings:
            return 0.0
        # Weighted scoring
        critical_count = sum(1 for f in findings if f.get("severity") == "critical")
        high_count = sum(1 for f in findings if f.get("severity") == "high")
        medium_count = sum(1 for f in findings if f.get("severity") == "medium")
        score = (
            critical_count * 0.5 +
            high_count * 0.3 +
            medium_count * 0.1
        ) / len(findings)
        return min(1.0, score)

    def requires_vetting(self, risk_score):
        """Determine if skill requires manual review"""
        return risk_score > 0.3
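A worked example makes the scoring arithmetic concrete. The sketch below restates the weights and threshold from `RiskScorer` as standalone functions (the names `WEIGHTS`, `compute_score`, and `requires_vetting` at module level are only for illustration) and scores a skill with one critical, one high, and two medium findings: (0.5 + 0.3 + 0.1 + 0.1) / 4, which is about 0.25.

```python
# Severity weights as given in RiskScorer above.
WEIGHTS = {"critical": 0.5, "high": 0.3, "medium": 0.1}


def compute_score(findings):
    """Average the severity weights over all findings, capped at 1.0."""
    if not findings:
        return 0.0
    total = sum(WEIGHTS.get(f.get("severity"), 0.0) for f in findings)
    return min(1.0, total / len(findings))


def requires_vetting(risk_score):
    """Strict comparison against the 0.3 threshold used above."""
    return risk_score > 0.3


mixed = [
    {"severity": "critical"},
    {"severity": "high"},
    {"severity": "medium"},
    {"severity": "medium"},
]
mixed_score = compute_score(mixed)
print(round(mixed_score, 2), requires_vetting(mixed_score))

only_high = compute_score([{"severity": "high"}])
print(only_high, requires_vetting(only_high))
```

Two properties follow from the formula as written. Because the score is an average, it cannot exceed the largest weight (0.5), so the cap at 1.0 never triggers. And a skill whose findings are all `high` scores exactly 0.3, which the strict `>` comparison does not flag, while the mixed example above is diluted to about 0.25 by its medium findings despite containing a critical one.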
Flag skills that bundle executable scripts, since they are 2.12x more likely to contain vulnerabilities.
# Script bundling risk factor
class BundlingAnalysis:
    SCRIPT_EXTENSIONS = (".py", ".sh", ".js", ".exe")

    def assess_bundling_risk(self, skill):
        """Check if skill includes executable scripts"""
        # Assumes skill["files"] is a list of file names; match on the
        # extension rather than testing whether ".py" itself is a list entry.
        has_scripts = any(
            name.lower().endswith(self.SCRIPT_EXTENSIONS)
            for name in skill["files"]
        )
        if has_scripts:
            # Skills that bundle scripts are 2.12x more likely to contain vulnerabilities
            return {"has_scripts": True, "risk_multiplier": 2.12}
        return {"has_scripts": False, "risk_multiplier": 1.0}
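The code above returns a `risk_multiplier` without showing how it is used. One possible use is sketched here, assuming the multiplier scales a base risk score and the result is clamped to 1.0. Both the multiplication and the clamp are assumptions: the 2.12x figure describes how much more often script-bundling skills contain vulnerabilities, not a scoring rule. The names `bundles_scripts` and `adjusted_risk` are hypothetical.

```python
from pathlib import PurePosixPath

SCRIPT_SUFFIXES = {".py", ".sh", ".js", ".exe"}
SCRIPT_RISK_MULTIPLIER = 2.12  # figure quoted in the skill description


def bundles_scripts(file_names):
    """True if any file name ends in one of the script extensions."""
    return any(
        PurePosixPath(name).suffix.lower() in SCRIPT_SUFFIXES
        for name in file_names
    )


def adjusted_risk(base_score, file_names):
    """Scale the base score when scripts are bundled, clamped to 1.0."""
    multiplier = SCRIPT_RISK_MULTIPLIER if bundles_scripts(file_names) else 1.0
    return min(1.0, base_score * multiplier)


print(bundles_scripts(["SKILL.md", "scripts/run.SH"]))
print(adjusted_risk(0.25, ["SKILL.md", "scripts/setup.py"]))
```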