plugins/trailmark/skills/graph-evolution/SKILL.md
Compares Trailmark code graphs at two source code snapshots (git commits, tags, or directories) to surface security-relevant structural changes. Detects new attack paths, complexity shifts, blast radius growth, taint propagation changes, and privilege boundary modifications that text diffs miss. Use when comparing code between commits or tags, analyzing structural evolution, detecting attack surface growth, reviewing what changed between audit snapshots, or finding security-relevant changes that text diffs miss.
Builds Trailmark code graphs at two source snapshots and computes a structural diff. Surfaces security-relevant changes that text-level diffs miss: new attack paths, complexity shifts, blast radius growth, taint propagation changes, and privilege boundary modifications.
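When NOT to use this skill:
- Line-level review of a text diff (use differential-review for text-diff analysis)
- Single-snapshot graph analysis (use the trailmark skill directly)
- Diagrams (use the diagramming-code skill)
- Mutation testing triage (use the genotoxic skill)

| Rationalization | Why It's Wrong | Required Action |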
|-----------------|----------------|-----------------|
| "We just need the structural diff, skip pre-analysis" | Without pre-analysis, you miss taint changes, blast radius growth, and privilege boundary shifts | Run engine.preanalysis() on both snapshots |
| "Text diff covers what changed" | Text diffs miss new attack paths, transitive complexity shifts, and subgraph membership changes | Use structural diff to complement text diff |
| "Only added nodes matter" | Removed security functions and shifted privilege boundaries are equally dangerous | Review removals and modifications, not just additions |
| "Low-severity structural changes can be ignored" | INFO-level changes (dead code removal) can mask removed security checks | Classify every change, review removals for replaced functionality |
| "One snapshot's graph is enough for comparison" | Single-snapshot analysis can't detect evolution — you need both before and after | Always build and export both graphs |
| "Tool isn't installed, I'll compare manually" | Manual comparison misses what graph analysis catches | Install trailmark first |
`trailmark` must be installed. If `uv run trailmark` fails, run:

```bash
uv pip install trailmark
```
DO NOT fall back to "manual comparison" or reading source files as a substitute for running trailmark. The tool must be installed and used programmatically. If installation fails, report the error.
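A minimal preflight sketch using only the Python standard library: it reports whether the `trailmark` package is importable in the current interpreter and whether a `trailmark` executable is on `PATH`. The helper name `trailmark_available` is hypothetical, and it only detects; it does not install anything.

```python
import importlib.util
import shutil


def trailmark_available() -> dict[str, bool]:
    """Report whether trailmark can be imported and whether its CLI is on PATH."""
    return {
        # find_spec returns None when the package is not installed in this interpreter
        "python_package": importlib.util.find_spec("trailmark") is not None,
        # which returns None when no `trailmark` executable is on PATH;
        # `uv run trailmark` may still work inside a uv-managed project
        "cli_on_path": shutil.which("trailmark") is not None,
    }
```

If `python_package` is `False`, run the install command above, and report the error if that fails.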
```bash
# Compare two git refs (e.g., tags, branches, commits)
# 1. Build graphs at each snapshot
# 2. Run pre-analysis on both
# 3. Compute structural diff
# 4. Generate report
# Step-by-step: see Workflow below
```
```
├─ Need to understand what each metric means?
│ └─ Read: references/evolution-metrics.md
│
├─ Need the report output format?
│ └─ Read: references/report-format.md
│
├─ Already have two graph JSON exports?
│ └─ Jump to Phase 3 (run native diff + graph_diff.py)
│
└─ Starting from two git refs?
  └─ Start at Phase 1
```
Graph Evolution Progress:
- [ ] Phase 1: Create snapshots (git worktrees)
- [ ] Phase 2: Build graphs + pre-analysis on both snapshots
- [ ] Phase 3: Compute structural diff
- [ ] Phase 4: Interpret diff and generate report
- [ ] Phase 5: Clean up worktrees
Use git worktrees to get clean copies of each ref without disturbing the working tree.
```bash
# Create temp directories for worktrees
# (these paths are {before_dir} and {after_dir} in later phases)
BEFORE_DIR=$(mktemp -d)
AFTER_DIR=$(mktemp -d)

# Create worktrees (run from repo root)
git worktree add "$BEFORE_DIR" {before_ref}
git worktree add "$AFTER_DIR" {after_ref}
```
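If the rest of the workflow runs from one Python session, the two `git worktree add` calls above can be wrapped with `subprocess`. This is a sketch: `worktree_add_cmd`, `worktree_remove_cmd`, and `create_snapshots` are hypothetical helper names, and they only wrap the git commands already shown here.

```python
import subprocess
import tempfile


def worktree_add_cmd(path: str, ref: str) -> list[str]:
    """Command that checks out `ref` into a new worktree at `path`."""
    return ["git", "worktree", "add", path, ref]


def worktree_remove_cmd(path: str) -> list[str]:
    """Matching cleanup command for Phase 5."""
    return ["git", "worktree", "remove", path]


def create_snapshots(before_ref: str, after_ref: str) -> tuple[str, str]:
    """Create an empty temp dir per ref and add a worktree there.

    Run from the repository root; raises CalledProcessError if git fails.
    """
    before_dir = tempfile.mkdtemp(prefix="before_")
    after_dir = tempfile.mkdtemp(prefix="after_")
    for path, ref in ((before_dir, before_ref), (after_dir, after_ref)):
        subprocess.run(worktree_add_cmd(path, ref), check=True)
    return before_dir, after_dir
```

If the second `git worktree add` fails, the first worktree still exists; remove it with the command from `worktree_remove_cmd` before retrying.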
If comparing two directories instead of git refs, skip this phase and use the directory paths directly in Phase 2.
Build Trailmark graphs for both snapshots and run pre-analysis on each. Pre-analysis computes blast radius, taint propagation, privilege boundaries, and entrypoint enumeration.
```python
import os
import tempfile

from trailmark.query.api import QueryEngine


def build_and_export(target_dir, output_path, language="auto"):
    """Build graph, run pre-analysis, export JSON."""
    engine = QueryEngine.from_directory(target_dir, language=language)
    # Computes blast radius, taint, privilege boundaries, and entrypoints
    engine.preanalysis()
    # Export after pre-analysis; graph_diff.py reads subgraph membership from this JSON
    json_str = engine.to_json()
    with open(output_path, "w") as f:
        f.write(json_str)
    return engine.summary()


# Shared scratch directory; Phase 3 writes its diff files here too
work_dir = tempfile.mkdtemp(prefix="trailmark_evolution_")
before_json = os.path.join(work_dir, "before_graph.json")
after_json = os.path.join(work_dir, "after_graph.json")

before_summary = build_and_export("{before_dir}", before_json)
after_summary = build_and_export("{after_dir}", after_json)
print(before_summary)
print(after_summary)
```
Verify that both graphs built successfully by checking the summary output. If either build fails, rerun it with an explicit `language` (a single language or a comma-separated list) instead of `"auto"`.
Run both:
- the native `trailmark diff` command for node, edge, and entrypoint changes
- the `graph_diff.py` helper for subgraph membership changes

Using the same `work_dir` from Phase 2:
```bash
# Native diff; falls back to `uv run` if the first invocation fails
trailmark diff --json "{before_dir}" "{after_dir}" > "{work_dir}/trailmark_diff.json" || \
  uv run trailmark diff --json "{before_dir}" "{after_dir}" > "{work_dir}/trailmark_diff.json"

# Subgraph membership diff from the two Phase 2 exports
uv run {baseDir}/scripts/graph_diff.py \
  --before "{before_json}" \
  --after "{after_json}" > "{work_dir}/subgraph_diff.json"
```
If either diff command fails or writes an empty JSON file, stop and report the error instead of continuing to Phase 4.
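One way to enforce this stop-on-failure rule is a small loader that refuses missing, empty, or unparsable output. A sketch; `load_nonempty_json` is a hypothetical name, and treating an empty JSON object or list as a failure (rather than as "no changes") is an assumption you may want to relax.

```python
import json
import os


def load_nonempty_json(path: str):
    """Load a diff or graph JSON file, failing loudly if it is missing, empty, or invalid."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} was not created")
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; the diff command likely failed")
    with open(path, encoding="utf-8") as f:
        data = json.load(f)  # raises json.JSONDecodeError on malformed output
    if not data:
        # An empty object or list is treated as a failure, not as "no changes"
        raise ValueError(f"{path} contains an empty JSON document")
    return data
```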
The native Trailmark diff contains:
| Key | Contents |
|-----|----------|
| summary_delta | Changes in node/edge/entrypoint counts |
| nodes.added | New functions, classes, methods |
| nodes.removed | Deleted functions, classes, methods |
| nodes.modified | Functions with changed CC, params, line span |
| edges.added | New call/inheritance/import relationships |
| edges.removed | Deleted relationships |
| entrypoints | Added, removed, and modified entrypoints |
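For a quick first pass over `trailmark_diff.json`, counting entries per section shows where change is concentrated. This sketch assumes, from the dotted key names in the table, that `nodes`, `edges`, and `entrypoints` are objects whose `added`/`removed`/`modified` values are lists; that layout is not confirmed here, so check it against a real diff first. `count_changes` is a hypothetical helper.

```python
def count_changes(diff: dict) -> dict[str, int]:
    """Count list entries under nodes/edges/entrypoints in a parsed native diff.

    Sections or change kinds that are absent, or not shaped as assumed, are skipped.
    """
    counts = {}
    for section in ("nodes", "edges", "entrypoints"):
        part = diff.get(section)
        if not isinstance(part, dict):
            continue
        for change in ("added", "removed", "modified"):
            items = part.get(change)
            if isinstance(items, list):
                counts[f"{section}.{change}"] = len(items)
    return counts
```

Pass it the object returned by `json.load` on `trailmark_diff.json`.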
The subgraph diff contains:
| Key | Contents |
|-----|----------|
| subgraphs | Per-subgraph membership changes (tainted, high_blast_radius, etc.) |
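Conceptually, a subgraph membership change is a set difference per named subgraph. The sketch below is not the source of `graph_diff.py`; it assumes each snapshot's subgraphs have already been reduced to a mapping from subgraph name (`tainted`, `high_blast_radius`, and so on) to node IDs, and `subgraph_membership_diff` is a hypothetical name.

```python
from collections.abc import Iterable, Mapping


def subgraph_membership_diff(
    before: Mapping[str, Iterable[str]], after: Mapping[str, Iterable[str]]
) -> dict[str, dict[str, list[str]]]:
    """Per subgraph, list node IDs that joined or left between snapshots.

    Subgraphs whose membership did not change are omitted.
    """
    result = {}
    for name in sorted(set(before) | set(after)):
        old = set(before.get(name, ()))
        new = set(after.get(name, ()))
        if old != new:
            result[name] = {
                "added": sorted(new - old),    # nodes that joined the subgraph
                "removed": sorted(old - new),  # nodes that left the subgraph
            }
    return result
```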
Read both diff JSON files and generate a security-focused markdown report. See references/report-format.md for the full template.
Interpretation priorities (highest to lowest):
1. Nodes newly added to the `tainted` subgraph, especially if they also appear in added edges targeting sensitive functions
2. New entrypoints, especially those marked `untrusted_external`, from `trailmark_diff.json`
3. Nodes newly added to `high_blast_radius`

Cross-reference structural changes with `git diff {before_ref}..{after_ref}` to add source-level context to findings.
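To illustrate the first priority, a newly tainted node matters most when an added edge connects it to a sensitive function. In this sketch, `newly_tainted` would come from the `tainted` entry of the subgraph diff and `added_edges` from the native diff, both reduced to plain IDs. The `sensitive` set is an analyst-supplied assumption, since nothing here defines which functions are sensitive, and treating the tainted node as the edge's caller is also an assumption. `tainted_edges_to_sensitive` is a hypothetical name.

```python
from collections.abc import Iterable


def tainted_edges_to_sensitive(
    newly_tainted: set[str],
    added_edges: Iterable[tuple[str, str]],
    sensitive: set[str],
) -> list[tuple[str, str]]:
    """Added (caller, callee) edges from a newly tainted node into a sensitive function."""
    return sorted(
        (src, dst)
        for src, dst in added_edges
        if src in newly_tainted and dst in sensitive
    )
```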
Severity classification:

| Severity | Structural Signal |
|----------|------------------|
| CRITICAL | New tainted path to sensitive function, removed auth boundary |
| HIGH | New entrypoint + high blast radius, large CC increase on tainted node |
| MEDIUM | New trust-boundary-crossing edges, moderate CC increase |
| LOW | Added nodes without entrypoint reachability |
| INFO | Dead code removal, complexity reductions |
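The table can be applied mechanically once each change is reduced to a few signals, checking the highest severity first. Everything in this sketch is hypothetical: the `Change` fields, the thresholds `large_cc=10` and `moderate_cc=3`, and the `UNCLASSIFIED` result for changes the table does not cover, which keeps them visible for manual review instead of silently dropping them.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical per-change signals extracted from the two diff files."""
    new_tainted_path_to_sensitive: bool = False
    removed_auth_boundary: bool = False
    new_entrypoint: bool = False
    high_blast_radius: bool = False
    on_tainted_node: bool = False
    cc_delta: int = 0  # cyclomatic complexity after minus before
    crosses_trust_boundary: bool = False
    added_node: bool = False
    reachable_from_entrypoint: bool = False
    removal_or_reduction: bool = False  # dead code removal or CC reduction


def classify(c: Change, large_cc: int = 10, moderate_cc: int = 3) -> str:
    """Return the first matching severity, from CRITICAL down to INFO."""
    if c.new_tainted_path_to_sensitive or c.removed_auth_boundary:
        return "CRITICAL"
    if (c.new_entrypoint and c.high_blast_radius) or (
        c.on_tainted_node and c.cc_delta >= large_cc
    ):
        return "HIGH"
    if c.crosses_trust_boundary or c.cc_delta >= moderate_cc:
        return "MEDIUM"
    if c.removal_or_reduction:
        return "INFO"
    if c.added_node and not c.reachable_from_entrypoint:
        return "LOW"
    # Not covered by the table: keep it in the report for manual review
    return "UNCLASSIFIED"
```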
For detailed metric definitions, see references/evolution-metrics.md.
If Phase 1 created git worktrees, remove them after the report is written:

```bash
git worktree remove "{before_dir}"
git worktree remove "{after_dir}"
```
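```bash
trailmark diff --json BEFORE AFTER
```

```bash
uv run {baseDir}/scripts/graph_diff.py [OPTIONS]
```

Use `trailmark diff` for node, edge, and entrypoint changes (the keys listed in the native diff table above).

Use `graph_diff.py` for subgraph membership changes computed by `engine.preanalysis()`: `tainted`, `high_blast_radius`, `privilege_boundary`, and related sets.

| Argument | Default | Description |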
|----------|---------|-------------|
| --before | required | Path to the "before" graph JSON |
| --after | required | Path to the "after" graph JSON |
| --indent | 2 | JSON output indentation |
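For reference, the option table corresponds to an `argparse` interface like the one below. This is a hypothetical mirror written from the table, not the script's actual source; `build_parser` is an illustrative name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented graph_diff.py options."""
    parser = argparse.ArgumentParser(
        description="Structural diff of two Trailmark graph JSON exports"
    )
    parser.add_argument("--before", required=True, help='Path to the "before" graph JSON')
    parser.add_argument("--after", required=True, help='Path to the "after" graph JSON')
    parser.add_argument("--indent", type=int, default=2, help="JSON output indentation")
    return parser
```

Parsing only `--before` and `--after` leaves `indent` at its default of 2, matching the table.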
graph_diff.py input format: Trailmark JSON exports from engine.to_json().
graph_diff.py output: JSON structural diff for nodes, edges, and subgraphs.
Before delivering the report:
- [ ] Native diff written and non-empty (`trailmark_diff.json`)
- [ ] Subgraph diff written and non-empty (`subgraph_diff.json`)
- [ ] Report saved as `GRAPH_EVOLUTION_*.md`

trailmark skill: Phase 2 uses the trailmark API for graph building and pre-analysis. All trailmark query patterns work on either snapshot's engine.
differential-review skill: Use graph-evolution for structural analysis, differential-review for line-level code review. The two are complementary — graph-evolution finds attack paths that text diffs miss, while differential-review provides git blame context and micro-adversarial analysis.
genotoxic skill: If graph-evolution reveals new high-CC tainted nodes, feed them to genotoxic for mutation testing triage.
diagramming-code skill:
Generate before/after diagrams to visualize structural changes.
Use call-graph or data-flow diagrams focused on changed nodes.