DAG-Structured Vulnerability Reasoning

This skill enables Claude to perform rigorous vulnerability analysis using Directed Acyclic Graph (DAG) structured reasoning, based on the DAGVul framework. Linear chain-of-thought explanations often hallucinate plausible-sounding but incorrect logic. This approach instead models vulnerability reasoning as a graph of causal dependencies: it extracts ground facts from code as source nodes, builds intermediate inference nodes through taint tracking and control/data flow analysis, and converges on terminal sink nodes that confirm or refute the vulnerability. The structure enforces consistency and is designed to guard against the 12 systematic failure patterns (hallucination, spurious causality, incomplete evidence, etc.) that affect standard LLM vulnerability analysis.

When to Use

When a user asks to analyze code for security vulnerabilities and wants to understand the root cause, not just a yes/no verdict
When reviewing a function or module for memory safety issues (use-after-free, buffer overflow, null pointer dereference)
When tracing taint propagation from user input through to a dangerous sink (SQL injection, command injection, XSS)
When a user provides a CVE or CWE and asks whether their code is affected and why
When comparing a vulnerable code snippet against its patched version to verify the fix is correct
When a user asks "is this code safe?" and needs a structured, auditable explanation rather than a superficial pattern match
When performing code review where the reasoning behind the security assessment matters as much as the verdict

Key Technique

The problem with linear reasoning: The DAGVul paper (see Reference) reports that 36.4% of correct vulnerability detection verdicts from LLMs rest on incorrect reasoning: the model gets the right answer for the wrong reasons. Linear chain-of-thought (CoT) reasoning is prone to 12 systematic failure patterns grouped into four categories: (1) focus identification errors (analyzing the wrong code region), (2) code comprehension failures (misunderstanding data flow, control flow, intra-procedural or inter-procedural semantics), (3) logic analysis failures (incomplete evidence, spurious causality, flawed premises, contradictions), and (4) generative biases (hallucination, over-inference, redundancy). These errors compound in sequential reasoning because no structural constraint prevents a later step from contradicting or disconnecting from earlier ones.

DAG-structured reasoning fixes this. Instead of a linear chain, model the analysis as a graph G = (V, E) with three node types: Source nodes (ground facts extracted directly from code — buffer sizes, API signatures, variable types, entry points), Intermediate nodes (logical inferences such as taint propagation steps, pointer analysis, constraint solving — each requiring explicit citation of parent nodes), and Sink nodes (terminals that either confirm vulnerability as a verified_sink or prove safety as a sanitized_sink). Edges encode causal dependencies: a node is only admissible when ALL its parent dependencies are established. This topological ordering prevents circular reasoning, ensures every claim is grounded in code evidence, and guarantees logical closure — every reasoning path must terminate at a defined sink.
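The node types and the admissibility rule can be sketched as a small data structure. This is a minimal illustration, not the DAGVul implementation: the names Kind, Node, and ReasoningDAG are hypothetical, and the claims are placeholder text. Because a node is admitted only after all of its parents, the graph stays acyclic by construction.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    SOURCE = "source"              # ground fact read directly from code
    INTERMEDIATE = "intermediate"  # inference that cites parent nodes
    SINK = "sink"                  # terminal: verified_sink or sanitized_sink


@dataclass
class Node:
    name: str
    kind: Kind
    claim: str
    parents: tuple[str, ...] = ()


@dataclass
class ReasoningDAG:
    nodes: dict[str, Node] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        """Admit a node only if all of its parents are already established."""
        if node.name in self.nodes:
            raise ValueError(f"{node.name} is already established")
        if node.kind is Kind.SOURCE and node.parents:
            raise ValueError(f"source node {node.name} must not have parents")
        if node.kind is not Kind.SOURCE and not node.parents:
            raise ValueError(f"{node.name} must cite at least one parent")
        missing = [p for p in node.parents if p not in self.nodes]
        if missing:
            raise ValueError(f"{node.name} cites unestablished parents: {missing}")
        self.nodes[node.name] = node


dag = ReasoningDAG()
dag.add(Node("S1", Kind.SOURCE, "dest is a 64-byte stack buffer"))
dag.add(Node("S2", Kind.SOURCE, "the guard allows len up to 127"))
dag.add(Node("I1", Kind.INTERMEDIATE, "len can exceed the capacity of dest",
             parents=("S1", "S2")))

# An inference that cites a node nobody established is rejected.
try:
    dag.add(Node("I2", Kind.INTERMEDIATE, "unsupported claim", parents=("I9",)))
except ValueError as err:
    rejected = str(err)
```

Rejecting the node at insertion time, rather than validating the graph afterwards, is one possible design choice; it makes forward references impossible to express.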

Enforcing correctness through structure: The DAG constraint means you cannot assert "this buffer overflows" without first establishing the buffer's allocation size (source node), the write operation and its bounds (intermediate nodes with data flow edges), and the absence of bounds checking (intermediate node). If any link in the causal chain is missing, the graph is structurally incomplete and the conclusion is not supported. This mirrors how human security experts reason — building a chain of evidence from code facts to vulnerability confirmation.
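The buffer overflow case above can be read as an evidence checklist. The sketch below is illustrative only: the evidence labels and the REQUIRED_EVIDENCE table are hypothetical names chosen for this example, not part of DAGVul.

```python
# Hypothetical evidence labels for one claim type (not taken from the paper).
REQUIRED_EVIDENCE = {
    "buffer_overflow": {"allocation_size", "write_bounds", "no_bounds_check"},
}


def missing_evidence(claim: str, established: set[str]) -> set[str]:
    """Return the evidence still missing before `claim` is supported."""
    return REQUIRED_EVIDENCE[claim] - established


# The absence of a bounds check has not been established yet.
partial = {"allocation_size", "write_bounds"}
complete = partial | {"no_bounds_check"}
```

With the partial set, missing_evidence("buffer_overflow", partial) returns {"no_bounds_check"}, so the conclusion is not yet supported; with the complete set it returns an empty set.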

Step-by-Step Workflow

1. Identify the analysis scope. Read the code and determine the specific function, module, or code path under review. Identify the CWE category if known (e.g., CWE-416 Use-After-Free, CWE-787 Out-of-bounds Write, CWE-89 SQL Injection). Narrow focus to the relevant code region rather than analyzing everything at once.
2. Extract source nodes (ground facts). Enumerate concrete, verifiable facts directly from the code with line references: buffer allocations and their sizes, API calls and their signatures, variable declarations and types, input sources (user input, file reads, network data), security-relevant constants, and memory management operations (malloc/free, new/delete).
3. Build intermediate inference nodes with explicit parent dependencies. For each inference, state what parent nodes it depends on and what program analysis primitive justifies it:
   - Taint propagation: Track how untrusted data flows through assignments, function parameters, and return values. Cite the source node where taint originates and each intermediate transformation.
   - Data flow analysis: Follow how values are defined, used, and modified. Map def-use chains explicitly.
   - Control flow analysis: Identify branches, loops, and error handling paths that affect whether a vulnerable operation is reachable.
   - Constraint solving: Determine what conditions must hold for the vulnerable path to execute.
4. Check each intermediate node for the 12 failure patterns. Before accepting an inference node, verify:
   - It does not misrepresent data flow or control flow (code comprehension check)
   - It does not assume facts not present in the code (hallucination check)
   - It does not draw conclusions from incomplete evidence (incomplete evidence check)
   - It does not assert causality without a mechanistic link (spurious causality check)
   - It does not contradict any established parent node (contradiction check)
5. Enforce topological ordering. Verify that every intermediate node's parent dependencies are satisfied before that node is asserted. If node N depends on nodes A and B, both A and B must be fully established first. No forward references or circular dependencies.
6. Converge on sink nodes. Arrive at one of two terminal conclusions:
   - Verified sink: The vulnerability is confirmed. State the exact triggering condition, the vulnerable operation with line reference, and the complete causal chain from source to sink.
   - Sanitized sink: The code is safe. Identify the specific sanitization, bounds check, or control flow guard that breaks the exploit chain, with line reference.
7. Validate logical closure. Confirm that every source node connects to at least one sink node through a complete path. If any source node is a dead-end (no path to a sink), the analysis is incomplete: either extend the reasoning or explicitly note the gap.
8. Produce the structured output. Present the DAG as a readable vulnerability report with: (a) the verdict, (b) the causal chain formatted as node dependencies, (c) line references for every claim, and (d) the CWE classification if applicable.
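The topological ordering and logical closure checks in the workflow above can be automated once the dependencies are written down. The sketch below assumes a hypothetical encoding (a dict from node name to parent names, with placeholder node names) and uses Kahn's algorithm for the ordering check; it is one way to do it, not the method prescribed by the paper.

```python
from collections import deque

# Hypothetical encoding: node name -> list of parent names.
# Nodes with no parents are sources; SINKS lists the terminal nodes.
PARENTS = {
    "S1": [], "S2": [], "S3": [],
    "I1": ["S1", "S2"],
    "I2": ["I1"],
    "I3": ["I2", "S3"],
    "VERIFIED_SINK": ["I3"],
}
SINKS = {"VERIFIED_SINK"}


def topological_order(parents):
    """Kahn's algorithm; raises ValueError on a cycle or an undefined parent."""
    for node, deps in parents.items():
        for dep in deps:
            if dep not in parents:
                raise ValueError(f"{node} cites undefined parent {dep}")
    remaining = {node: len(deps) for node, deps in parents.items()}
    children = {node: [] for node in parents}
    for node, deps in parents.items():
        for dep in deps:
            children[dep].append(node)
    ready = deque(node for node, count in remaining.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("circular dependency detected")
    return order


def dead_end_sources(parents, sinks):
    """Return source nodes with no path to any sink (logical closure check)."""
    reaches_sink = set()
    stack = list(sinks)
    while stack:  # walk backwards from the sinks, collecting every ancestor
        node = stack.pop()
        if node in reaches_sink:
            continue
        reaches_sink.add(node)
        stack.extend(parents[node])
    sources = [node for node, deps in parents.items() if not deps]
    return [s for s in sources if s not in reaches_sink]


order = topological_order(PARENTS)
dead_ends = dead_end_sources(PARENTS, SINKS)
```

An empty dead_ends list means every source node contributes to some conclusion. Adding an unused source such as "S4" to PARENTS would make dead_end_sources return ["S4"], which signals that the analysis either needs extending or needs an explicit note about the gap.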

Concrete Examples

Example 1: Use-After-Free (CWE-416)

User: "Is this C code vulnerable?"

void process_data(int *buf, int len) {
    for (int i = 0; i < len; i++)
        buf[i] = i * 2;
    free(buf);
    // ... other operations ...
    printf("%d\n", buf[0]);  // line 6
}

Approach (DAG reasoning):

Source Nodes:
  S1: [line 1] buf is a pointer parameter (int*); assumed to be heap
      memory allocated by the caller (assumption, not a ground fact)
  S2: [line 4] free(buf) deallocates the memory pointed to by buf
  S3: [line 6] buf[0] dereferences buf after the free

Intermediate Nodes:
  I1: [depends on S1, S2] After free(buf) on line 4, the pointer buf
      becomes a dangling pointer — it still holds the address but the
      memory is no longer valid. (Memory lifecycle analysis)
  I2: [depends on I1] No reassignment of buf between line 4 and line 6,
      assuming the elided operations on line 5 do not touch buf.
      No conditional guard or null check. buf remains dangling.
      (Data flow analysis: no re-definition of buf in [4..6])
  I3: [depends on I2, S3] The dereference buf[0] on line 6 accesses
      freed memory through the dangling pointer. (Taint: dangling
      pointer reaches dereference operation)

Sink Node:
  VERIFIED_SINK: [depends on I3] Use-After-Free confirmed.
    Trigger: free(buf) at line 4 followed by buf[0] read at line 6
    with no intervening reallocation or guard.

Verdict: VULNERABLE — CWE-416 Use-After-Free
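Root cause: buf[0] is read on line 6 after free(buf) on line 4, and
no control flow prevents the post-free access. Fix: perform the read
before the free, or remove it. Setting buf to NULL after free() alone
would only turn the bug into a null pointer dereference.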
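The data flow check behind I2 and I3 can be sketched as a scan over an ordered list of pointer events. The event list below is a hypothetical, hand-written abstraction of process_data, and the scan only handles straight-line code: it ignores branches, loops, and aliasing.

```python
# Hypothetical event list for process_data: (line, operation, variable).
EVENTS = [
    (3, "write", "buf"),
    (4, "free", "buf"),
    (6, "read", "buf"),
]


def uses_after_free(events):
    """Return (free_line, use_line, variable) for each access to a freed pointer."""
    freed_at = {}   # variable -> line of the free that is still in effect
    findings = []
    for line, op, var in sorted(events):
        if op == "free":
            freed_at[var] = line
        elif op == "assign":
            # A re-definition of the pointer ends the dangling state.
            freed_at.pop(var, None)
        elif op in ("read", "write") and var in freed_at:
            findings.append((freed_at[var], line, var))
    return findings


findings = uses_after_free(EVENTS)
```

For the events above, findings holds a single entry pairing the free on line 4 with the read on line 6. If an "assign" event for buf appeared on line 5, the scan would report nothing, which corresponds to the re-definition that I2 rules out.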

Example 2: SQL Injection Analysis — Safe Code (CWE-89)

User: "Can this Python code be SQL injected?"

def get_user(db, username):
    query = "SELECT * FROM users WHERE name = ?"
    cursor = db.execute(query, (username,))
    return cursor.fetchone()

Approach (DAG reasoning):

Source Nodes:
  S1: [line 1] username is an external parameter (untrusted input)
  S2: [line 2] query uses a parameterized placeholder "?"
  S3: [line 3] db.execute() is called with query and (username,) as
      separate arguments — parameterized execution

Intermediate Nodes:
  I1: [depends on S1, S2] The untrusted input username is NOT
      concatenated or interpolated into the query string. The query
      string is a static literal. (Data flow: no def-use chain from
      username to query string construction)
  I2: [depends on S2, S3] The database driver receives the query
      template and parameters separately. The "?" placeholder is
      bound to username as a data value, so the input is never
      parsed as SQL text. This assumes db is a DB-API style
      connection such as sqlite3.Connection. (API contract analysis)

Sink Node:
  SANITIZED_SINK: [depends on I1, I2] SQL injection is prevented.
    The parameterized query separates code from data. The untrusted
    input never enters the query as executable SQL.

Verdict: SAFE — Parameterized query prevents CWE-89 SQL Injection.
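The verdict can be checked empirically. The sketch below assumes db is a sqlite3.Connection (the original snippet does not say which driver is used) and adds a hypothetical get_user_unsafe function purely as a counter-example that builds the query by concatenation.

```python
import sqlite3


def get_user(db, username):
    query = "SELECT * FROM users WHERE name = ?"
    cursor = db.execute(query, (username,))
    return cursor.fetchone()


def get_user_unsafe(db, username):
    # Counter-example for contrast only: the input becomes part of the SQL text.
    query = "SELECT * FROM users WHERE name = '" + username + "'"
    return db.execute(query).fetchone()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.execute("INSERT INTO users VALUES ('alice')")

payload = "nobody' OR '1'='1"
safe_result = get_user(db, payload)           # payload is compared as a plain string
unsafe_result = get_user_unsafe(db, payload)  # payload rewrites the WHERE clause
```

With the parameterized version the payload matches no row, so safe_result is None. The concatenated version returns the row for alice, because the injected OR clause is parsed as SQL.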

Example 3: Buffer Overflow with Subtle Control Flow (CWE-787)

User: "Review this for buffer overflow."

void copy_input(char *src) {
    char dest[64];
    int len = strlen(src);
    if (len < 128) {
        memcpy(dest, src, len);
    }
}

Approach (DAG reasoning):

Source Nodes:
  S1: [line 1] src is an external char* parameter (untrusted)
  S2: [line 2] dest is a stack buffer of 64 bytes
  S3: [line 3] len = strlen(src); strlen returns a size_t in
      0..SIZE_MAX-1, and the result is stored in an int
  S4: [line 4] Guard condition: len < 128
  S5: [line 5] memcpy(dest, src, len) copies len bytes into dest

Intermediate Nodes:
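  I1: [depends on S3, S4] The guard permits len in range [0..127],
      plus negative values if strlen(src) exceeds INT_MAX and the
      conversion to int wraps (implementation-defined).
      (Constraint solving: len < 128 passes for any len <= 127)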
  I2: [depends on S2, I1] dest has capacity 64 bytes, but len can
      be up to 127. Values in [65..127] exceed dest's capacity.
      (Bounds analysis: 127 > 64)
  I3: [depends on I2, S5] memcpy writes len bytes (up to 127) into
      a 64-byte buffer. For len in [65..127], this overflows dest
      by up to 63 bytes on the stack. (Overflow confirmation)

Sink Node:
  VERIFIED_SINK: [depends on I3] Stack buffer overflow confirmed.
    Trigger: src with strlen in [65..127] passes the guard but
    overflows the 64-byte dest buffer.
    The guard checks against 128 but should check against
    sizeof(dest) which is 64.

Verdict: VULNERABLE — CWE-787 Out-of-bounds Write
Root cause: Bounds check (len < 128) is mismatched with actual
buffer size (64 bytes). Fix: change guard to (len < sizeof(dest)),
and declare len as size_t so that an oversized strlen result cannot
become a negative int that passes the check.
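The bounds arithmetic in I1 through I3 is small enough to enumerate. The sketch below only covers non-negative values of len, and the constant names are introduced here for illustration.

```python
DEST_SIZE = 64      # char dest[64]
GUARD_LIMIT = 128   # if (len < 128)

# Every non-negative len accepted by the guard.
passes_guard = range(0, GUARD_LIMIT)

# Lengths that pass the guard but write past the end of dest.
overflowing = [n for n in passes_guard if n > DEST_SIZE]

first_bad = overflowing[0]
last_bad = overflowing[-1]
max_excess = last_bad - DEST_SIZE
```

The values come out as first_bad = 65, last_bad = 127, and max_excess = 63, matching the range and the overflow size stated in I2 and I3. A len of exactly 64 fills dest without overflowing it.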

Best Practices

Do: Always extract source nodes with exact line references before making any inferences. Every claim must trace back to a specific line of code.
Do: Explicitly state parent dependencies for every intermediate node. Write "[depends on S1, I2]" to make the causal chain auditable.
Do: Consider both the vulnerable path AND potential sanitization. Check for bounds checks, null guards, input validation, and error handling that might break the exploit chain before declaring a vulnerability.
Do: Check for spurious causality, a common LLM failure pattern in which a dangerous-looking API is flagged as vulnerable without tracing whether untrusted input actually reaches it.
Avoid: Asserting a vulnerability exists just because a dangerous function (strcpy, sprintf, eval) appears in the code. Trace the actual data flow from untrusted source to dangerous sink.
Avoid: Skipping inter-procedural analysis. If a function calls another function that performs validation, that validation must appear as a node in the DAG or the analysis is incomplete.
Avoid: Producing a verdict without a complete path from source nodes to a sink node. An incomplete graph means insufficient evidence.
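The difference between "a dangerous function appears" and "untrusted input reaches it" can be sketched as reachability over def-use edges. The flow graph, variable names, and sink labels below are hypothetical, and sanitizers are not modeled.

```python
from collections import deque

# Hypothetical def-use edges: each key flows into the listed names.
FLOWS = {
    "user_input": ["expr"],
    "expr": ["eval(expr)"],
    "BACKUP_DIR": ["cmd"],          # a constant, not attacker-controlled
    "cmd": ["os.system(cmd)"],
}
UNTRUSTED = {"user_input"}
DANGEROUS_SINKS = {"eval(expr)", "os.system(cmd)"}


def tainted_sinks(flows, untrusted, sinks):
    """Return a taint path for each sink reachable from an untrusted source."""
    paths = {}
    queue = deque((src, [src]) for src in sorted(untrusted))
    seen = set(untrusted)
    while queue:  # breadth-first search along the def-use edges
        node, path = queue.popleft()
        if node in sinks:
            paths[node] = path
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return paths


reachable = tainted_sinks(FLOWS, UNTRUSTED, DANGEROUS_SINKS)
```

Both eval and os.system appear in the flow graph, but only eval(expr) is reachable from user_input. The returned path lists the nodes that would become the source and intermediate nodes of the DAG.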

Error Handling

Incomplete code context: If the user provides a code snippet without callers or callees, state what assumptions are being made (e.g., "Assuming input is untrusted user data") as explicit source nodes and flag them as assumptions rather than ground facts.
Ambiguous semantics: When language-specific behavior is unclear (e.g., integer promotion rules in C, prototype chain in JavaScript), cite the specific language standard behavior and note the ambiguity as a conditional branch in the DAG.
Multiple vulnerability paths: If the code has multiple independent vulnerability paths, construct separate DAGs for each and present them independently. Do not conflate unrelated issues into a single reasoning chain.
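False positive detection: If the DAG analysis cannot produce a complete path from an untrusted source to a verified sink, do not report a vulnerability. Conclude SAFE when the path ends at a sanitized sink; if no sink is reached at all, report the gap explicitly, as required by the logical closure check. An incomplete causal chain is not evidence of vulnerability.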

Limitations

This approach is most effective for vulnerabilities with clear causal chains (memory corruption, injection, authentication bypass). It is less suited for timing side-channels, statistical information leakage, or vulnerabilities requiring dynamic runtime analysis.
Inter-procedural analysis depth is limited by available code context. If critical functions are defined in external libraries without source, the DAG will have assumption nodes that reduce certainty.
The technique analyzes code structure, not runtime state. Concurrency bugs (race conditions, TOCTOU) require reasoning about thread interleaving that the static DAG model handles less naturally.
Language-specific nuances (C undefined behavior, JavaScript type coercion, Python dynamic typing) may require domain expertise beyond the structural analysis.

Reference

Paper: "Evaluating and Enhancing the Vulnerability Reasoning Capabilities of Large Language Models" — Li Lu, Yanjie Zhao, Hongzhou Rao, Kechi Zhang, Haoyu Wang (2026). arXiv:2602.06687
Key takeaway: 36.4% of correct LLM vulnerability verdicts come from wrong reasoning. The DAGVul framework fixes this by modeling analysis as a directed acyclic graph with source/intermediate/sink nodes, enforcing that every inference cites parent dependencies and every reasoning path terminates at a defined conclusion. Look for: the 12 failure pattern taxonomy, the node admissibility rule (pa(v) must be established), and the logical closure requirement.
