skills/beyond-function-level-analysis-context-aware/SKILL.md
Inter-procedural vulnerability detection using context-aware reasoning. Analyzes functions alongside their callers, callees, and global state to find vulnerabilities that single-function analysis misses. Uses code property graph traversal, security-focused context profiling, relevance scoring, and structured reasoning traces.

Trigger phrases:
- "Check this code for vulnerabilities across function boundaries"
- "Analyze this function with its callers and callees for security issues"
- "Find inter-procedural vulnerabilities in this codebase"
- "Review this code for vulnerabilities that depend on how it's called"
- "Do a deep security audit with cross-function context"
- "Analyze whether this function is safe given how callers use it"
npx skillsauth add ndpvt-web/arxiv-claude-skills beyond-function-level-analysis-context-aware

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to detect vulnerabilities that are invisible at the function level by systematically gathering inter-procedural context — callers, callees, global variables — profiling each for security relevance, and applying structured reasoning over the combined evidence. Based on the CPRVul framework, this approach improved detection accuracy by 22.9% over function-only baselines on the PrimeVul benchmark. The core insight: raw context hurts; only profiled, scored, and reasoned-over context helps.
Why function-level analysis fails. Most vulnerability detectors examine one function in isolation and predict "vulnerable or not." But real vulnerabilities often depend on context: a function that trusts its input is only vulnerable if callers pass unsanitized data; a null-pointer dereference only matters if callers can trigger the null path; a lock release is only dangerous if the caller didn't check for null first. Naively appending all caller/callee code makes things worse — the noise overwhelms the signal, and fine-tuned models actually perform worse with raw context than without it.
Context Profiling and Selection. CPRVul first builds a code property graph (CPG) for the repository and extracts three categories of context for a target function: (1) callers — every function that invokes the target, (2) callees — every function the target invokes, and (3) global variables the target reads or writes. Each context element is then profiled through security-focused summarization: for callers, the profile captures data origin (user input, network, file), transformations applied (sanitized vs. raw), and how return values are used; for callees, it captures security risk level with justification; for globals, it captures the role and security implications. Each profiled element receives a relevance score, and only the highest-scoring elements that fit the analysis budget are retained.
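The profiling and selection step described above can be sketched roughly as follows. This is a minimal illustration, not the CPRVul implementation: the `ContextProfile` fields, the `cost` unit, the greedy budget rule, and the `log_event` caller are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ContextProfile:
    """Security-focused summary of one caller, callee, or global (hypothetical shape)."""
    kind: str       # "caller", "callee", or "global"
    name: str
    summary: str    # e.g. data origin, sanitization status, or risk justification
    relevance: int  # assumed 0-10 relevance score
    cost: int       # assumed size of the profile, e.g. in tokens


def select_within_budget(profiles, budget):
    """Greedily keep the highest-scoring profiles whose total cost fits the budget."""
    kept, used = [], 0
    for profile in sorted(profiles, key=lambda p: p.relevance, reverse=True):
        if used + profile.cost <= budget:
            kept.append(profile)
            used += profile.cost
    return kept


profiles = [
    ContextProfile("caller", "stub_disconnect", "may pass a NULL bid", 9, 120),
    ContextProfile("callee", "spin_unlock", "no NULL guard", 8, 60),
    ContextProfile("caller", "log_event", "logging only (hypothetical)", 1, 200),
]
selected = select_within_budget(profiles, budget=200)
print([p.name for p in selected])
```

With a budget of 200, the two high-relevance profiles fit and the logging-only caller is dropped, which mirrors the idea that low-value context is noise rather than signal.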
Structured Reasoning. The selected context, target function, and vulnerability metadata (CWE classification, CVE description, commit context) are combined into a structured reasoning trace with five fields: observation (what the code does), security_reasoning (why it is or isn't vulnerable given context), impact (consequences), is_vulnerable (boolean verdict), and confidence_score (0-10). This structure prevents the model from pattern-matching on surface features and forces it to articulate the causal chain from context through to exploit.
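One way to keep the five-field trace well-formed is to validate it before emitting it. The sketch below assumes a hypothetical `ReasoningTrace` class; only the field names and the 0-10 range come from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReasoningTrace:
    """Five-field verdict; the class itself is illustrative, not part of CPRVul."""
    observation: str
    security_reasoning: str
    impact: str
    is_vulnerable: bool
    confidence_score: int  # 0-10

    def __post_init__(self):
        # Reject malformed verdicts early instead of emitting ambiguous JSON.
        if not isinstance(self.is_vulnerable, bool):
            raise TypeError("is_vulnerable must be a boolean")
        if not 0 <= self.confidence_score <= 10:
            raise ValueError("confidence_score must be between 0 and 10")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


trace = ReasoningTrace(
    observation="parse_header() copies len bytes without an internal bounds check.",
    security_reasoning="Both current callers validate len before the call.",
    impact="No current impact.",
    is_vulnerable=False,
    confidence_score=7,
)
print(trace.to_json())
```

Keeping the verdict and confidence as typed fields makes it harder to produce a trace that skips the reasoning and jumps straight to a label.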
Identify the target function. Determine which function the user wants analyzed. Read it fully and note its parameters, return type, memory operations, and any security-sensitive API calls (allocation, I/O, crypto, string manipulation, pointer arithmetic).
Extract callers. Search the codebase for every call site of the target function. For each caller, read the relevant code surrounding the call. Focus on: what data is passed as arguments, whether inputs are validated or sanitized before the call, and how the return value is checked.
Extract callees. Identify every function the target invokes. Read each callee's implementation. Focus on: whether the callee performs bounds checking, whether it can fail or return null, whether it has security-relevant side effects (memory allocation, lock acquisition, privilege changes).
Extract global state. Find global variables, shared structs, and static state that the target reads or writes. Determine whether other functions modify this state concurrently and whether the target assumes invariants about it.
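Profile each context element for security relevance. For each caller, callee, and global variable, write a concise security profile:
- Callers: where the data originates (user input, network, file), which transformations are applied (sanitized vs. raw), and how return values are used
- Callees: the security risk level, with justification
- Globals: the role and its security implications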
Score and select context. Assign each profiled element a relevance score (0-10) based on how directly it affects whether the target function is exploitable. Retain only elements scoring 7+ or the top 3-5 elements if many qualify. Discard boilerplate, logging-only callers, and pure-utility callees with no security surface.
Assemble the analysis input. Combine: (a) the target function source, (b) the selected context profiles with code excerpts, and (c) any available vulnerability metadata (CWE category if suspected, known vulnerability patterns for the code pattern).
Generate a structured reasoning trace. Produce a verdict with these five fields:
- observation: Factual description of what the target function does and what context reveals
- security_reasoning: The causal argument. How does the context make this vulnerable or safe? What preconditions are met or violated?
- impact: What could an attacker achieve if the vulnerability is real (denial of service, code execution, information leak)
- is_vulnerable: true/false
- confidence_score: 0-10 with justification for uncertainty

Validate with counterfactual reasoning. If the verdict is "not vulnerable," explicitly reason about what would make it vulnerable (e.g., "if caller X stopped validating the length parameter, this memcpy would overflow"). If the verdict is "vulnerable," reason about what defense would fix it and whether any caller already provides that defense.
Report findings with full traceability. Present the verdict with specific file:line references for the target, the relevant callers/callees, and the exact data flow path that constitutes the vulnerability (or the defense that prevents it).
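The caller-extraction, selection, and reporting steps above can be approximated with a few lines of tooling. This is a rough sketch under stated assumptions: a regex stands in for real call-graph extraction (it also matches definitions and misses calls made through function pointers or macros), and the file paths, source snippets, and `log_header` element are hypothetical.

```python
import re


def find_call_sites(sources, function_name):
    """Return (path, line_number, line_text) for lines that appear to call function_name.

    `sources` maps a file path to its full text.
    """
    pattern = re.compile(r"\b" + re.escape(function_name) + r"\s*\(")
    hits = []
    for path, text in sources.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path, number, line.strip()))
    return hits


def select_context(scored, threshold=7, cap=5):
    """Keep (name, score) pairs at or above the threshold, capped at the top `cap`."""
    qualifying = [item for item in scored if item[1] >= threshold]
    qualifying.sort(key=lambda item: item[1], reverse=True)
    return qualifying[:cap]


# Hypothetical sources for illustration only.
sources = {
    "net/request.c": (
        "int handle_request(conn *c) {\n"
        "    if (len > MAX_HEADER_SIZE) return -1;\n"
        "    return parse_header(buf, src, len);\n"
        "}\n"
    ),
    "net/batch.c": (
        "void process_batch(batch *b) {\n"
        "    parse_header(buf, b->data, b->len);\n"
        "}\n"
    ),
}

# Report each call site as file:line so the finding stays traceable.
for path, number, line in find_call_sites(sources, "parse_header"):
    print(f"{path}:{number}: {line}")

print(select_context([("handle_request", 9), ("process_batch", 7), ("log_header", 2)]))
```

The threshold and cap defaults follow the "7+ or top 3-5" guidance; a real audit would replace the regex with a code property graph or language-server query.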
Example 1: Null-pointer dereference across function boundary (CWE-476)
User: "Is put_busid_priv() in drivers/usb/usbip safe?"
Approach:
- put_busid_priv(): it calls spin_unlock(&bid->busid_lock) and accesses bid->status
- stub_probe() and stub_disconnect() both call it
- stub_disconnect() can pass a NULL bid when the device is already unbound
- stub_disconnect() caller scores 9/10 (directly controls NULL path)
- spin_unlock has no NULL guard: scores 8/10

Output:
{
"observation": "put_busid_priv() accesses bid->status and releases bid->busid_lock without null-checking bid. Caller stub_disconnect() can invoke this with bid=NULL when device is already unbound.",
"security_reasoning": "stub_disconnect() calls busid_priv_get() which returns NULL when the bus ID is not found. This NULL pointer is passed directly to put_busid_priv(). The function dereferences bid without a guard, causing a kernel NULL pointer dereference.",
"impact": "Kernel panic / denial of service. Local attacker with USB access can trigger by rapidly unbinding a USBIP device.",
"is_vulnerable": true,
"confidence_score": 9
}
Fix: Add if (!bid) return; at the top of put_busid_priv().
Example 2: Incorrect return value with caller-dependent impact (CWE-682)
User: "Review Item_direct_view_ref::const_item() in MariaDB for correctness."
Approach:
- const_item(): it unconditionally returns true
- Callers use const_item() to decide whether an expression can be evaluated early
- The query optimizer uses the const_item() return value to push conditions into derived tables. Score: 10/10, because it directly determines optimization correctness
- const_item() should check used_tables() == 0 but doesn't call it
- A correct implementation delegates to (*ref)->const_item() or checks used_tables()

Output:
{
"observation": "const_item() returns true unconditionally instead of checking used_tables() == 0. This tells the optimizer that a non-constant expression is constant.",
"security_reasoning": "Callers in the query optimizer use this to push conditions into subqueries/derived tables. When a non-constant expression is treated as constant, the optimizer can evaluate it with wrong row context, producing incorrect query results. This is exploitable for data exfiltration if query results are user-visible.",
"impact": "Incorrect query results. Potential information disclosure if exploited in multi-tenant query paths.",
"is_vulnerable": true,
"confidence_score": 8
}
Example 3: Safe function confirmed via caller analysis
User: "Is this parse_header() function vulnerable to buffer overflow?"
Approach:
- parse_header(): uses memcpy(buf, src, len) where len comes from a parameter
- Callers: handle_request() and process_batch()
- Both callers validate len <= MAX_HEADER_SIZE before calling. handle_request() gets len from network input but clamps it. process_batch() computes len from a trusted internal struct.
- Counterfactual: if any caller passed an unvalidated len, the memcpy overflows

Output:
{
"observation": "parse_header() copies len bytes into buf via memcpy without internal bounds checking. Both current callers validate len <= MAX_HEADER_SIZE before invoking.",
"security_reasoning": "The function itself lacks bounds checking, but all current callers enforce the precondition. The function is safe in its current call context but is fragile — any new caller that omits the length check introduces a buffer overflow.",
"impact": "No current impact. Latent risk if calling convention changes.",
"is_vulnerable": false,
"confidence_score": 7
}
Recommendation: Add an assert(len <= MAX_HEADER_SIZE) or an explicit check inside parse_header() for defense in depth.
malloc wrappers), sample callers from security-sensitive code paths (network handlers, parsers, authentication) rather than analyzing all call sites.

Paper: Beyond Function-Level Analysis: Context-Aware Reasoning for Inter-Procedural Vulnerability Detection (Li et al., 2026). Look for: Table 2 (reasoning trace templates), Section 3 (context profiling pipeline), and the ablation in Section 5 showing that processed context + structured reasoning is the only combination that improves over baselines.