skills/an-cost-efficient-agentic-framework/SKILL.md
Audit Ethereum smart contracts for business logic vulnerabilities using Heimdallr's four-phase agentic pipeline: function-level code reorganization via dependency graph clustering, heuristic Plan-Remind-Solve reasoning with adversarial state injection, automatic multi-step exploit chaining, and cascaded false-positive filtration. Trigger phrases: 'audit this smart contract', 'find vulnerabilities in this Solidity code', 'check this DeFi protocol for exploits', 'smart contract security review', 'detect business logic bugs in this contract', 'chain exploit paths in this protocol'.
npx skillsauth add ndpvt-web/arxiv-claude-skills an-cost-efficient-agentic-framework
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill enables Claude to perform structured, multi-phase security audits of Ethereum smart contracts following the Heimdallr framework (Hu et al., 2026). Rather than scanning line-by-line or relying on generic static analysis heuristics, the approach reorganizes contract code into semantically cohesive batches using dependency graph clustering, then applies a Plan-Remind-Solve reasoning loop with adversarial state assumptions, chains individual findings into composite multi-step exploits, and finally filters results through a three-layer cascaded verification to eliminate false positives. This method specifically targets business logic vulnerabilities — flaws in protocol-specific economic mechanisms, state transitions, access control workflows, and cross-contract interactions — that traditional tools consistently miss.
Function-Level Code Reorganization (Contextual Profiling). Instead of feeding entire contracts into an LLM (which wastes context and loses coherence), Heimdallr builds a static directed dependency graph where nodes are functions and edges represent control flow (direct calls) and data flow (shared state variable access). It then applies the Louvain community detection algorithm to partition this graph into densely connected subgraphs — each representing a logically cohesive unit of business logic. Functions are scored by importance using a weighted combination of betweenness centrality and PageRank: Score(f) = alpha * Betweenness(f) + beta * PageRank(f). High-scoring functions are audited first. This reorganization means the LLM sees related functions together regardless of which .sol file they live in, dramatically reducing context noise.
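The scoring formula above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the weights `alpha` and `beta`, the scaling of betweenness to [0, 1], and the `_getPrice` helper in the sample graph are all assumptions made for the example. Louvain clustering is left out here.

```python
from collections import deque

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [successors]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank

def betweenness(graph):
    """Brandes' algorithm for an unweighted directed graph (unnormalized)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

def importance(graph, alpha=0.5, beta=0.5):
    """Score(f) = alpha * Betweenness(f) + beta * PageRank(f)."""
    bc, pr = betweenness(graph), pagerank(graph)
    top = max(bc.values()) or 1.0  # assumption: scale betweenness to [0, 1]
    return {f: alpha * bc[f] / top + beta * pr[f] for f in graph}

# Hypothetical call graph; edges point from caller to callee.
calls = {
    "deposit": ["_updateAccountState"],
    "withdraw": ["_checkHealthFactor", "_updateAccountState"],
    "borrow": ["_checkHealthFactor", "_updateAccountState"],
    "_checkHealthFactor": ["_getPrice"],
    "_updateAccountState": [],
    "_getPrice": [],
}
scores = importance(calls)
audit_order = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph `_checkHealthFactor` comes first in `audit_order` because it is the only function sitting on a path between entry points and a downstream helper.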
Plan-Remind-Solve with Adversarial State Injection. For each batch, the auditor runs a three-step loop: (1) Plan — identify which functions look potentially vulnerable and why; (2) Remind — retrieve concrete exploit patterns from a knowledge base organized by vulnerability category (reentrancy, oracle manipulation, precision loss, etc.); (3) Solve — analyze each candidate through three complementary lenses: basic semantic analysis, adversarial state assumptions (block variable manipulation, malicious calldata, flash-loan unlimited capital), and symbolic constraint solving (modeling arithmetic in Z3 to prove whether invariant violations are reachable at boundary values). Individual findings are then chained: if the postcondition of vulnerability A satisfies the precondition of vulnerability B, they merge into a composite multi-step exploit.
Cascaded Verification. Raw findings pass through three sequential filters: (1) Contextual Aggregation — re-evaluate each finding under global contract scope to check if safeguards elsewhere neutralize the risk; (2) Semantic Deduplication — cluster similar findings by embedding similarity, keep only the highest-confidence representative; (3) Threat Model Assessment — discard findings that require unrealistic assumptions like dishonest contract owners or compromised external oracles (unless the user explicitly includes those in scope).
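A rough sketch of the three filters as a pipeline follows. The `Finding` fields, the safeguard map, and the assumption labels are hypothetical, and token-level Jaccard similarity stands in for the embedding similarity the framework describes.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    function: str
    confidence: float
    neutralized_by: set = field(default_factory=set)  # safeguards that would defuse it
    assumptions: set = field(default_factory=set)     # what the attacker must control

def contextual_aggregation(findings, safeguards):
    """Filter 1: drop findings whose function already carries a neutralizing safeguard."""
    return [f for f in findings
            if not (f.neutralized_by & safeguards.get(f.function, set()))]

def jaccard(a, b):
    """Token overlap between two titles (stand-in for embedding similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0

def semantic_dedup(findings, threshold=0.6):
    """Filter 2: keep the highest-confidence finding of each similar group."""
    kept = []
    for f in sorted(findings, key=lambda x: x.confidence, reverse=True):
        if all(jaccard(f.title, k.title) < threshold for k in kept):
            kept.append(f)
    return kept

# Assumption labels treated as out of scope by default (hypothetical names).
UNREALISTIC = {"dishonest_owner", "compromised_oracle"}

def threat_model(findings, in_scope=frozenset()):
    """Filter 3: drop findings that need out-of-scope attacker powers."""
    excluded = UNREALISTIC - set(in_scope)
    return [f for f in findings if not (f.assumptions & excluded)]

def cascade(findings, safeguards, in_scope=frozenset()):
    survivors = contextual_aggregation(findings, safeguards)
    survivors = semantic_dedup(survivors)
    return threat_model(survivors, in_scope)

raw = [
    Finding("Reentrancy in withdrawAll", "withdrawAll", 0.9),
    Finding("Reentrancy in withdrawAll function", "withdrawAll", 0.7),
    Finding("Owner can drain funds", "rescue", 0.8, assumptions={"dishonest_owner"}),
    Finding("Reentrancy in claim", "claim", 0.6, neutralized_by={"nonReentrant"}),
]
safeguards = {"claim": {"nonReentrant"}}
verified = cascade(raw, safeguards)
```

Passing `in_scope={"dishonest_owner"}` keeps the owner-related finding, mirroring the case where the user explicitly widens the threat model.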
Collect all contract source code. Read every .sol file the user provides. If the user gives a single file, also ask whether there are imported dependencies or interfaces that affect the logic (e.g., OpenZeppelin, protocol-specific libraries).
Build the dependency graph. For each contract, enumerate all functions (public, external, internal, private). Map edges for: direct function calls (functionA calls functionB), shared state variable reads/writes, modifier applications, and inheritance overrides. Represent this as an adjacency list or mental model.
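One possible adjacency-list representation is sketched below. The function summaries are hypothetical (in practice they would be extracted from the Solidity source or AST), and only call edges and state-variable edges are modeled; modifier and inheritance edges are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical per-function summaries.
FUNCTIONS = {
    "deposit": {"calls": ["_updateAccountState"], "reads": [], "writes": ["balances"]},
    "withdraw": {"calls": ["_checkHealthFactor"], "reads": ["balances"], "writes": ["balances"]},
    "borrow": {"calls": ["_checkHealthFactor"], "reads": ["balances"], "writes": ["debts"]},
    "_checkHealthFactor": {"calls": [], "reads": ["balances", "debts"], "writes": []},
    "_updateAccountState": {"calls": [], "reads": [], "writes": ["balances"]},
}

def build_dependency_graph(functions):
    """Return {function: {neighbor: set of edge labels}}."""
    graph = {name: defaultdict(set) for name in functions}
    # Control-flow edges: caller -> callee.
    for name, info in functions.items():
        for callee in info["calls"]:
            graph[name][callee].add("call")
    # Data-flow edges: writer -> reader of the same state variable.
    for writer, w in functions.items():
        for reader, r in functions.items():
            if writer == reader:
                continue
            for var in set(w["writes"]) & set(r["reads"]):
                graph[writer][reader].add(f"state:{var}")
    return graph

graph = build_dependency_graph(FUNCTIONS)
```

Labeling edges keeps control flow and data flow distinguishable, which helps later when deciding why two functions ended up in the same batch.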
Cluster into semantic batches. Group functions into cohesive units based on their connectivity. Functions that share state variables or call each other belong together. Prioritize batches containing functions with high centrality (called by many others or sitting on critical paths between entry points and state changes). A typical DeFi protocol yields 3-8 batches: e.g., "deposit/withdraw/accounting", "liquidation/health-check", "governance/parameter-setting", "oracle-integration".
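As a rough stand-in for Louvain clustering, the grouping rule in this step (functions that call each other or share state belong together) can be sketched with union-find. This is deliberately cruder than community detection: it merges transitively, so a tightly coupled protocol may collapse into a single batch. The function names and state variables are hypothetical.

```python
def cluster_functions(functions):
    """Group functions connected by calls or shared state variables."""
    parent = {name: name for name in functions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_user = {}  # state variable -> first function seen touching it
    for name, info in functions.items():
        for callee in info["calls"]:
            if callee in parent:
                union(name, callee)
        for var in info["state"]:
            if var in first_user:
                union(name, first_user[var])
            else:
                first_user[var] = name

    batches = {}
    for name in functions:
        batches.setdefault(find(name), []).append(name)
    return sorted(sorted(batch) for batch in batches.values())

functions = {
    "deposit": {"calls": [], "state": ["balances"]},
    "withdraw": {"calls": [], "state": ["balances"]},
    "liquidate": {"calls": ["_checkHealth"], "state": ["debts"]},
    "_checkHealth": {"calls": [], "state": []},
    "setFee": {"calls": [], "state": ["fee"]},
}
batches = cluster_functions(functions)
```

Here the result is three batches: accounting (`deposit`, `withdraw`), liquidation (`liquidate`, `_checkHealth`), and parameter setting (`setFee`).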
Score and prioritize. Within each batch, rank functions by importance. External/public functions callable by anyone rank highest. Functions that modify balances, ownership, or protocol parameters rank above pure view functions. Start the audit with the highest-priority batch.
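The ranking rules in this step could be expressed as a sort key, as in the sketch below. The field names, the `SENSITIVE_STATE` set, and the sample batch are assumptions for illustration only.

```python
# Hypothetical set of state variables considered sensitive.
SENSITIVE_STATE = {"balances", "owner", "fee"}

def priority_key(fn):
    """Higher tuples sort first: reachability, then sensitive writes, then mutability."""
    callable_by_anyone = fn["visibility"] in ("external", "public")
    mutates_sensitive = bool(fn["writes"] & SENSITIVE_STATE)
    read_only = fn["mutability"] in ("view", "pure")
    return (callable_by_anyone, mutates_sensitive, not read_only)

def prioritize(batch):
    # sorted() is stable, so ties keep their original order.
    return sorted(batch, key=priority_key, reverse=True)

batch = [
    {"name": "getBalance", "visibility": "external", "mutability": "view", "writes": set()},
    {"name": "withdraw", "visibility": "external", "mutability": "nonpayable", "writes": {"balances"}},
    {"name": "_accrue", "visibility": "internal", "mutability": "nonpayable", "writes": {"index"}},
    {"name": "setOwner", "visibility": "public", "mutability": "nonpayable", "writes": {"owner"}},
]
ranked = [fn["name"] for fn in prioritize(batch)]
```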
Plan: Identify suspicious patterns. For the current batch, read each function and flag candidates that exhibit: unchecked external calls, arithmetic on user-supplied values without overflow/underflow guards (inside `unchecked` blocks, or anywhere in pre-0.8 Solidity), state written after an external call (potential reentrancy), missing access control, reliance on `block.timestamp` or `block.number`, token transfer patterns without return-value checks, or price calculations using spot reserves.
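A few of these flags can be approximated with textual heuristics, as in the sketch below. The regular expressions are assumptions chosen for the example and are far coarser than real analysis (they ignore comments, aliases, and control flow); they only show how a Plan pass might produce candidate labels.

```python
import re

# Hypothetical heuristics; each label maps to a pattern searched in the source.
PATTERNS = {
    "low-level external call": re.compile(r"\.call\{"),
    "block variable dependence": re.compile(r"\bblock\.(timestamp|number)\b"),
    "spot reserve read": re.compile(r"\bgetReserves\s*\("),
}
# Assignment to an indexed variable such as balances[msg.sender] = 0.
MAPPING_WRITE = re.compile(r"\w+\[[^\]]+\]\s*[-+]?=(?!=)")

def plan_flags(source):
    """Return candidate labels for one function body."""
    flags = [label for label, pattern in PATTERNS.items() if pattern.search(source)]
    call = PATTERNS["low-level external call"].search(source)
    # A mapping write that appears after the first low-level call hints at reentrancy.
    if call and MAPPING_WRITE.search(source, call.end()):
        flags.append("state write after external call")
    return flags

WITHDRAW_ALL = '''
function withdrawAll() external {
    uint256 balance = balances[msg.sender];
    (bool success, ) = msg.sender.call{value: balance}("");
    require(success, "Transfer failed");
    balances[msg.sender] = 0;
}
'''
flags = plan_flags(WITHDRAW_ALL)
```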
Remind: Match against known vulnerability patterns. For each flagged candidate, explicitly recall the relevant exploit pattern:
- Precision loss: `mulDiv` or `mulDown` operations accumulating rounding errors
- Access control: missing `onlyOwner`/role checks on sensitive setters

Solve: Analyze with adversarial assumptions. For each candidate, test under three adversarial profiles:
- What if `block.timestamp` is at the edge of a range? What if this is called in the same block as another transaction?
- What if `transferFrom` returns false silently?

Chain exploits. Review all individual findings across batches. For each pair (A, B), check: does successfully exploiting A create a state that makes B exploitable? Common chains: oracle manipulation -> undercollateralized borrow -> bad debt; precision loss -> repeated micro-transactions -> invariant drift -> profitable withdrawal; access control bypass -> parameter manipulation -> economic drain.
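The chaining check can be sketched as a search over findings annotated with preconditions and postconditions. The `ExploitStep` type and the condition labels are hypothetical; the rule implemented is that step B extends a chain only when its preconditions hold in the accumulated state and at least one of them was produced by the preceding step.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExploitStep:
    name: str
    pre: frozenset   # conditions the attacker needs before this step
    post: frozenset  # conditions that hold after this step succeeds

def chain_exploits(findings, attacker_state=frozenset()):
    """Enumerate maximal chains in which each step is enabled by the previous one."""
    chains = []

    def extend(path, state):
        last = path[-1]
        extended = False
        for nxt in findings:
            if nxt in path:
                continue
            # nxt is reachable now and depends on what the last step produced.
            if nxt.pre <= state and nxt.pre & last.post:
                extend(path + [nxt], state | nxt.post)
                extended = True
        if not extended and len(path) > 1:
            chains.append(" -> ".join(step.name for step in path))

    for start in findings:
        if start.pre <= attacker_state:
            extend([start], attacker_state | start.post)
    return chains

findings = [
    ExploitStep("oracle manipulation",
                frozenset({"flash_loan"}), frozenset({"price_inflated"})),
    ExploitStep("undercollateralized borrow",
                frozenset({"price_inflated"}), frozenset({"debt_exceeds_collateral"})),
    ExploitStep("bad debt",
                frozenset({"debt_exceeds_collateral"}), frozenset({"protocol_insolvent"})),
]
chains = chain_exploits(findings, frozenset({"flash_loan"}))
```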
Verify through cascaded filtration. For each finding:
- Is there a safeguard elsewhere in the contract (a `nonReentrant` modifier, a pause mechanism, an admin-only rescue function) that mitigates this?

Report findings. For each verified vulnerability, output: severity (Critical/High/Medium/Low/Informational), title, affected function(s) with line references, root cause explanation, step-by-step exploit scenario, estimated impact (funds at risk), and a recommended fix with code.
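The report fields listed above could be rendered into the markdown layout used in the examples with a small helper like the following. The dictionary keys are hypothetical names chosen for this sketch.

```python
SEVERITIES = ("Critical", "High", "Medium", "Low", "Informational")

def render_finding(finding):
    """Format one verified finding as a markdown report section."""
    if finding["severity"] not in SEVERITIES:
        raise ValueError(f"unknown severity: {finding['severity']}")
    lines = [
        f"## [{finding['severity'].upper()}] {finding['title']}",
        f"**Affected:** {finding['affected']}",
        f"**Root Cause:** {finding['root_cause']}",
        "**Exploit Scenario:**",
    ]
    # Number the exploit steps starting from 1.
    lines += [f"{i}. {step}" for i, step in enumerate(finding["steps"], 1)]
    lines += [
        f"**Impact:** {finding['impact']}",
        f"**Recommendation:** {finding['fix']}",
    ]
    return "\n".join(lines)

report = render_finding({
    "severity": "Critical",
    "title": "Reentrancy in withdrawAll()",
    "affected": "`withdrawAll()`",
    "root_cause": "Balance is zeroed after the external call.",
    "steps": ["Attacker calls `withdrawAll()`", "`receive()` re-enters `withdrawAll()`"],
    "impact": "Entire contract balance.",
    "fix": "Zero the balance before the external call.",
})
```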
Example 1: Precision Loss in a DEX Pool
User: "Audit this Balancer-style weighted pool contract for vulnerabilities."
Approach:
- `swap()` calls `_upscaleArray()`, `_downscaleArray()`, `calcOutGivenIn()` which use `mulDown()` and `divDown()` from FixedPoint library.
- `calcOutGivenIn` performs multiple sequential `mulDown` operations: flag for precision loss.

Output:
## [CRITICAL] Precision Loss Cascade in Weighted Pool Swap
**Affected:** `WeightedPool.swap()` -> `WeightedMath.calcOutGivenIn()` (lines 142-189)
**Root Cause:** Sequential `mulDown()` and `divDown()` operations in the swap
calculation accumulate rounding errors of up to 8-9 wei per operation. No minimum
swap amount is enforced.
**Exploit Scenario:**
1. Attacker flash-loans large token amount
2. Executes 65+ micro-swaps in a single transaction
3. Each swap extracts ~8 wei more than it should due to rounding
4. Accumulated extraction drains pool value over repeated transactions
**Impact:** Pool invariant diverges from actual balances; estimated extractable
value scales with pool TVL and number of micro-swaps per block.
**Recommendation:** Enforce a minimum swap amount that exceeds the maximum
per-operation rounding error, or choose the rounding direction of each
`mulUp`/`divUp`/`mulDown`/`divDown` step so that the amount paid out is rounded
down and rounding always favors the pool.
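To make the rounding behavior concrete, here is a small Python model of 18-decimal fixed-point multiplication. The helpers are written in the style of such libraries and are assumptions for this sketch, not a copy of any specific implementation; `sequential_error` compares a run of rounded-down multiplications against exact rational arithmetic.

```python
from fractions import Fraction

ONE = 10**18  # 18-decimal fixed-point unit

def mul_down(a, b):
    """Fixed-point multiply, rounding toward zero."""
    return a * b // ONE

def mul_up(a, b):
    """Fixed-point multiply, rounding away from zero."""
    product = a * b
    return 0 if product == 0 else (product - 1) // ONE + 1

def sequential_error(x, rates):
    """Value lost (in wei) when each multiplication in a sequence rounds down."""
    exact = Fraction(x)
    rounded = x
    for r in rates:
        exact = exact * r / ONE
        rounded = mul_down(rounded, r)
    return exact - rounded

third = ONE // 3
gap = mul_up(3, third) - mul_down(3, third)   # the two directions differ by 1 wei here
loss = sequential_error(ONE + 1, [third] * 5)  # never negative: floor only loses value
```

Each single operation is off by at most one unit between the two directions; the concern in the finding is that the direction is the same at every step, so the losses add up instead of cancelling.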
Example 2: Cross-Contract Reentrancy in a Lending Protocol
User: "Check this lending protocol for reentrancy — it uses nonReentrant on all external functions."
Approach:
- `deposit()`, `borrow()`, `withdraw()`, `liquidate()` are external; each calls internal helpers `_updateAccountState()`, `_checkHealthFactor()`.
- deposit/withdraw share `balances` state; borrow/liquidate share `debts` state; both groups share `healthFactor` computation.
- All external functions carry `nonReentrant`, but `deposit()` writes to the `balances` mapping, then calls an ERC-777 token's transfer, which triggers a `tokensReceived` hook. The hook can call `borrow()`, a different function, so the single-function reentrancy guard does not block it.
- `nonReentrant` applied as a per-function lock does not prevent reentrancy across different functions sharing state.
- `tokensReceived` callback fires -> attacker calls `borrow()` before `deposit()` finishes updating internal accounting -> health factor computed on stale state -> attacker borrows more than collateral allows.

Output:
## [HIGH] Cross-Function Reentrancy via ERC-777 Token Hook
**Affected:** `LendingPool.deposit()` (line 87) -> `LendingPool.borrow()` (line 134)
**Root Cause:** `nonReentrant` modifier uses per-function locking. When `deposit()`
transfers an ERC-777 token, the `tokensReceived` hook executes before deposit
accounting completes. The hook can invoke `borrow()`, which reads stale balances.
**Exploit Scenario:**
1. Attacker calls `deposit()` with an ERC-777 collateral token
2. During token transfer, `tokensReceived` hook triggers
3. Hook calls `borrow()` — permitted because it has a separate reentrancy lock
4. `borrow()` reads `balances[attacker]` which is already credited but internal
accounting (health factor) has not yet been updated
5. Attacker borrows against inflated health factor, then withdraws
**Impact:** Attacker can borrow up to 100% of deposited collateral value instead
of the protocol's intended LTV ratio.
**Recommendation:** Replace per-function `nonReentrant` with a contract-level
reentrancy guard (single shared boolean). Alternatively, follow checks-effects-
interactions: update all internal state before any external call.
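The difference between the two locking schemes can be simulated in a few lines of Python. This is a toy model under stated assumptions: `LendingPool`, `attacker_hook`, and the `shared_guard` switch are hypothetical, and a raised exception stands in for a reverted transaction.

```python
class ReentrancyError(Exception):
    pass

class LendingPool:
    def __init__(self, shared_guard):
        self.shared_guard = shared_guard  # True: one lock for the whole contract
        self.held = set()                 # locks currently held
        self.in_deposit = False
        self.reentrant_borrows = 0

    def _acquire(self, fn_name):
        key = "contract" if self.shared_guard else fn_name
        if key in self.held:
            raise ReentrancyError(f"{fn_name}() blocked")
        self.held.add(key)
        return key

    def deposit(self, token_hook):
        key = self._acquire("deposit")
        self.in_deposit = True
        try:
            token_hook(self)  # external call: the hook runs mid-deposit
        finally:
            self.in_deposit = False
            self.held.discard(key)

    def borrow(self):
        key = self._acquire("borrow")
        try:
            if self.in_deposit:
                self.reentrant_borrows += 1  # borrowed against unfinished accounting
        finally:
            self.held.discard(key)

def attacker_hook(pool):
    pool.borrow()

def attack(shared_guard):
    pool = LendingPool(shared_guard)
    try:
        pool.deposit(attacker_hook)
    except ReentrancyError:
        return "blocked"
    return "re-entered" if pool.reentrant_borrows else "no re-entry"

per_function_result = attack(shared_guard=False)
shared_result = attack(shared_guard=True)
```

With per-function locks the hook's `borrow()` call goes through while `deposit()` is still running; with a single shared lock the same call is rejected.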
Example 3: Quick Triage of a Single Function
User: "Is this withdrawAll function safe?"
function withdrawAll() external {
    uint256 balance = balances[msg.sender];
    (bool success, ) = msg.sender.call{value: balance}("");
    require(success, "Transfer failed");
    balances[msg.sender] = 0;
}
Approach:
- State update (`balances[msg.sender] = 0`) happens after the external call (`msg.sender.call`). Classic pattern.
- Attacker contract has a `receive()` fallback that calls `withdrawAll()` again. On re-entry, `balances[msg.sender]` still holds the original value. Drains contract.
- No `nonReentrant` modifier. No checks-effects-interactions. Confirmed exploitable.

Output:
## [CRITICAL] Reentrancy in withdrawAll()
State update on line 5 (`balances[msg.sender] = 0`) occurs after the external
call on line 3. An attacker contract's `receive()` function can re-enter
`withdrawAll()` before the balance is zeroed, draining the entire contract.
Fix — move the state update before the external call:
uint256 balance = balances[msg.sender];
balances[msg.sender] = 0; // <-- move before call
(bool success, ) = msg.sender.call{value: balance}("");
require(success, "Transfer failed");
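The effect of the fix can be checked with a small Python simulation of the vault. The `Vault` and `Attacker` classes are hypothetical models, not EVM semantics: the attacker's `receive` method plays the role of the Solidity `receive()` function and stops re-entering once the vault is empty.

```python
class Vault:
    def __init__(self, deposits, fixed):
        self.balances = dict(deposits)
        self.ether = sum(deposits.values())
        self.fixed = fixed  # True: zero the balance before the external call

    def withdraw_all(self, caller):
        amount = self.balances.get(caller.name, 0)
        if self.fixed:
            self.balances[caller.name] = 0   # effects before interaction
        if amount > self.ether:
            raise RuntimeError("Transfer failed")
        self.ether -= amount
        caller.receive(self, amount)         # external call hands over control
        if not self.fixed:
            self.balances[caller.name] = 0   # too late: caller may have re-entered

class Attacker:
    name = "attacker"

    def __init__(self):
        self.stolen = 0

    def receive(self, vault, amount):
        self.stolen += amount
        if amount and vault.ether >= amount:  # re-enter while funds remain
            vault.withdraw_all(self)

def run(fixed):
    vault = Vault({"attacker": 1, "alice": 4}, fixed)
    attacker = Attacker()
    vault.withdraw_all(attacker)
    return attacker.stolen, vault.ether

vulnerable = run(fixed=False)
patched = run(fixed=True)
```

In the vulnerable ordering the attacker walks away with every deposit; with the state update moved first, the re-entrant call sees a zero balance and only the attacker's own deposit leaves the vault.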
Hu, X., Chan, W.Y., Shi, Y., Sun, Q., & Wang, W.-C. (2026). An Effective and Cost-Efficient Agentic Framework for Ethereum Smart Contract Auditing. arXiv:2601.17833v1. https://arxiv.org/abs/2601.17833v1
Key sections to study: Section 3 (Contextual Profiling via Louvain clustering), Section 4 (Plan-Remind-Solve auditor with adversarial state injection profiles), Section 5 (Cascaded verification filters), and the Balancer V2 case study demonstrating precision loss cascade detection.