agents/skills/evm/verification-protocol/SKILL.md
How to prove a hypothesis is TRUE or FALSE using Foundry tests.
npx skillsauth add plamentsv/plamen verification-protocolInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
How to prove a hypothesis is TRUE or FALSE using Foundry tests.
CRITICAL: For EVERY piece of evidence used in verification, you MUST tag its source. Evidence from mocks or unverified external contracts CANNOT support a REFUTED verdict.
| Tag | Meaning | Valid for REFUTED? | |-----|---------|-------------------| | [PROD] | Production contract (verified on-chain) | YES | | [MOCK] | Mock/test contract | NO | | [CODE] | Audited codebase (in-scope) | YES | | [EXT-UNV] | External, unverified behavior | NO | | [DOC] | Documentation/spec only | NO (needs verification) |
Before ANY verdict, fill this table:
### Evidence Audit
| Claim | Evidence Source | Tag | Valid for REFUTED? |
|-------|-----------------|-----|-------------------|
| "External returns X" | Mock contract | [MOCK] | NO |
| "State changes to Y" | Protocol.sol:123 | [CODE] | YES |
| "Transfer triggers Z" | Etherscan source | [PROD] | YES |
AUTOMATIC OVERRIDE: If ANY evidence supporting REFUTED has tag [MOCK] or [EXT-UNV]:
Example:
## Verdict: REFUTED -> CONTESTED (mock evidence override)
### Evidence Audit
| Claim | Source | Tag | Valid? |
|-------|--------|-----|--------|
| "Staking returns shares" | StakingMock.sol:45 | [MOCK] | NO |
**Override reason**: REFUTED verdict relies on mock behavior at StakingMock.sol:45.
Production contract behavior is UNVERIFIED. Must fetch production source.
Before writing ANY test code, you MUST answer:
NOT: "Something is inconsistent"
NOT: "State is wrong"
NOT: "Reentrancy possible"
YES: "[Variable] is [read/written] at [location] but should be [read/written]
at [other location] because [specific reason]"
NOT: "Values are different"
NOT: "State changed"
YES: "Before operation: [variable] = [expected value]
After operation: [variable] = [actual value]
Expected: [what it should be]"
NOT: assertTrue(bugExists)
NOT: assertFalse(isSecure)
YES: assertEq(actualValue, expectedValue, "description of what's wrong")
OR: assertNotEq(before, after, "value changed when it shouldn't")
OR: assertGt(error, threshold, "error exceeds acceptable threshold")
If you cannot answer all three -> ASK FOR CLARIFICATION
Before writing test code, verify these two gates. If either FAILS, adjust the hypothesis.
Trace a call path from a permissionless entry point to the vulnerable code.
If NO entry point reaches the vulnerable code → UNREACHABLE → FALSE_POSITIVE. If reachable only through a restricted path → document the restriction, adjust likelihood.
Substitute real-world value domains into the expression that triggers the bug.
If the bug requires values outside feasible domains → INFEASIBLE → FALSE_POSITIVE. If feasible only at extreme but realistic parameters → document the threshold, proceed with adjusted severity.
Both gates PASS → proceed to PoC. Either gate FAILS → document and stop.
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
import "forge-std/Test.sol";
import "forge-std/console.sol";
/**
* @title Test_H{N}: {Title}
*
* BUG: {2 sentence description}
* EXPECTED: {what should happen}
* ACTUAL: {what does happen}
*/
contract Test_H{N} is Test {
// === CONTRACTS ===
// Declare target contract and any dependencies
// === ACTORS ===
address attacker = makeAddr("attacker");
address victim = makeAddr("victim");
address owner = makeAddr("owner");
// === SETUP ===
function setUp() public {
// Deploy contracts
// Set initial state
// Fund actors if needed
}
// === TEST: Direct bug demonstration ===
function test_H{N}_bug_demonstration() public {
// 1. RECORD BEFORE
console.log("=== BEFORE ===");
uint256 valueBefore = target.criticalValue();
console.log("Critical value:", valueBefore);
// 2. ACTION
console.log("=== ACTION ===");
// Perform the operation that triggers the bug
// 3. RECORD AFTER
console.log("=== AFTER ===");
uint256 valueAfter = target.criticalValue();
console.log("Critical value:", valueAfter);
// 4. PROVE BUG
console.log("=== VERIFICATION ===");
// THE ASSERTION THAT PROVES THE BUG
// Design this so it PASSES when the bug EXISTS
}
// === TEST: Impact demonstration (optional) ===
function test_H{N}_impact() public {
// Show cumulative impact or attacker profit
}
}
The assertion that "proves the bug" succeeded.
assertNotEq(after, before) passes -> values ARE different (bug exists)assertGt(error, threshold) passes -> error IS above threshold (bug exists)| Failure | Meaning | Action | |---------|---------|--------| | Assertion failed: values equal | Bug doesn't exist as hypothesized | Re-examine hypothesis | | Revert in setup | Deployment/config wrong | Fix setup | | Revert in action | Operation blocked | Check preconditions | | Arithmetic error | Values wrong | Check calculations |
Attempt 1: Direct implementation of test strategy from hypothesis
Attempt 2: Adjust parameters
Attempt 3: Re-examine assumptions
After 5 attempts:
## Verdict: CONFIRMED
### Bug Mechanism Verified
{Explain what the test proves in 2-3 sentences}
### Test File
`test/audit/Test_H{N}.t.sol`
### Test Output
{Paste relevant forge test output}
### Key Evidence
| Metric | Value |
|--------|-------|
| Before | {value} |
| After | {value} |
| Expected | {value} |
| Difference | {calculation} |
### Severity: {LEVEL}
{Justification in 1-2 sentences}
## Verdict: FALSE_POSITIVE
### Attempts Made
**Attempt 1:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}
**Attempt 2:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}
**Attempt 3:**
- Approach: {description}
- Result: {what happened}
- Learning: {insight}
### Why It's Not a Bug
{Explain the actual behavior and why hypothesis was wrong in 2-3 sentences}
## Verdict: CONTESTED
### Evidence Status
| Checkpoint | Status | Details |
|------------|--------|---------|
| External behavior verified against PRODUCTION | NO | Used mock behavior as evidence |
| All callers checked | YES | Checked A, B, C |
| Profit calculated with attacker holding | NO | Only analyzed donation loss |
### Why This Cannot Be REFUTED
{Explain what evidence is missing to definitively rule out the bug}
### Escalation Required
- [ ] Fetch production contract source for {external dep}
- [ ] Re-analyze with attacker holding shares
- [ ] Check additional caller paths: {list}
### Current Assessment
Likely: {TRUE_POSITIVE / FALSE_POSITIVE / UNKNOWN}
Confidence: {LOW / MEDIUM}
MANDATORY: You MUST check ALL boxes before returning REFUTED. If ANY checkbox is NO -> Return CONTESTED, not REFUTED.
Before marking REFUTED, check:
{scratchpad}/external_production_behavior.mdmcp__slither-analyzer__get_function_callers() to enumerate{scratchpad}/findings_inventory.md for CONFIRMED/PARTIAL findingsFor findings from Validation Sweep ([VS-]) or Blind Spot Scanner ([BLIND-]): apply Rule 13's 5-question test BEFORE any downgrade. HALT: If test shows users harmed AND unavoidable AND undocumented -> you CANNOT return FALSE_POSITIVE. Minimum verdict: CONTESTED. Defense parity gaps (Contract A has protection X, Contract B lacks it for same action) are NEVER "by design" -> minimum severity: Medium, minimum verdict: CONTESTED. Violating this halt is a workflow error equivalent to using [MOCK] evidence for REFUTED.
A finding is NEVER truly REFUTED until chain analysis completes.
If you mark a finding as REFUTED but document a missing precondition, the chain analyzer (Step 6b) will search for other findings whose postconditions match your missing precondition. If found, the finding will be escalated to CONTESTED and combined into a chain hypothesis.
Example:
Advanced Protocol Reference: See
advanced.mdfor RAG queries before PoC, exchange rate finding severity, design flaw escalation, bidirectional role analysis, RAG confidence override, chain hypothesis protection, fork testing, and Foundry PoC methodology.
development
Prepare Solidity projects for a security audit — test coverage, test quality, NatSpec docs, code hygiene, dependency health, best-practice enforcement, deployment readiness, and project documentation checks. Generates a scored Audit Readiness Report and optionally runs static analysis. Trigger on: "prepare for audit", "audit readiness", "pre-audit check", "audit prep", "NatSpec check", or any request to review a Solidity codebase before a security review.
development
Launch the Plamen deterministic Web3 security audit pipeline
development
Run the Plamen smart-contract audit wizard in Codex
testing
Launch the Plamen deterministic L1 infrastructure audit pipeline