skills/scoring/SKILL.md
Quantitative scoring framework for measuring audit quality with objective metrics to evaluate performance, track improvement over time, and identify areas needing attention. Use when benchmarking audit thoroughness, comparing engagement quality, or building quality gates into CI pipelines.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add 0x-shashi/web3-audit-skills skills/scoring`
This directory provides the quantitative scoring framework for measuring audit quality. Use these metrics to objectively evaluate performance, track improvement over time, and identify areas needing attention.
| File | Description |
|------|-------------|
| AUDIT_SCORING.md | Complete scoring system: detection, precision, severity accuracy, coverage, and efficiency metrics; composite score formula; reward schema; tracking templates; and industry benchmarks |
| Metric | Weight | What It Measures |
|--------|--------|-----------------|
| Detection Score | 35% | Vulnerabilities correctly identified vs. total real vulnerabilities |
| Precision Score | 25% | Valid findings vs. total findings submitted (penalizes false positives) |
| Severity Accuracy | 15% | Findings with the correct severity classification vs. total findings |
| Coverage Score | 15% | Functions/entry points audited vs. total in the codebase |
| Efficiency Score | 10% | Weighted findings produced per hour spent |
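The first four metrics in the table are plain ratios. A minimal sketch of how they might be computed from raw counts follows. The `AuditCounts` fields and the function names are hypothetical, returning 0.0 for an empty denominator is an assumption, and the authoritative definitions live in AUDIT_SCORING.md.

```python
from dataclasses import dataclass


@dataclass
class AuditCounts:
    """Raw tallies from one audit (hypothetical structure, for illustration)."""
    real_vulns: int            # total real vulnerabilities in the codebase
    found_vulns: int           # real vulnerabilities the audit identified
    findings_submitted: int    # all findings submitted, valid or not
    valid_findings: int        # submitted findings confirmed as valid
    correct_severity: int      # findings whose severity was classified correctly
    entry_points_total: int    # functions/entry points in scope
    entry_points_audited: int  # functions/entry points actually reviewed


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def ratio_metrics(c):
    """Compute the four ratio-based metrics, each on a 0-1 scale."""
    return {
        "detection": ratio(c.found_vulns, c.real_vulns),
        "precision": ratio(c.valid_findings, c.findings_submitted),
        "severity": ratio(c.correct_severity, c.findings_submitted),
        "coverage": ratio(c.entry_points_audited, c.entry_points_total),
    }
```

Efficiency is left out here because it is a rate (weighted findings per hour) rather than a ratio of two counts.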
Composite Score =
(0.35 × Detection) + (0.25 × Precision) + (0.15 × Severity) + (0.15 × Coverage) + (0.10 × Efficiency)
The following severity point values connect the scoring system to the severity classification:
| Severity | Points | Reference |
|----------|--------|-----------|
| Critical | 10 | Escalation required (not in standard severity files) |
| High | 5 | high-severity.md |
| Medium | 2 | medium-severity.md |
| Low | 1 | low-severity.md |
| Informational | 0.5 | Best-practice suggestions |
| Gas | 0 | gas-optimizations.md |
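As an illustration of how these point values could feed a "weighted findings per hour" figure, the sketch below sums severity points over a list of findings and divides by hours spent. The `SEVERITY_POINTS` mapping mirrors the table; the function names and the list-of-labels input are assumptions. How the raw rate is normalized into a 0-1 Efficiency Score is not covered here and is left to AUDIT_SCORING.md.

```python
# Point values copied from the severity table above.
SEVERITY_POINTS = {
    "critical": 10,
    "high": 5,
    "medium": 2,
    "low": 1,
    "informational": 0.5,
    "gas": 0,
}


def weighted_points(severities):
    """Sum the point values for a list of severity labels (case-insensitive)."""
    total = 0.0
    for label in severities:
        key = label.strip().lower()
        if key not in SEVERITY_POINTS:
            raise ValueError(f"unknown severity: {label!r}")
        total += SEVERITY_POINTS[key]
    return total


def weighted_findings_per_hour(severities, hours):
    """Raw (un-normalized) weighted findings per hour spent."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return weighted_points(severities) / hours


findings = ["High", "Medium", "Medium", "Low", "Gas"]
print(weighted_points(findings))                # 10.0
print(weighted_findings_per_hour(findings, 8))  # 1.25
```

Gas findings contribute zero points, so they never raise the weighted total.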
Scoring requires completed audit findings with severity classifications. The Severity Classification skill MUST be applied before scoring.
To verify scoring accuracy, compare computed composite scores against known benchmarks:
    # Example composite score calculation (all inputs on a 0-1 scale)
    detection = 0.85     # 85% of real vulns found
    precision = 0.80     # 80% valid findings
    severity_acc = 0.90  # 90% correct severity
    coverage = 0.75      # 75% codebase covered
    efficiency = 0.70    # efficiency score derived from weighted findings per hour

    composite = (
        0.35 * detection
        + 0.25 * precision
        + 0.15 * severity_acc
        + 0.15 * coverage
        + 0.10 * efficiency
    )
    print(f"Composite Score: {composite:.3f}")  # Expected: 0.815
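For repeated use, the calculation could be wrapped in a function that also rejects missing or out-of-range inputs. This is a sketch and not part of the skill: `WEIGHTS` restates the documented weights, while `composite_score` and its dictionary keys are hypothetical names.

```python
import math

# Weights from the metric table; they should sum to 1.0.
WEIGHTS = {
    "detection": 0.35,
    "precision": 0.25,
    "severity": 0.15,
    "coverage": 0.15,
    "efficiency": 0.10,
}


def composite_score(metrics):
    """Weighted sum of the five metrics, each expected on a 0-1 scale."""
    missing = set(WEIGHTS) - set(metrics)
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= metrics[name] <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {metrics[name]}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


assert math.isclose(sum(WEIGHTS.values()), 1.0)

score = composite_score({
    "detection": 0.85, "precision": 0.80, "severity": 0.90,
    "coverage": 0.75, "efficiency": 0.70,
})
# Compare with a tolerance rather than exact equality on floats.
assert math.isclose(score, 0.815, abs_tol=1e-9)
```

Comparing with `math.isclose` avoids spurious failures from floating-point rounding when checking a computed score against a benchmark value.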
    # Score thresholds for audit quality tiers
    tiers:
      elite: 0.90+       # Top-tier competitive auditor
      proficient: 0.75+  # Solid professional auditor
      developing: 0.60+  # Learning auditor
      needs_work: <0.60  # Consider additional training
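These thresholds can be applied programmatically, for example as a quality gate in a CI pipeline. The sketch below is one possible mapping: `classify_tier` and `passes_gate` are hypothetical helpers, and the default gate tier is an arbitrary choice for illustration.

```python
# Lower bounds for each tier, highest first (taken from the thresholds above).
TIER_THRESHOLDS = [
    ("elite", 0.90),
    ("proficient", 0.75),
    ("developing", 0.60),
]


def classify_tier(composite):
    """Return the tier name for a composite score on a 0-1 scale."""
    if not 0.0 <= composite <= 1.0:
        raise ValueError("composite score must be between 0 and 1")
    for name, lower_bound in TIER_THRESHOLDS:
        if composite >= lower_bound:
            return name
    return "needs_work"


def passes_gate(composite, minimum_tier="proficient"):
    """True if the score reaches at least the lower bound of `minimum_tier`."""
    lower_bounds = dict(TIER_THRESHOLDS)
    return composite >= lower_bounds[minimum_tier]


print(classify_tier(0.815))  # proficient
print(passes_gate(0.815))    # True
print(classify_tier(0.59))   # needs_work
```

A CI job could exit with a non-zero status when `passes_gate` returns `False`, which blocks the pipeline until the audit score improves.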
    # Validate scoring data integrity
    python scripts/quality-check.py skills/scoring/SKILL.md