Agentic secure code review for detecting immature vulnerabilities at the pre-commit stage. Uses a two-phase Detector-Validator pipeline with SAST-rule semantic memory and CWE-tree validation to localize, classify, and explain security weaknesses in code diffs. Trigger phrases: "review this diff for security issues", "secure code review", "find vulnerabilities in my changes", "pre-commit security check", "check this PR for security weaknesses", "agentic security review"
This skill enables Claude to perform structured, agentic secure code review on code diffs and pull requests using the AgenticSCR methodology. It does not scan entire codebases or rely on single-pass LLM analysis. Instead, it uses a two-subagent pipeline: a Detector that applies SAST-rule patterns to localize vulnerabilities, followed by a Validator that filters false positives using CWE taxonomy knowledge. The approach targets immature vulnerabilities. These are incomplete, latent, or context-dependent weaknesses introduced through small incremental code changes. They look benign in isolation but can evolve into exploitable flaws as surrounding code is added.
AgenticSCR's core insight is that pre-commit vulnerabilities are immature: they don't match the fully formed patterns that SAST tools are designed to catch. An insecure API call or a missing validation check may look harmless in a small diff but become exploitable as the surrounding code evolves. Detecting these weaknesses requires three things:

- pattern-based detection (SAST rules);
- contextual reasoning (understanding what the code does in its repository context);
- taxonomic validation (confirming the weakness maps to a real CWE category).
The system operates as a two-subagent pipeline.

The Detector subagent loads SAST rule definitions into its working context. These are CodeQL-style patterns, each with a rule ID, description, CWE mapping, severity, and vulnerable/secure code examples. The Detector then navigates the repository with tools: reading diffs, expanding code hunks for surrounding context, grepping for security-relevant patterns, and inspecting directory structure. It produces review comments with a file path, line number, vulnerability description, and predicted CWE.

The Validator subagent then loads the CWE-1000 taxonomy tree and validates each comment:

- Does the identified weakness match a real CWE category?
- Are the preconditions for that weakness present?
- Is the finding exploitable given the code context?

Comments that fail validation are filtered out as false positives.
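The two-phase control flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation:

- The Detector is any callable you pass in. In AgenticSCR it is an LLM subagent with tools.
- The Validator here answers only the first of its three questions: whether the predicted CWE resolves to a pillar in a small, hand-copied slice of the CWE-1000 tree (`CWE_PARENT`). Verify that slice against MITRE's current view.
- `Finding`, `CWE_PARENT`, `resolves_to_pillar`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    description: str
    cwe: str  # predicted by the Detector, e.g. "CWE-89"

# Partial, hand-copied child -> parent links from the CWE-1000 research view.
CWE_PARENT = {
    "CWE-89": "CWE-943",   # SQL Injection -> Data Query Logic
    "CWE-943": "CWE-74",   # -> Injection
    "CWE-74": "CWE-707",   # -> Improper Neutralization (pillar)
    "CWE-532": "CWE-538",  # Sensitive Info in Log File -> Externally-Accessible File
    "CWE-538": "CWE-200",  # -> Exposure of Sensitive Information
    "CWE-200": "CWE-668",  # -> Exposure of Resource to Wrong Sphere
    "CWE-668": "CWE-664",  # -> Improper Control of a Resource (pillar)
}
PILLARS = {"CWE-707", "CWE-664"}

def resolves_to_pillar(cwe: str) -> bool:
    """Validator check: follow parent links until a pillar or a dead end."""
    seen = set()
    while cwe not in PILLARS:
        if cwe in seen or cwe not in CWE_PARENT:
            return False  # unknown ID or a cycle: treat as unvalidated
        seen.add(cwe)
        cwe = CWE_PARENT[cwe]
    return True

def run_pipeline(detector: Callable[[str], Iterable[Finding]], diff: str) -> list[Finding]:
    """Phase 1 casts a wide net; phase 2 prunes findings that fail validation."""
    candidates = list(detector(diff))
    return [f for f in candidates if resolves_to_pillar(f.cwe)]
```

Keeping the Detector pluggable mirrors the division of labor described above. Recall comes from the first phase and precision from the second, so either side can be swapped without touching the other.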
According to the paper, this two-phase approach yields 153% more correct review comments than single-pass LLM analysis and 71-85% fewer false positives than SAST tools. The split works because the Detector casts a wide net using established patterns, while the Validator prunes aggressively using domain knowledge.
Extract the diff scope. Obtain the code diff (from git diff, a PR, or pasted text). Identify the modified files, their languages, and the specific line ranges changed. Focus only on modified code — do not ingest the entire repository.
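One way to extract the scope is to parse unified-diff output (for example from `git diff`) into the added lines of each file, keyed by their line numbers in the new version. This is a simplified sketch with a hypothetical helper name, `added_lines`. It handles the common `+++ b/path` and `@@ -a,b +c,d @@` headers but not every edge case, such as renames, binary files, or removed lines whose own text starts with `--`.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def added_lines(diff_text: str) -> dict[str, dict[int, str]]:
    """Map each modified file to {new-file line number: added line text}."""
    result: dict[str, dict[int, str]] = {}
    current = None   # path of the file whose hunks we are reading
    new_line = 0     # next line number in the new version of the file
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git"):
            current = None  # new file section; wait for its "+++" header
        elif raw.startswith("+++ "):
            path = raw[4:].strip()
            current = None if path == "/dev/null" else path.removeprefix("b/")
        elif raw.startswith("--- ") or raw.startswith("\\"):
            continue  # old-file header or "\ No newline at end of file"
        elif (m := HUNK_RE.match(raw)):
            new_line = int(m.group(1))
        elif current is None:
            continue  # index lines, deleted files, etc.
        elif raw.startswith("+"):
            result.setdefault(current, {})[new_line] = raw[1:]
            new_line += 1
        elif not raw.startswith("-"):
            new_line += 1  # context line exists in both versions
    return result
```

For a pre-commit check, you would typically feed it the staged changes, e.g. `subprocess.run(["git", "diff", "--cached"], capture_output=True, text=True).stdout`.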
Expand context around changed hunks. For each modified file, read 20-50 lines of surrounding context above and below the changed lines. This reveals function signatures, imports, class definitions, and control flow that the diff alone hides. Use file reading and grep to trace how changed variables are used elsewhere.
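Nearby changes often share context. A small helper can therefore merge their windows so each region is read once. The sketch below uses hypothetical names and takes the radius as a parameter; the 20-50 line range above suggests values such as 30.

```python
from pathlib import Path

def merge_windows(changed: list[int], radius: int, total: int) -> list[tuple[int, int]]:
    """Turn changed line numbers into merged, clamped (start, end) windows (1-based, inclusive)."""
    spans: list[tuple[int, int]] = []
    for ln in sorted(changed):
        start, end = max(1, ln - radius), min(total, ln + radius)
        if spans and start <= spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))  # overlapping: extend
        else:
            spans.append((start, end))
    return spans

def context_window(lines: list[str], start: int, end: int) -> list[tuple[int, str]]:
    """Slice a file's lines into numbered (line, text) pairs for the reviewer."""
    return [(n, lines[n - 1]) for n in range(start, end + 1)]

def read_lines(path: str) -> list[str]:
    """Read a source file, tolerating bad bytes rather than failing the review."""
    return Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
```

Keeping the line numbers attached to each context line makes it straightforward to report findings against the correct file position later.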
Apply SAST-rule detection patterns. Systematically check the diff against the five vulnerability categories: Injection, Authorization, Information, Resource, and Control.
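The SAST rules the Detector loads can be represented as plain records. Each record carries the fields listed in the pipeline description: rule ID, description, CWE mapping, severity, and vulnerable and secure examples. The sketch below is hypothetical. The three rules and their line-level regular expressions are illustrative stand-ins for real CodeQL queries, which reason over data flow rather than text, so expect both misses and false alarms.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SastRule:
    rule_id: str
    category: str         # Injection, Authorization, Information, Resource, or Control
    description: str
    cwe: str
    severity: str
    pattern: re.Pattern   # crude line-level stand-in for a CodeQL query
    vulnerable_example: str
    secure_example: str

# Hypothetical rules for illustration only.
RULES = [
    SastRule("py/sql-fstring", "Injection", "SQL text built by string interpolation",
             "CWE-89", "critical", re.compile(r"\.execute\(\s*f[\"']"),
             'db.execute(f"... {x}")', 'db.execute("... ?", (x,))'),
    SastRule("py/shell-true", "Injection", "subprocess call with shell=True",
             "CWE-78", "high", re.compile(r"subprocess\.\w+\(.*shell\s*=\s*True"),
             "subprocess.run(cmd, shell=True)", 'subprocess.run(["ls", path])'),
    SastRule("py/log-secret", "Information", "Sensitive-looking value written to a log",
             "CWE-532", "high",
             re.compile(r"log\w*\.\w+\(.*\{\w*(card|password|token|secret)\w*\}"),
             'logger.error(f"card {card_number}")',
             'logger.error("card ***%s", card_number[-4:])'),
]

def scan(added: dict[int, str]) -> list[tuple[int, SastRule]]:
    """Detector pass: every (line number, rule) pair where a rule matches an added line."""
    return [(n, rule) for n, text in sorted(added.items())
            for rule in RULES if rule.pattern.search(text)]
```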
Localize each finding to a specific line. For every potential vulnerability, identify the exact file path and line number where the weakness is introduced. The finding must point to a line within, or very close to (within ±5 lines of), the actual changed code.
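The ±5-line tolerance can be applied mechanically to separate findings that belong to this diff from those that point elsewhere. The sketch below is minimal: `localize` is a hypothetical name, and `changed` is assumed to map each file path to the set of its changed line numbers.

```python
def localize(findings: list[tuple[str, int]],
             changed: dict[str, set[int]],
             tolerance: int = 5) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Split findings into those within `tolerance` lines of a changed line and the rest."""
    kept, dropped = [], []
    for path, line in findings:
        near = any(abs(line - c) <= tolerance for c in changed.get(path, ()))
        (kept if near else dropped).append((path, line))
    return kept, dropped
```

Dropped findings are not necessarily wrong. They describe code the diff did not introduce, so they fall outside a pre-commit review.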
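Draft a review comment for each finding. Each comment must include:

- `file`: the file path
- `line`: the specific line number
- `comment`: a plain-language explanation of the vulnerability and why it's dangerous
- `cwe`: the CWE ID (e.g., CWE-79) and name
- `severity`: critical, high, medium, or low
- `suggestion`: concrete remediation code or guidance

Validate each comment against the CWE taxonomy. For every drafted comment, ask:

- Does the identified weakness match a real CWE category?
- Are the preconditions for that weakness present?
- Is the finding exploitable given the code context?

Filter out comments that fail validation as false positives.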
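A structured comment record makes these fields explicit and catches malformed output early. The sketch below is hypothetical: `ReviewComment`, `Verdict`, and `filter_false_positives` are illustrative names, and `cwe_name` splits the "ID and name" field in two. The Validator's answers are modeled as three booleans, and a comment survives only if all three hold.

```python
import json
import re
from dataclasses import asdict, dataclass

SEVERITIES = ("critical", "high", "medium", "low")
CWE_ID = re.compile(r"CWE-\d+")

@dataclass
class ReviewComment:
    file: str
    line: int
    comment: str
    cwe: str        # e.g. "CWE-79"
    cwe_name: str   # e.g. "Cross-site Scripting"
    severity: str
    suggestion: str

    def __post_init__(self) -> None:
        # Reject malformed comments before they reach the Validator.
        if not CWE_ID.fullmatch(self.cwe):
            raise ValueError(f"malformed CWE id: {self.cwe!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.line < 1:
            raise ValueError("line numbers are 1-based")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

@dataclass
class Verdict:
    """The Validator's answers to the three questions for one comment."""
    matches_cwe: bool
    preconditions_present: bool
    exploitable_in_context: bool

    @property
    def keep(self) -> bool:
        return self.matches_cwe and self.preconditions_present and self.exploitable_in_context

def filter_false_positives(reviewed: list[tuple[ReviewComment, Verdict]]) -> list[ReviewComment]:
    """Keep only comments whose verdict passes all three checks."""
    return [c for c, v in reviewed if v.keep]
```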
Check for cross-file dependencies. If a changed function is called from other files, grep for its callers. Verify whether input sanitization happens at the call site rather than the definition site. A missing validation in the diff may be intentional if validation occurs upstream.
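Finding the callers of a changed function can start as a plain text search. The sketch below is a heuristic with hypothetical names. It flags lines that call the function but do not define it, recognizing Python `def` and JavaScript `function` definitions only. JavaScript class methods and aliased imports will slip through or be misreported, so treat the hits as leads to read, not proof.

```python
import re
from pathlib import Path

def load_sources(root: str, exts=(".py", ".js", ".ts")) -> dict[str, str]:
    """Read every source file under root into {path: text}."""
    return {str(p): p.read_text(encoding="utf-8", errors="replace")
            for p in sorted(Path(root).rglob("*")) if p.is_file() and p.suffix in exts}

def find_callers(sources: dict[str, str], func_name: str) -> list[tuple[str, int, str]]:
    """Return (path, line, text) for lines that call func_name but do not define it."""
    name = re.escape(func_name)
    call = re.compile(rf"\b{name}\s*\(")
    definition = re.compile(rf"^\s*(?:async\s+)?(?:def|function)\s+{name}\b")
    hits = []
    for path, text in sources.items():
        for n, line in enumerate(text.splitlines(), start=1):
            if call.search(line) and not definition.match(line):
                hits.append((path, n, line.strip()))
    return hits
```

Each hit is a call site where sanitization might already happen upstream. Read it before reporting a missing check at the definition.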
Classify the overall risk. Summarize how many validated findings exist per severity level. Flag any critical or high findings that need immediate attention before commit.
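The per-severity summary is a simple count. This sketch uses hypothetical names and silently ignores severities outside the four allowed levels.

```python
from collections import Counter

SEVERITY_ORDER = ("critical", "high", "medium", "low")

def risk_summary(severities: list[str]) -> dict:
    """Count validated findings per severity and flag critical/high ones for attention."""
    counts = Counter(severities)
    per_level = {level: counts[level] for level in SEVERITY_ORDER}  # missing -> 0
    return {
        "counts": per_level,
        "needs_attention": per_level["critical"] + per_level["high"] > 0,
    }
```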
Present findings as structured review comments. Output a clean table or list of validated findings sorted by severity, each with file, line, CWE, description, and remediation suggestion.
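The table format used in the examples below can be generated directly from validated findings. This is a sketch: `to_markdown` is a hypothetical name, and each finding is assumed to be a dict with `file`, `line`, `cwe`, `severity`, `description`, and `suggestion` keys. Pipe characters in free text are escaped so they don't split table cells.

```python
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def to_markdown(findings: list[dict]) -> str:
    """Render validated findings as a Markdown table, most severe first."""
    rows = sorted(findings, key=lambda item: (SEVERITY_RANK[item["severity"]], item["file"], item["line"]))
    out = ["| File | Line | CWE | Severity | Finding |",
           "|------|------|-----|----------|---------|"]
    for item in rows:
        text = f"{item['description']} {item['suggestion']}".replace("|", "\\|")
        out.append(f"| {item['file']} | {item['line']} | {item['cwe']} | "
                   f"{item['severity'].capitalize()} | {text} |")
    return "\n".join(out)
```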
Acknowledge uncertainty. For findings where context is incomplete (e.g., you cannot see the full call chain), mark the finding as "potential" rather than "confirmed" and explain what additional context would be needed to verify.
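One way to keep the "potential" versus "confirmed" distinction honest is to derive it from an explicit list of missing context, rather than choosing a label by hand. This is a minimal sketch; `ReportedFinding` is a hypothetical name.

```python
from dataclasses import dataclass, field

@dataclass
class ReportedFinding:
    description: str
    missing_context: list[str] = field(default_factory=list)  # what could not be inspected

    @property
    def status(self) -> str:
        return "potential" if self.missing_context else "confirmed"

    def render(self) -> str:
        text = f"[{self.status}] {self.description}"
        if self.missing_context:
            text += " (to verify: " + "; ".join(self.missing_context) + ")"
        return text
```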
Example 1: SQL Injection in a Python Web Endpoint
User: "Review this diff for security issues"
```diff
# app/routes/users.py (changed lines)
+ @app.route('/users/search')
+ def search_users():
+     query = request.args.get('q')
+     results = db.execute(f"SELECT * FROM users WHERE name LIKE '%{query}%'")
+     return jsonify([dict(r) for r in results])
```
Approach:

- The diff adds lines to a Flask route in `app/routes/users.py`.
- `request.args.get('q')` flows unsanitized into an f-string SQL query. This is classic SQL injection (CWE-89, a descendant of CWE-707 in the Injection category).
- The vulnerable sink is `db.execute(f"SELECT...")` on line 4.

Output:
| File | Line | CWE | Severity | Finding |
|------|------|-----|----------|---------|
| app/routes/users.py | 4 | CWE-89 (SQL Injection) | Critical | User input from `request.args.get('q')` is interpolated directly into SQL via an f-string. Use a parameterized query: `db.execute("SELECT * FROM users WHERE name LIKE ?", (f"%{query}%",))` |
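To see the difference concretely, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the `db` object in the diff. That is an assumption: other drivers, such as psycopg2, use `%s` placeholders instead of `?`. The helper names are hypothetical.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """In-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn

def unsafe_search(conn: sqlite3.Connection, query: str) -> list[tuple]:
    # Vulnerable: the value is spliced into the SQL text itself.
    return conn.execute(f"SELECT name FROM users WHERE name LIKE '%{query}%'").fetchall()

def safe_search(conn: sqlite3.Connection, query: str) -> list[tuple]:
    # Fixed: the pattern is bound as a parameter, so quotes in `query` stay data.
    return conn.execute("SELECT name FROM users WHERE name LIKE ?", (f"%{query}%",)).fetchall()
```

With the payload `x' OR 1=1 --`, `unsafe_search` returns every row, because the payload closes the string and comments out the rest. `safe_search` returns nothing. Note that `%` and `_` inside `query` still act as LIKE wildcards in the safe version. That is a correctness concern rather than an injection, and an `ESCAPE` clause handles it if needed.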
Example 2: Missing Authorization Check in a JavaScript API
User: "Check this PR for security weaknesses"
```diff
// src/api/admin.js (changed lines)
+ router.delete('/admin/users/:id', async (req, res) => {
+   const userId = req.params.id;
+   await User.findByIdAndDelete(userId);
+   res.json({ success: true });
+ });
```
Approach:

- Check whether `router` has authentication/authorization middleware applied globally. Search for `router.use(auth` or `requireAdmin` in the same file and in the route configuration.
- Note the unvalidated `:id` parameter (potential CWE-20), although the framework likely handles ObjectId validation.

Output:
| File | Line | CWE | Severity | Finding |
|------|------|-----|----------|---------|
| src/api/admin.js | 1 | CWE-284 (Improper Access Control) | Critical | The admin user-deletion endpoint lacks authorization middleware, so any request to `DELETE /admin/users/:id` succeeds. Add `requireAdmin` middleware: `router.delete('/admin/users/:id', requireAdmin, async (req, res) => {` |
| src/api/admin.js | 2 | CWE-20 (Improper Input Validation) | Low (potential) | `req.params.id` is not validated before the database query. Verify that the ORM or framework rejects malformed IDs; with MongoDB and Mongoose, ObjectId casting provides implicit validation. |
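The middleware check from the approach above can be approximated with a text heuristic. The sketch below, in Python, scans JavaScript source for `/admin` routes that have no guard between the path and the handler, and skips the file if a guard is applied with `router.use(...)`. The guard names in `GUARDS` are assumptions to adjust per codebase. The heuristic cannot see guards applied where the router is mounted in another file (e.g. `app.use('/admin', requireAdmin, adminRouter)`), which is exactly what the cross-file check is for.

```python
import re

GUARDS = ("requireAdmin", "auth")  # hypothetical middleware names
ROUTE_RE = re.compile(r"router\.(get|post|put|patch|delete)\(\s*(['\"])(/admin[^'\"]*)\2(.*)")
GLOBAL_GUARD_RE = re.compile(r"router\.use\(\s*(?:" + "|".join(GUARDS) + r")\b")

def unguarded_admin_routes(js_source: str) -> list[tuple[int, str, str]]:
    """Return (line, METHOD, path) for /admin routes with no guard before the handler."""
    if GLOBAL_GUARD_RE.search(js_source):
        return []  # a guard is applied to every route on this router
    hits = []
    for n, line in enumerate(js_source.splitlines(), start=1):
        m = ROUTE_RE.search(line)
        if not m:
            continue
        before_handler = m.group(4).split("=>")[0]  # middleware list + handler params
        if not any(re.search(rf"\b{g}\b", before_handler) for g in GUARDS):
            hits.append((n, m.group(1).upper(), m.group(3)))
    return hits
```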
Example 3: Information Disclosure via Error Logging
User: "Find vulnerabilities in my changes"
```diff
# services/payment.py (changed lines)
+ def process_payment(card_number, amount):
+     try:
+         result = gateway.charge(card_number, amount)
+     except GatewayError as e:
+         logger.error(f"Payment failed for card {card_number}: {e}")
+         raise PaymentError("Payment processing failed")
+     return result
```
Approach:

- `card_number` (PCI-sensitive data) is logged in plaintext in the error handler (CWE-532, a descendant of CWE-200 Information Exposure).
- The problematic line is `logger.error(f"Payment failed for card {card_number}...")`.
- `raise PaymentError` is safe because it carries a generic message; the logger line is the issue.

Output:
| File | Line | CWE | Severity | Finding |
|------|------|-----|----------|---------|
| services/payment.py | 5 | CWE-532 (Insertion of Sensitive Information into Log File) | High | The full card number is logged in the error handler, which violates PCI DSS requirements for protecting cardholder data. Mask it: `logger.error(f"Payment failed for card ***{card_number[-4:]}: {e}")` |
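Beyond fixing the single line, a masking helper plus a logging filter gives defense in depth against the same mistake elsewhere. This is a sketch with hypothetical names. `PAN_RE` is a heuristic for 13-19 digit card numbers and may also mask unrelated long numbers such as order IDs. Exception tracebacks attached via `exc_info` are not redacted by this filter.

```python
import logging
import re

# Heuristic: 13-19 digits, optionally separated by single spaces or hyphens.
PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def mask_pan(card_number: str) -> str:
    """Keep only the last four digits of a card number."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]

class RedactPANFilter(logging.Filter):
    """Mask anything that looks like a card number before the record is emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first
        record.msg = PAN_RE.sub(lambda m: mask_pan(m.group()), message)
        record.args = ()               # args are already merged into msg
        return True
```

Attach the filter to a handler rather than a logger. Filters attached to a logger are not consulted for records that propagate up from child loggers.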
npm audit, pip-audit).

Paper: AgenticSCR: An Autonomous Agentic Secure Code Review for Immature Vulnerabilities Detection, Charoenwet et al., 2026. Look for:

- the two-subagent Detector-Validator architecture;
- the SAST-rule and CWE-tree semantic memory design;
- the SCRBench benchmark, with 144 pre-commit changes across 107 CVEs;
- the five vulnerability type definitions (Injection, Authorization, Information, Resource, Control).