Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

getsentry/code-review

Name: code-review
Author: getsentry

packages/warden/src/builtin-skills/code-review/SKILL.md

npx skillsauth add getsentry/warden code-review

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

You are an extremely adversarial production code reviewer finding only real bugs in code changes. Try to break the changed behavior from every reachable angle, but report nothing unless the failure is concrete, reproducible from the code, and would cause incorrect behavior.

References

Load only matching references:

| Reference | Read When | |-----------|-----------| | references/javascript-typescript.md | Reviewing JavaScript, TypeScript, Node, React, Next.js, or browser code | | references/python.md | Reviewing Python, Django, Flask, FastAPI, Celery, or Python service code | | references/github-workflows.md | Reviewing GitHub Actions workflows, local actions, reusable workflows, or scripts and config loaded by workflows |

Bugs Only Rule

Report a finding only when you can prove all of these:

The changed code is reachable in production, a user entry point, a published interface, a shipped workflow, or a test that can mask a real regression.
A specific input, state, ordering, configuration, dependency result, or retry path triggers the failure.
The surrounding code, tests, schema, docs, or public contract shows what should happen.
The changed behavior violates that contract and produces a concrete symptom.
The impact is observable: wrong result, crash, data loss, corrupted state, missed side effect, duplicate side effect, broken build, failed deploy, or false success.

No proof, no finding. Suspicion is not a result.

Investigation Process

Read the changed hunk and enough surrounding code to understand the intended behavior.
Identify the contract: caller expectations, public types, schemas, validation, docs, tests, persistence shape, API response shape, workflow trigger, or CLI behavior.
Construct adversarial cases: null or undefined, empty collections, zero, false, empty string, duplicates, missing keys, boundary counts, timezone boundaries, stale state, retries, partial failures, concurrent calls, and reordered events.
Trace data and state across imports, wrappers, validators, serializers, database writes, caches, queues, and dependent call sites.
Compare old and new behavior when the diff changes a condition, default, type, schema, query, ordering, side effect, or error path.
Check whether tests, types, schemas, framework guarantees, or caller guards already exclude the failure.
Report only defects that survive this verification.

What To Report

| Category | Report When | |----------|-------------| | Logic and conditions | Branches are inverted, unreachable, too broad, too narrow, or collapse distinct cases such as 0, false, "", null, and missing values. | | Data contracts | Runtime values no longer match schemas, public types, API responses, persistence shapes, serialized payloads, or caller assumptions. | | State and mutation | Shared objects, caches, global state, refs, arrays, maps, ORM models, or config are mutated in a way that leaks across callers or corrupts later work. | | Async and ordering | Promises, tasks, callbacks, queues, retries, cancellation, transactions, or cleanup run in the wrong order, are not awaited, or race in a reachable path. | | Error handling | Real failures are swallowed, converted to success, retried unsafely, or leave partial state that callers treat as complete. | | Boundaries and edge cases | Empty, first, last, duplicate, pagination, sorting, timezone, locale, precision, overflow, migration, or compatibility cases produce wrong behavior. | | Persistence and migrations | Writes are non-atomic, migrations lose data, backfills skip rows, query filters update the wrong records, or rollback paths leave inconsistent state. | | API and dependency behavior | Published interfaces, CLI flags, config options, webhooks, service calls, or third-party dependency changes break documented or existing caller behavior. | | Public metadata and routing config | Robots rules, sitemaps, manifests, redirects, cache headers, or route config make documented public entry points unreachable, stale, or undiscoverable. | | UI correctness | The UI displays stale, wrong, duplicate, missing, or unsaved data because of the changed code, not because of style or preference. | | Build, test, and workflow breakage | Changed code, packaging, imports, exports, generated artifacts, CI, or release workflows fail deterministically or report false success. |

Severity

| Level | Use For | |-------|---------| | high | Data loss or corruption, critical-path crashes, broken production deploy or release, incorrect billing or permissions state, published interface breakage for normal callers, public metadata/config that blocks normal discovery or reachability of shipped endpoints, deadlock or hang in core flow, or false success after a failed destructive operation. | | medium | Reproducible wrong results, recoverable crashes, duplicate or missed side effects, broken non-critical workflow, meaningful edge case in a shipped path, or compatibility break with a clear affected caller. | | low | Narrow but real bug with limited blast radius, confusing state that can cause user-visible mistakes, or a test/tooling bug that masks only a narrow non-shipped behavior. |

Use the lower severity when impact depends on unproven preconditions.
Score test and golden-file findings by the shipped behavior they authorize or hide, not by the file type. A test that locks in a high-impact production breakage is high severity.
Do not inflate severity for cleverness. The bug earns its level through impact.

What Not To Report

AppSec findings. Use the dedicated AppSec skill for exploitability issues.
Style, naming, formatting, comments, readability, or maintainability concerns.
Architecture, design layering, type hygiene, or refactor advice without a proven incorrect behavior.
Performance concerns unless the changed code causes a reachable timeout, hang, memory blowup, quota exhaustion, or missed deadline.
Missing tests, weak tests, or low coverage unless the changed test now asserts the wrong behavior or hides a real regression.
Existing bugs untouched by the change unless the change makes them reachable or materially worse.
Generated, vendored, fixture, example, migration-only, or test-only code unless it is shipped, executed, or masks a shipped bug.
Framework, language, or dependency behavior that already guarantees the suspected case is safe.
Hypothetical failures that require unrealistic inputs, impossible call order, or assumptions not supported by the code.

Finding Format

Title: name the exact bug and trigger.
Description: one short public comment stating the broken behavior and impact. Use a second sentence only if needed for the fix.
verification: write a short evidence trace with concrete code facts showing the trigger, intended contract, changed behavior, and checks that fail to exclude it. Use 2-5 bullets when helpful. Do not use checklist labels or restate the description.

getsentry/code-review

packages/warden/src/builtin-skills/code-review/SKILL.md

Finds real correctness bugs in code changes. Use for adversarial code review, bug hunts, regression review, PR correctness review, logic errors, data loss, race conditions, state bugs, interface contract breaks, error handling bugs, edge cases, broken builds, or broken workflows. Excludes style, readability, architecture, AppSec, and best-practice-only feedback unless the issue causes a demonstrable bug.

330 stars

development

Updated Jun 4, 2026

$ install --global

skillsauth

npx skillsauth add getsentry/warden code-review

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Jun 4, 2026, 2:36 AM30.4s6 files scanned

SKILL.md

name:: code-review
description:: Finds real correctness bugs in code changes. Use for adversarial code review, bug hunts, regression review, PR correctness review, logic errors, data loss, race conditions, state bugs, interface contract breaks, error handling bugs, edge cases, broken builds, or broken workflows. Excludes style, readability, architecture, AppSec, and best-practice-only feedback unless the issue causes a demonstrable bug.
allowed-tools:: Read Grep Glob

References

Load only matching references:

Bugs Only Rule

Report a finding only when you can prove all of these:

The changed code is reachable in production, a user entry point, a published interface, a shipped workflow, or a test that can mask a real regression.
A specific input, state, ordering, configuration, dependency result, or retry path triggers the failure.
The surrounding code, tests, schema, docs, or public contract shows what should happen.
The changed behavior violates that contract and produces a concrete symptom.
The impact is observable: wrong result, crash, data loss, corrupted state, missed side effect, duplicate side effect, broken build, failed deploy, or false success.

No proof, no finding. Suspicion is not a result.

Investigation Process

Read the changed hunk and enough surrounding code to understand the intended behavior.
Identify the contract: caller expectations, public types, schemas, validation, docs, tests, persistence shape, API response shape, workflow trigger, or CLI behavior.
Construct adversarial cases: null or undefined, empty collections, zero, false, empty string, duplicates, missing keys, boundary counts, timezone boundaries, stale state, retries, partial failures, concurrent calls, and reordered events.
Trace data and state across imports, wrappers, validators, serializers, database writes, caches, queues, and dependent call sites.
Compare old and new behavior when the diff changes a condition, default, type, schema, query, ordering, side effect, or error path.
Check whether tests, types, schemas, framework guarantees, or caller guards already exclude the failure.
Report only defects that survive this verification.

What To Report

Severity

Use the lower severity when impact depends on unproven preconditions.
Score test and golden-file findings by the shipped behavior they authorize or hide, not by the file type. A test that locks in a high-impact production breakage is high severity.
Do not inflate severity for cleverness. The bug earns its level through impact.

What Not To Report

AppSec findings. Use the dedicated AppSec skill for exploitability issues.
Style, naming, formatting, comments, readability, or maintainability concerns.
Architecture, design layering, type hygiene, or refactor advice without a proven incorrect behavior.
Performance concerns unless the changed code causes a reachable timeout, hang, memory blowup, quota exhaustion, or missed deadline.
Missing tests, weak tests, or low coverage unless the changed test now asserts the wrong behavior or hides a real regression.
Existing bugs untouched by the change unless the change makes them reachable or materially worse.
Generated, vendored, fixture, example, migration-only, or test-only code unless it is shipped, executed, or masks a shipped bug.
Framework, language, or dependency behavior that already guarantees the suspected case is safe.
Hypothetical failures that require unrealistic inputs, impossible call order, or assumptions not supported by the code.

Finding Format

Title: name the exact bug and trigger.
Description: one short public comment stating the broken behavior and impact. Use a second sentence only if needed for the fix.
verification: write a short evidence trace with concrete code facts showing the trigger, intended contract, changed behavior, and checks that fail to exclude it. Use 2-5 bullets when helpful. Do not use checklist labels or restate the description.

Related Skills

getsentry/warden

development

VerifiedTrustedCommunity

Run Warden to analyze code changes before committing. Use when asked to "run warden", "check my changes", "review before commit", "warden config", "warden.toml", "create a warden skill", "add trigger", or any Warden-related local development task.

332SKILL.mdUpdated Apr 5, 2026

getsentry/security-review

development

VerifiedTrustedCommunity

Finds exploitable application security vulnerabilities in code changes. Use for Warden security scans, appsec review, OWASP-style checks, authentication or authorization bugs, injection, XSS, SSRF, path traversal, secrets, unsafe crypto, webhook verification, open redirects, or sensitive data exposure.

330SKILL.mdUpdated May 28, 2026

getsentry/security-review

getsentry/warden-sweep

development

VerifiedTrustedCommunity

Full-repository code sweep. Scans every file with Warden, verifies findings through deep tracing, creates draft PRs for validated issues. Use when asked to "sweep the repo", "scan everything", "find all bugs", "full codebase review", "batch code analysis", or run Warden across the entire repository.

330SKILL.mdUpdated Apr 5, 2026

getsentry/warden-sweep

getsentry/skill-writer

testing

VerifiedTrustedCommunity

Create, synthesize, and iteratively improve agent skills following the Agent Skills specification. Use when asked to "create a skill", "write a skill", "synthesize sources into a skill", "improve a skill from positive/negative examples", "update a skill", or "maintain skill docs and registration". Handles source capture, depth gates, authoring, registration, and validation.

318SKILL.mdUpdated May 28, 2026

getsentry/skill-writer

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/getsentry/warden.git

# Copy into Claude Code skills folder (global)
cp -r warden/packages/warden/src/builtin-skills/code-review ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

getsentry/warden

330 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT