.agents/skills/find-warden-bugs/SKILL.md
Warden-specific bug detection from historical patterns. Targets the architectural seams where bugs have repeatedly occurred: SDK IPC, dual report paths, config threading, concurrent execution, and output rendering.
You are an expert bug hunter who knows Warden's architecture intimately. You detect bugs that recur at Warden's known architectural seams. Your analysis is grounded in 40+ historical fix commits.
You receive scoped code chunks from Warden's diff pipeline. Analyze each chunk against the checks below. Only report findings you can prove from the code.
| Level | Criteria | Action |
|-------|----------|--------|
| HIGH | Pattern traced to specific code, confirmed triggerable | Report |
| MEDIUM | Pattern present, but surrounding context may mitigate | Read more context, then report or discard |
| LOW | Vague resemblance to a historical pattern | Do NOT report |
When in doubt, read more files. Never guess.
Before running checks, identify which architectural zone(s) the code touches:
- `src/sdk/`: Response parsing, usage extraction, subprocess IPC, retry logic
- `src/cli/`: Task orchestration, Ink rendering, progress callbacks, exit handling
- `src/config/`: Schema definitions, config loading, merge chains, default resolution
- `src/output/`, `src/cli/output/`: Report rendering, JSON/JSONL serialization, log files, GitHub checks
- `src/types/`: Zod schemas, shared interfaces, severity/confidence definitions
- `src/action/`: GitHub Action entry, check annotations, summary building
- `src/triggers/`: Event matching, path filtering, schedule triggers

Only run checks relevant to the zone(s) touched. Skip the rest.
Zone: SDK layer | Severity: high | Historical commits: 5+
Claude SDK responses have a specific shape that has bitten Warden repeatedly. Content blocks can be text or tool_use. Usage fields can be null. Error responses have different structure than success responses.
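The shape hazards described above can be illustrated with a minimal sketch. The `ContentBlock`, `Usage`, and `Message` types and both helper functions are simplified, hypothetical stand-ins, not the SDK's actual definitions or Warden's `extractUsage()`.

```typescript
// Hypothetical, simplified shapes; the real SDK types carry more fields.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number | null;
  cache_creation_input_tokens: number | null;
}

interface Message {
  content: ContentBlock[];
  usage: Usage | null;
}

// Return the first text block, or undefined when the array is empty
// or holds only non-text blocks. Never index content[0] blindly.
function firstText(msg: Message): string | undefined {
  for (const block of msg.content) {
    if (block.type === "text") return block.text;
  }
  return undefined;
}

// Treat a null usage object and null cache counters as zero.
function totalInputTokens(msg: Message): number {
  const usage = msg.usage;
  if (usage === null) return 0;
  return (
    usage.input_tokens +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0)
  );
}
```

Both helpers return a defined fallback instead of throwing, so a caller has to decide explicitly what an empty or tool-only response means.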
Red flags:
- `response.content[0]` without checking array length or block type
- `msg.usage.input_tokens` without null check on `usage`
- Guards like `isTextBlock()` that silently filter unknown content types instead of flagging them
- Optional fields (`T | undefined`) accessed after discriminated union narrowing, assuming the subtype guarantees them
- `cache_read_input_tokens` or `cache_creation_input_tokens` without handling null (API returns `number | null`)
- `SDKResultMessage` fields without checking `is_error` or `subtype`
- Overly broad error types (`Error` when `APIError` subtypes matter for retry logic)

Safe patterns:

- `result.subtype !== 'success'` before accessing result content
- `extractUsage()` which handles null coalescing internally
- `isAuthenticationErrorMessage()` checking error arrays
- `isRetryableError()` preserving error type for status code inspection

Not a bug:

- … `subtype` field)

Zone: SDK layer + CLI layer | Severity: high | Historical commits: 4+
Warden has two independent code paths that build SkillReport objects: runSkill() in src/sdk/analyze.ts (used by the SDK/action) and runSkillTask() in src/cli/output/tasks.ts (used by the CLI). Both call analyzeFile() but assemble reports independently. When a new field is added or report logic changes, it must be updated in both paths or one silently produces incomplete/wrong reports.
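One way to picture the drift risk is a sketch in which both entry points delegate report assembly to a single function. All names here (`assembleReport`, `sdkPathSketch`, `cliPathSketch`, and the trimmed `Report` shape) are hypothetical and only illustrate the idea; they are not Warden's actual code.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface Finding { id: string; file: string; }
interface FileResult { file: string; findings: Finding[]; }
interface Report { skill: string; fileCount: number; findings: Finding[]; }

// Single place where per-file results become a report.
// A new report field added here reaches every caller.
function assembleReport(skill: string, results: FileResult[]): Report {
  const seenIds: string[] = [];
  const findings: Finding[] = [];
  for (const result of results) {
    for (const finding of result.findings) {
      // Deduplicate by id so both paths agree on the finding list.
      if (seenIds.indexOf(finding.id) === -1) {
        seenIds.push(finding.id);
        findings.push(finding);
      }
    }
  }
  return { skill, fileCount: results.length, findings };
}

// Stand-in for the SDK/action path: no UI concerns.
function sdkPathSketch(skill: string, results: FileResult[]): Report {
  return assembleReport(skill, results);
}

// Stand-in for the CLI path: differs only in progress reporting.
function cliPathSketch(
  skill: string,
  results: FileResult[],
  onProgress: (file: string) => void,
): Report {
  for (const result of results) onProgress(result.file);
  return assembleReport(skill, results);
}
```

When reviewing a diff, the question is whether the changed logic lives in the shared part or in only one of the two callers.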
Red flags:
- A field added to the `SkillReport` type but only updating one of `runSkill()` or `runSkillTask()`
- Changed `prepareFiles()` call arguments in one path but not the other
- Divergent handling of `analyzeFile()` results (dedup, merge, summary generation) between paths
- Fields on `SkillReport` set conditionally in one path but unconditionally (or not at all) in the other
- Fields of `SkillRunnerOptions` consumed by one path but not threaded through the other
- Divergent handling of `analyzeFile()` failures between paths

Safe patterns:

- Both paths using the shared helpers `prepareFiles()`, `analyzeFile()`, `deduplicateFindings()`, `mergeCrossLocationFindings()`, `generateSummary()`, `aggregateUsage()`
- `SkillReportSchema` validation (Zod will catch missing required fields but not missing optional fields)

Not a bug:

- `shouldAbort()` checks present only in the CLI path (abort is a CLI-only concept)

Zone: Config layer | Severity: high | Historical commits: 8+
Config flows through a 3-level merge chain: schema defaults → resolveSkillConfigs() → runner options → consumer code. Any break in this chain causes silent feature failure. Sentinel values get conflated with real values. Optional config sections being absent means "disabled", not "use defaults".
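The sentinel problem is easiest to see side by side. The sketch below uses a hypothetical `SkillConfig` shape and default values; the `emptyToUndefined` function is an assumed re-implementation of the behavior its name suggests, not Warden's source.

```typescript
// Hypothetical config shape and defaults, for illustration only.
interface SkillConfig {
  maxFindings?: number | undefined;
  model?: string | undefined;
}

const defaults = { maxFindings: 50, model: "default-model" };

// Buggy merge: `||` treats 0 and "" as missing and replaces them.
function resolveWithOr(skill: SkillConfig) {
  return {
    maxFindings: skill.maxFindings || defaults.maxFindings,
    model: skill.model || defaults.model,
  };
}

// `??` falls back only on null or undefined, so 0 survives.
function resolveWithNullish(skill: SkillConfig) {
  return {
    maxFindings: skill.maxFindings ?? defaults.maxFindings,
    model: skill.model ?? defaults.model,
  };
}

// Assumed behavior: map the empty string that an unset input yields
// to undefined, so that `??` can apply the default.
function emptyToUndefined(value: string): string | undefined {
  return value === "" ? undefined : value;
}
```

With `{ maxFindings: 0 }` as input, `resolveWithOr` silently returns 50 while `resolveWithNullish` keeps 0. Normalizing empty strings at the boundary is what lets `??` stay correct further down the chain.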
Red flags:
- … `trigger > skill > defaults > cli > env`. Using `??` when the upstream value could be a valid falsy value (0, empty string, false)
- Config fields not threaded through `resolveSkillConfigs()` into `ResolvedTrigger`
- `|| defaultValue` instead of `?? defaultValue` when 0, false, or empty string are valid config values
- `emptyToUndefined()` not applied to GitHub Actions inputs that could be empty strings
- Additive merge of `ignorePaths` (defaults + skill) not preserved when refactoring

Safe patterns:

- `resolveSkillConfigs()` as the single point of config resolution
- `.default()` for schema-level defaults
- `emptyToUndefined()` at the GitHub Actions boundary
- Nullish coalescing (`??`) for merge chains
- Destructuring defaults (`const { x = default } = obj`): these trigger only on `undefined` (not on `null`, unlike `??`), so valid falsy values such as 0 survive, as they do with `??`

Not a bug:

- `ignorePaths` being additive rather than overriding (that is intentional)

Zone: CLI layer | Severity: high | Historical commits: 5+
Warden runs skills concurrently via runPool() gated by a Semaphore. Ink renders a live terminal UI. These two systems interact through shared mutable state and callbacks. Historical bugs include races on shared counters, sort comparators throwing when arrays mutate mid-sort, event loop ordering issues, and Ink lifecycle misuse.
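A bounded pool that stays deterministic can be sketched as follows. This is a hypothetical stand-in named `runPoolSketch`, not Warden's `runPool()`; it uses a fixed number of worker lanes instead of a `Semaphore`, and the `shouldAbort` parameter is an assumption about how cancellation might be passed in.

```typescript
// Hypothetical stand-in for a bounded worker pool.
async function runPoolSketch<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
  shouldAbort: () => boolean = () => false,
): Promise<Array<R | undefined>> {
  // Pre-fill so that skipped items show up as undefined, not holes.
  const results: Array<R | undefined> = [];
  for (let i = 0; i < items.length; i++) results.push(undefined);

  let nextIndex = 0;

  async function drain(): Promise<void> {
    while (nextIndex < items.length) {
      // No await sits between the read and the increment, so two lanes
      // can never claim the same index.
      const index = nextIndex++;
      // Re-check on every iteration: the flag may have flipped while
      // this lane was awaiting its previous item.
      if (shouldAbort()) return;
      // Writing by index keeps output order independent of finish order.
      results[index] = await worker(items[index] as T, index);
    }
  }

  const lanes: Promise<void>[] = [];
  const laneCount = Math.min(limit, items.length);
  for (let i = 0; i < laneCount; i++) lanes.push(drain());
  await Promise.all(lanes);
  return results;
}
```

The contrast case is a callback that pushes onto a shared array as each item finishes: the contents are the same, but the order then depends on timing.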
Red flags:
- Shared mutable state updated in `runPool` callbacks without synchronization
- `Promise.all()` with callbacks that assume sequential execution
- Writing to `process.stderr` directly while Ink is rendering (corrupts terminal output)
- `setImmediate`/`setTimeout` callbacks that reference state which may be cleaned up after Ink unmount
- Missing `shouldAbort()` check after awaiting `semaphore.acquire()` (stale work)

Safe patterns:

- `runPool()` returning results sorted by input index for deterministic output
- `shouldAbort()` checked both before work and after semaphore acquisition
- … `finally` block

Not a bug:

- `runPool` workers incrementing `nextIndex` is safe because JS is single-threaded between awaits

Zone: Output layer | Severity: medium | Historical commits: 5+
Warden renders output in multiple formats: terminal (Ink), JSON, JSONL, GitHub checks, log files. Historical bugs include display-only filters leaking into machine-readable output, render-once violations in streaming output, reading log files that failed to write, and path metadata being overwritten.
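The "display-only filter leaking into machine-readable output" failure can be sketched with two views over one report. The `Report` shape and the `terminalLines` and `jsonView` functions are hypothetical names for illustration.

```typescript
// Hypothetical, simplified report shape.
type Severity = "low" | "medium" | "high";
interface Finding { severity: Severity; title: string; }
interface Report { skill: string; findings: Finding[]; }

const severityRank: { [K in Severity]: number } = { low: 0, medium: 1, high: 2 };

// Terminal view: may hide low-severity findings, but only in its own
// output. filter() returns a new array and leaves the report untouched.
function terminalLines(report: Report, minSeverity: Severity): string[] {
  return report.findings
    .filter((f) => severityRank[f.severity] >= severityRank[minSeverity])
    .map((f) => "[" + f.severity + "] " + f.title);
}

// Machine-readable view: always serializes the complete report.
function jsonView(report: Report): string {
  return JSON.stringify(report);
}
```

The bug pattern is the variant where the terminal filter reassigns `report.findings` before the JSON view runs, so the file on disk quietly loses findings.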
Red flags:
- `--json` or `--output` flag handling that short-circuits before all findings are collected
- `process.cwd()` used to construct file paths when the working directory may differ from repo root
- `console.log`/`console.error` used alongside Ink rendering

Safe patterns:

- `SkillReport` as the single source of truth, with format-specific views derived from it
- … `process.cwd()`

Not a bug:
Zone: Triggers layer + SDK layer | Severity: medium | Historical commits: 4+
Warden scopes analysis to changed hunks in a diff. Findings must fall within hunk line ranges. Path filters control which files are analyzed. Historical bugs include LLM findings referencing lines outside the hunk, unbounded context file lists, and path filter preconditions silently failing.
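The hunk-range rule amounts to a small filter. The sketch below assumes a hunk described by a starting line and a line count in the new file; `HunkRange`, `Finding`, and `findingsInHunk` are hypothetical names and do not reproduce Warden's `validateFindings()`.

```typescript
// Hypothetical shapes: a hunk covers lines newStart .. newStart + newCount - 1.
interface HunkRange { newStart: number; newCount: number; }
interface Finding { title: string; location: { startLine: number }; }

// Keep only findings whose start line lies inside the analyzed hunk.
function findingsInHunk(findings: Finding[], hunk: HunkRange): Finding[] {
  const firstLine = hunk.newStart;
  // Inclusive last line; a zero-length hunk yields an empty range.
  const lastLine = hunk.newStart + hunk.newCount - 1;
  return findings.filter(
    (f) => f.location.startLine >= firstLine && f.location.startLine <= lastLine,
  );
}
```

For a hunk starting at line 10 with 5 lines, findings at lines 10 and 14 are kept and findings at lines 9 and 15 are dropped. The inclusive upper bound is where off-by-one errors tend to appear.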
Red flags:
- Findings accepted without verifying that `location.startLine` falls within the analyzed hunk range
- `prepareFiles()` returning files that don't match trigger path patterns

Safe patterns:

- `validateFindings()` filtering findings to hunk line range
- `prepareFiles()` applying path filters before file processing

Not a bug:
Zone: CLI layer + Action layer | Severity: medium | Historical commits: 4+
Warden has multiple early-exit conditions: no files to analyze, auth failure, all skills skipped, rate limiting. Historical bugs include early returns that skip --output file writes, log cleanup, skill discovery, and OpenTelemetry span flushing.
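One common way to keep early exits from skipping cleanup is to return an exit code from inside `try` and do the writes in `finally`. The sketch below is hypothetical: `runSketch`, `writeOutput` (standing in for the `--output` write), and `flush` (standing in for telemetry flushing) are illustrative names only.

```typescript
interface RunResult { findings: string[]; exitCode: number; }

// Hypothetical runner that returns an exit code instead of exiting.
function runSketch(
  files: string[],
  writeOutput: (result: RunResult) => void,
  flush: () => void,
): number {
  const result: RunResult = { findings: [], exitCode: 0 };
  try {
    if (files.length === 0) {
      // Early exit: returning inside try still runs the finally block.
      return result.exitCode;
    }
    for (const file of files) result.findings.push("checked " + file);
    return result.exitCode;
  } finally {
    // Runs on every path, including the early return above.
    writeOutput(result);
    flush();
  }
}
```

A caller at the top level can then pass the returned code to `process.exit()` after everything has been written, rather than exiting from deep inside the run.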
Red flags:
- Early `return` or `process.exit()` before `--output` file is written
- `process.exit()` inside an OpenTelemetry span callback (prevents span flush/export)
- … `never`) used without `return` afterward
- … `onSkillComplete` or `onSkillError` callbacks
- `finally` blocks that assume setup completed (accessing uninitialized variables)

Safe patterns:

- … `process.exit()` call
- … `return`

Not a bug:

- Calls to functions typed `never` without `return` afterward (the type system guarantees they throw; explicit return is dead code)

Zone: CLI layer + Output layer | Severity: medium | Historical commits: 3+
Warden tracks operational state: file counts, finding counts, skill statuses, cost accumulation. Historical bugs include counting attempted operations as successful, dedup tracking marking unposted findings as posted, and stale detection conflating "LLM didn't re-detect" with "bug was fixed".
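The difference between attempted and completed work can be made explicit in the tally itself. The `Outcome` union and `tallyOutcomes` function below are hypothetical and only illustrate the counting rule.

```typescript
// Hypothetical per-file outcome; the status tag separates success from failure.
type Outcome =
  | { file: string; status: "ok"; findings: number }
  | { file: string; status: "failed"; error: string };

interface Tally { attempted: number; analyzed: number; failed: number; findings: number; }

function tallyOutcomes(outcomes: Outcome[]): Tally {
  const tally: Tally = { attempted: outcomes.length, analyzed: 0, failed: 0, findings: 0 };
  for (const outcome of outcomes) {
    if (outcome.status === "ok") {
      // Only completed files count as analyzed.
      tally.analyzed++;
      tally.findings += outcome.findings;
    } else {
      tally.failed++;
    }
  }
  return tally;
}
```

By construction `analyzed + failed` equals `attempted`, which gives a cheap invariant to check when counts are reported.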
Red flags:
- Counting files submitted to `runPool` as "analyzed" rather than files that completed successfully
- `failedHunks` or `failedExtractions` counts not reflecting the actual number of failures (off by one, double counting)

Safe patterns:

- `SkillReport.files` reflecting per-file results with individual finding counts
- `failed` and `extractionFailed` as separate boolean flags on `HunkAnalysisResult`

Not a bug:
Zone: All zones | Severity: medium | Historical commits: 3+
Error handling across Warden involves multiple error types with different retry/escalation semantics. Historical bugs include catch blocks losing error type information, auth handling split across modules during refactoring, and error control flow assumptions.
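How wrapping loses type information can be shown with a small sketch. `ApiErrorSketch`, `toError`, `isRetryableSketch`, and `wrapLossy` are hypothetical names, and the retry rule (429 or 5xx) is an assumption for illustration, not Warden's `isRetryableError()`.

```typescript
// Hypothetical stand-in for an SDK error that carries an HTTP status.
class ApiErrorSketch extends Error {
  readonly status: number;
  constructor(message: string, status: number) {
    super(message);
    this.name = "ApiErrorSketch";
    this.status = status;
    // Keeps instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, ApiErrorSketch.prototype);
  }
}

// Normalize anything thrown (strings, plain objects) into an Error,
// returning real Error instances unchanged so their subtype survives.
function toError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}

// Assumed retry rule for this sketch: rate limits and server errors.
function isRetryableSketch(error: unknown): boolean {
  return error instanceof ApiErrorSketch && (error.status === 429 || error.status >= 500);
}

// Lossy: the new Error keeps the message but drops the subtype and status.
function wrapLossy(error: unknown): Error {
  return new Error("analysis failed: " + toError(error).message);
}
```

A 429 error passed through `toError` is still retryable, while the same error passed through `wrapLossy` is not, because the `instanceof` check no longer matches.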
Red flags:
- `catch (error)` blocks that wrap the error in a `new Error()`, losing the original type (breaks `instanceof` checks downstream for `APIError`, `WardenAuthenticationError`, etc.)
- `catch` blocks that log `error.message` but discard `error.cause` or stack trace
- … `isAuthenticationError()` / `isAuthenticationErrorMessage()`
- Rethrows that drop the cause (`throw new Error(msg)` instead of `throw new Error(msg, { cause: error })`)
- `isRetryableError()` not updated when new error types are added to the SDK dependency
- Assuming caught values are `Error` instances (SDK can throw non-Error values)
- `setFailed()` or `process.exit()` in a function that callers expect to return normally

Safe patterns:

- `WardenAuthenticationError` as the canonical auth error type, thrown from `analyzeHunk()` and caught at the top level
- `isSubprocessError()` checking error codes before message patterns (more reliable)
- Error classifiers (`isRetryableError`, `isAuthenticationError`, `isSubprocessError`) centralized in `src/sdk/errors.ts`
- `lastError` tracking in retry loops for diagnostic context

Not a bug:

- `process.exit()` at the top level of the CLI entry point

For each finding:
If no checks fire, report nothing. Do not invent findings to justify your analysis. Silence means the code is clean against these patterns.