Adoption

Agent Skills are supported by leading AI development tools.

VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory VS Code Gemini CLI GitHub Goose Amp Cursor Claude Code Letta OpenCode Claude OpenAI Codex Factory

not-matthias/bug-finder

Name: bug-finder
Author: not-matthias

modules/home/programs/cli-agents/shared/skills/bug-finder/SKILL.md

npx skillsauth add not-matthias/dotfiles-nix bug-finder

Clean

TrivyContainer and dependency vulnerability scanner

Clean

SemgrepStatic code analysis for vulnerabilities

Clean

mcp-scan (Snyk)Model Context Protocol security validation

Skipped

Snyk (dep)Open source security scanning

Skipped

Socket.devSupply chain security analysis

Skipped

VirusTotalMulti-engine malware detection

Skipped

CrowdStrikeAdvanced threat intelligence

Skipped

OSV-ScannerOpen Source Vulnerability database check

Skipped

OWASP Dep-Check

Bug Finder

Find bugs. Verify correctness. Ignore style.

Iron law: never report a bug without a concrete trigger, a real failure mode, and a severe consequence.

Hard cap: 15 findings max per review. If you find more, keep only the highest severity. Inflation erodes trust.

When to Use

User asks to review a PR, diff, or specific code
User asks to check code for bugs or correctness
Before merging or shipping a change

What to Look For

Hunt these (ordered by severity):

Correctness — Logic errors, off-by-ones, wrong comparisons, unreachable code, broken invariants, incorrect state transitions
Security — Injection, auth bypass, data exposure, unsafe deserialization, missing input validation at system boundaries, secret leakage
Concurrency — Race conditions, deadlocks, missing synchronization, shared mutable state without protection
Error handling — Silent failures, swallowed exceptions, missing error propagation, unhandled edge cases that crash at runtime
Resource management — Leaks (memory, file handles, connections), missing cleanup, unbounded growth
API misuse — Wrong function for the job, violated contracts, deprecated calls with known bugs, incorrect argument ordering
Performance — Only when it causes real problems: algorithmic blowup, N+1 queries, hot-path allocations, unbounded retries

Ignore these (they belong to linters and minimal-diff):

Style, formatting, whitespace
Naming preferences
Code structure opinions
Missing comments or docs
Import ordering
"I would have done it differently" without a concrete failure scenario

Workflow

Step 1: Gather Context

Adapt to what the user gives you:

PR: Read the full diff via gh pr diff. Read the PR description. Identify what the change is supposed to do.
Staged changes: Run git diff --staged (or git diff for unstaged). Understand the intent from recent commits or user description.
Pointed code: Read the file(s) the user indicated. Ask what the code is supposed to do if unclear.

Before hunting, identify the critical paths — code that handles auth, payments, data writes, user input, or concurrency. These get the most scrutiny.

Scope rule for diff-based reviews: only flag issues in changed or new code. Pre-existing issues in untouched code are out of scope unless the change makes them reachable in a new way.

Step 2: Hunt

Walk through the code change systematically. For each issue found, record:

Exact location (file:line)
What's wrong
A concrete failure scenario: specific input, state, or sequence of events that triggers the bug

The failure scenario is mandatory. If you cannot construct one, the finding is not real — drop it.

Boundary analysis:

What happens at boundaries? (empty input, max values, nil/null/None, zero-length, unicode)
What happens under concurrency? (two requests at once, interrupted operations)
What happens on failure? (network down, disk full, permission denied, timeout)
What assumptions does this code make? Are they guaranteed by callers?
Does the error handling actually handle the error, or does it just log and continue?

Feynman technique — for each non-obvious function or block, ask:

Why does this line exist? If you can't explain it, that's where bugs hide.
What if this line were removed? Would anything break? Would the bug surface?
What if this check didn't exist? What bad state would flow through?
Does this function's assumption hold across all callers? Trace backwards.

Flow-divergence analysis — look for two code paths that should behave identically but don't. Check: do both paths handle errors the same way? Do both paths update the same state? If one path has a guard, should the other?

For language-specific pitfalls, consult the references:

Rust: references/rust.md
TypeScript: references/typescript.md
Python: references/python.md
Nix: references/nix.md

Step 3: Skeptic Pass

Re-examine every finding. For each one, ask:

Can I actually trigger this? Trace the call chain. Is the bad input reachable? Is the race condition possible given the actual usage pattern?
Does the surrounding code already prevent this? Check for guards, validation, or constraints upstream that make the scenario impossible.
Am I assuming something about the runtime/framework that isn't true? (e.g., "this isn't thread-safe" when the framework guarantees single-threaded execution)
Is this a real bug or a theoretical concern? If it requires an attacker with root access or a cosmic ray to trigger, drop it.

Common false-positive anti-patterns — drop findings that match these:

"Path A does X, path B doesn't" — check if the context makes X unnecessary on path B.
"This isn't thread-safe" — check if the framework/runtime guarantees single-threaded execution.
"Missing null check" — check if the type system or upstream validation already prevents null.
"Resource leak" — check if an RAII guard, defer, or context manager already handles cleanup.
"Unbounded input" — check if the caller/transport layer enforces limits before this code runs.

Drop any finding that fails the skeptic check. False positives waste the user's time and erode trust.

Step 4: Classify

Assign severity to surviving findings:

| Severity | Meaning | Examples | |----------|---------|---------| | CRITICAL | Will break in production. Data loss, security breach, crash. | SQL injection, use-after-free, auth bypass | | HIGH | Likely to break under realistic conditions. | Race condition on shared state, missing null check on user input | | MEDIUM | Breaks under edge cases or degrades correctness. | Off-by-one in pagination, resource leak on error path | | LOW | Unlikely to cause user-visible issues but still wrong. | Redundant check, minor perf issue on cold path |

Step 5: Report

Output findings sorted by severity (worst first). One line per finding:

file:line — [CRITICAL] Description. Breaks when: scenario.
file:line — [HIGH] Description. Breaks when: scenario.

End with a summary line:

N findings: X critical, Y high, Z medium, W low

If zero findings survive the skeptic pass, say so:

No issues found.

Rules

Every finding MUST have a concrete failure scenario. No scenario = not a real finding.
Do not pad the report. Zero findings is a valid outcome.
Do not suggest refactors, style changes, or "improvements" unless they fix a bug.
Do not flag code for being "complex" unless the complexity causes an actual defect.
When unsure if something is a bug, say so explicitly rather than presenting it as certain.
Failure scenarios should be specific enough to turn into a test case.

Resources

references/rust.md — Rust-specific pitfalls (unsafe, lifetimes, Send/Sync, unwrap)
references/typescript.md — TypeScript pitfalls (any casts, null coercion, async errors)
references/python.md — Python pitfalls (mutable defaults, GIL, exception handling)
references/nix.md — Nix pitfalls (lazy eval, infinite recursion, override gotchas)

Common Issues

Issue: Too many findings — review feels noisy.

Re-run the skeptic pass more aggressively. Group related findings under one root cause.

Issue: User disagrees with a finding.

Accept it. Do not argue. The user has context you don't.

Issue: Not enough context to judge correctness.

Ask the user what the code is supposed to do. Don't guess intent.

not-matthias/bug-finder

modules/home/programs/cli-agents/shared/skills/bug-finder/SKILL.md

Find bugs and verify correctness in code changes. Use when reviewing a PR, staged diff, or specific code the user points at. Hunts for logic errors, security flaws, race conditions, resource leaks, and API misuse. Produces evidence-based findings with concrete failure scenarios. Does NOT flag style, formatting, or naming issues.

9 stars

development

Updated Apr 20, 2026

$ install --global

skillsauth

npx skillsauth add not-matthias/dotfiles-nix bug-finder

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Scanners Passed

Scanners in report

Clean

TrivyContainer and dependency vulnerability scanner

95%

Clean

SemgrepStatic code analysis for vulnerabilities

95%

Clean

mcp-scan (Snyk)Model Context Protocol security validation

95%

Skipped

Snyk (dep)Open source security scanning

50%

Skipped

Socket.devSupply chain security analysis

50%

Skipped

VirusTotalMulti-engine malware detection

50%

Skipped

CrowdStrikeAdvanced threat intelligence

50%

Skipped

OSV-ScannerOpen Source Vulnerability database check

50%

Skipped

OWASP Dep-Check

50%

Last scanned: Apr 20, 2026, 6:30 AM9.6s5 files scanned

SKILL.md

name:: bug-finder
description:: Find bugs and verify correctness in code changes. Use when

Bug Finder

Find bugs. Verify correctness. Ignore style.

Iron law: never report a bug without a concrete trigger, a real failure mode, and a severe consequence.

Hard cap: 15 findings max per review. If you find more, keep only the highest severity. Inflation erodes trust.

When to Use

User asks to review a PR, diff, or specific code
User asks to check code for bugs or correctness
Before merging or shipping a change

What to Look For

Hunt these (ordered by severity):

Correctness — Logic errors, off-by-ones, wrong comparisons, unreachable code, broken invariants, incorrect state transitions
Security — Injection, auth bypass, data exposure, unsafe deserialization, missing input validation at system boundaries, secret leakage
Concurrency — Race conditions, deadlocks, missing synchronization, shared mutable state without protection
Error handling — Silent failures, swallowed exceptions, missing error propagation, unhandled edge cases that crash at runtime
Resource management — Leaks (memory, file handles, connections), missing cleanup, unbounded growth
API misuse — Wrong function for the job, violated contracts, deprecated calls with known bugs, incorrect argument ordering
Performance — Only when it causes real problems: algorithmic blowup, N+1 queries, hot-path allocations, unbounded retries

Ignore these (they belong to linters and minimal-diff):

Style, formatting, whitespace
Naming preferences
Code structure opinions
Missing comments or docs
Import ordering
"I would have done it differently" without a concrete failure scenario

Workflow

Step 1: Gather Context

Adapt to what the user gives you:

PR: Read the full diff via gh pr diff. Read the PR description. Identify what the change is supposed to do.
Staged changes: Run git diff --staged (or git diff for unstaged). Understand the intent from recent commits or user description.
Pointed code: Read the file(s) the user indicated. Ask what the code is supposed to do if unclear.

Before hunting, identify the critical paths — code that handles auth, payments, data writes, user input, or concurrency. These get the most scrutiny.

Scope rule for diff-based reviews: only flag issues in changed or new code. Pre-existing issues in untouched code are out of scope unless the change makes them reachable in a new way.

Step 2: Hunt

Walk through the code change systematically. For each issue found, record:

Exact location (file:line)
What's wrong
A concrete failure scenario: specific input, state, or sequence of events that triggers the bug

The failure scenario is mandatory. If you cannot construct one, the finding is not real — drop it.

Boundary analysis:

What happens at boundaries? (empty input, max values, nil/null/None, zero-length, unicode)
What happens under concurrency? (two requests at once, interrupted operations)
What happens on failure? (network down, disk full, permission denied, timeout)
What assumptions does this code make? Are they guaranteed by callers?
Does the error handling actually handle the error, or does it just log and continue?

Feynman technique — for each non-obvious function or block, ask:

Why does this line exist? If you can't explain it, that's where bugs hide.
What if this line were removed? Would anything break? Would the bug surface?
What if this check didn't exist? What bad state would flow through?
Does this function's assumption hold across all callers? Trace backwards.

For language-specific pitfalls, consult the references:

Rust: references/rust.md
TypeScript: references/typescript.md
Python: references/python.md
Nix: references/nix.md

Step 3: Skeptic Pass

Re-examine every finding. For each one, ask:

Can I actually trigger this? Trace the call chain. Is the bad input reachable? Is the race condition possible given the actual usage pattern?
Does the surrounding code already prevent this? Check for guards, validation, or constraints upstream that make the scenario impossible.
Am I assuming something about the runtime/framework that isn't true? (e.g., "this isn't thread-safe" when the framework guarantees single-threaded execution)
Is this a real bug or a theoretical concern? If it requires an attacker with root access or a cosmic ray to trigger, drop it.

Common false-positive anti-patterns — drop findings that match these:

"Path A does X, path B doesn't" — check if the context makes X unnecessary on path B.
"This isn't thread-safe" — check if the framework/runtime guarantees single-threaded execution.
"Missing null check" — check if the type system or upstream validation already prevents null.
"Resource leak" — check if an RAII guard, defer, or context manager already handles cleanup.
"Unbounded input" — check if the caller/transport layer enforces limits before this code runs.

Drop any finding that fails the skeptic check. False positives waste the user's time and erode trust.

Step 4: Classify

Assign severity to surviving findings:

Step 5: Report

Output findings sorted by severity (worst first). One line per finding:

file:line — [CRITICAL] Description. Breaks when: scenario.
file:line — [HIGH] Description. Breaks when: scenario.

End with a summary line:

N findings: X critical, Y high, Z medium, W low

If zero findings survive the skeptic pass, say so:

No issues found.

Rules

Every finding MUST have a concrete failure scenario. No scenario = not a real finding.
Do not pad the report. Zero findings is a valid outcome.
Do not suggest refactors, style changes, or "improvements" unless they fix a bug.
Do not flag code for being "complex" unless the complexity causes an actual defect.
When unsure if something is a bug, say so explicitly rather than presenting it as certain.
Failure scenarios should be specific enough to turn into a test case.

Resources

references/rust.md — Rust-specific pitfalls (unsafe, lifetimes, Send/Sync, unwrap)
references/typescript.md — TypeScript pitfalls (any casts, null coercion, async errors)
references/python.md — Python pitfalls (mutable defaults, GIL, exception handling)
references/nix.md — Nix pitfalls (lazy eval, infinite recursion, override gotchas)

Common Issues

Issue: Too many findings — review feels noisy.

Re-run the skeptic pass more aggressively. Group related findings under one root cause.

Issue: User disagrees with a finding.

Accept it. Do not argue. The user has context you don't.

Issue: Not enough context to judge correctness.

Ask the user what the code is supposed to do. Don't guess intent.

Related Skills

not-matthias/personal-notes

documentation

VerifiedTrustedCommunity

Save notes, journal entries, and research to the personal-notes Obsidian vault (personal-vault-v2). Use when the user asks to 'save note', 'save to notes', 'write to personal notes', 'save to daily notes', 'note this down', or wants to persist findings/analysis to their personal vault.

9SKILL.mdUpdated Jul 29, 2026

not-matthias/personal-notes

not-matthias/fix-pr

documentation

VerifiedTrustedCommunity

Use whenever the user asks to address, fix, resolve, review, or respond to pull-request comments or review feedback.

9SKILL.mdUpdated Jul 29, 2026

not-matthias/code-style

development

VerifiedTrustedCommunity

Apply Not Matthias's Rust-first personal coding style. Use whenever the user explicitly asks to apply or review their code style, make Rust match their preferences, perform a style pass, or simplify/refactor according to their conventions. Inspect only task-touched code, honor local project conventions first, and make only safe opt-out style edits.

9SKILL.mdUpdated Jul 29, 2026

not-matthias/code-style

not-matthias/ast-grep

development

VerifiedTrustedCommunity

Guide for writing ast-grep rules to perform structural code search and analysis. Use when users need to search codebases using Abstract Syntax Tree (AST) patterns, find specific code structures, or perform complex code queries that go beyond simple text search. This skill should be used when users ask to search for code patterns, find specific language constructs, or locate code with particular structural characteristics.

9SKILL.mdUpdated Jul 29, 2026

not-matthias/ast-grep

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Need help? View full Cowork setup guide →

Install manually

Choose your platform

# Clone the repo
git clone https://github.com/not-matthias/dotfiles-nix.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles-nix/modules/home/programs/cli-agents/shared/skills/bug-finder ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

not-matthias/dotfiles-nix

9 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT