codex/skills/multi-pass-bug-hunting/SKILL.md
Systematic audit-fix-rescan cycle for comprehensive bug elimination. Use for code review, deep audits, "find all bugs" requests, or pre-release hardening.
npx skillsauth add tkersey/dotfiles multi-pass-bug-hunting
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Core Insight: First pass finds obvious bugs. Second pass finds bugs hidden by the obvious ones. Third pass catches what you introduced fixing the first two.
Single-pass reviews miss bugs. Fixing bug A exposes bug B. Fixing B introduces C. You need an iterative cycle that converges to clean.
I want you to sort of randomly explore the code files in this project, choosing
code files to deeply investigate and trace their functionality through related
files. Do a super careful methodical check with fresh eyes to find any bugs,
problems, errors, issues, silly mistakes, etc. and fix them.
Comply with ALL rules in AGENTS.md. Use ultrathink.
Run a multi-pass bug hunt on this codebase:
Pass 1 (Surface): Run ubs, fix real bugs, ignore noise
Pass 2 (Deep): Re-read fixed files with fresh eyes, check edge cases
Pass 3 (Integration): Check how fixes interact, run tests
Pass 4 (Verify): Final ubs scan, must be clean
Between passes: Document what you found and fixed.
Stop when: Clean scan + tests pass + no new findings on re-read.
The "fresh eyes" technique is essential for catching bugs:
After fixing code → Step back → Re-read with fresh perspective
Why it works:
Trigger phrases:
┌─────────────────────────────────────────────────────┐
│                                                     │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐       │
│  │  AUDIT   │───▶│   FIX    │───▶│  RESCAN  │───────┼──▶ Clean? ──▶ Done
│  └──────────┘    └──────────┘    └──────────┘       │       │
│       ▲                                             │       │ No
│       │                                             │       ▼
│       └─────────────────────────────────────────────┼───────┘
│                                                     │
└─────────────────────────────────────────────────────┘
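The cycle in the diagram can be sketched as a small shell loop. This is a minimal sketch under stated assumptions, not part of the skill itself: `hunt`, `scan_with_ubs`, and `manual_fix` are hypothetical names, and the fix step is assumed to be manual.

```bash
# hunt MAX_PASSES SCAN_CMD FIX_CMD
# Audit -> fix -> rescan until SCAN_CMD exits 0 or MAX_PASSES is used up.
# All function names here are hypothetical helpers, not ubs features.
hunt() {
  local max_passes="$1" scan="$2" fix="$3"
  local pass=1
  while [ "$pass" -le "$max_passes" ]; do
    if "$scan"; then
      echo "pass $pass: clean, done"
      return 0
    fi
    echo "pass $pass: findings remain, fixing"
    "$fix"
    pass=$((pass + 1))
  done
  echo "not clean after $max_passes passes" >&2
  return 1
}

# Example hooks: the scan command is the one used in this document,
# the fix step simply waits for a human (or agent) to finish fixing.
scan_with_ubs() { ubs . --fail-on-warning; }
manual_fix() {
  printf 'Fix the findings, then press Enter to rescan: '
  read -r reply
}
```

A call such as `hunt 4 scan_with_ubs manual_fix` would then cap the hunt at four passes. The cap is a design choice for the sketch: without one, a scan that never comes back clean would loop forever.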
# Pass 1: Surface scan
ubs . --format=jsonl > pass1.jsonl
# Triage, fix real bugs, suppress false positives with justification
# Pass 2: Rescan + fresh eyes review
ubs . --format=jsonl > pass2.jsonl
git diff --name-only HEAD~1 # Files you touched (assumes the Pass 1 fixes are in the last commit)
# Re-read EACH fixed file with fresh eyes - what did you miss?
# Pass 3: Integration check
cargo test --all # or npm test, pytest, etc.
git diff HEAD~1 | head -200 # Review your changes holistically
# Pass 4: Final verification
ubs . --fail-on-warning
# Must exit 0
ubs . --format=jsonl > findings.jsonl
# Triage each finding:
# - Real bug → Fix immediately
# - False positive → ubs:ignore with justification
# - Not sure → Mark for Pass 2
Focus: Null checks, missing await, resource leaks, obvious security
Bug types caught:
Re-read every file you touched. This is the critical pass.
Questions to ask:
# Find all files modified in Pass 1 (assumes the fixes are in the last commit)
git diff --name-only HEAD~1
# For each file: read it completely, trace through imports
# Look for: logic errors, edge cases, error handling, off-by-one
Focus: Logic errors, edge cases, incomplete error handling
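To see what the Pass 2 rescan reports that Pass 1 did not, the two JSONL files can be compared line by line. The sketch below assumes that an unchanged finding serializes to an identical line in both files; that is an assumption about the `ubs` output, not a documented guarantee, and `new_findings` is a hypothetical helper name.

```bash
# new_findings OLD.jsonl NEW.jsonl
# Print lines present in NEW but absent from OLD.
# Treats each JSONL line as an opaque string, so no JSON parser is needed.
new_findings() {
  local old_sorted new_sorted
  old_sorted=$(mktemp) && new_sorted=$(mktemp) || return 1
  # comm needs both inputs sorted with the same collation.
  LC_ALL=C sort -u "$1" > "$old_sorted"
  LC_ALL=C sort -u "$2" > "$new_sorted"
  # -13 hides lines unique to OLD and lines common to both.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}
```

For example, `new_findings pass1.jsonl pass2.jsonl` lists candidates for bugs introduced by the fixes. If findings embed line numbers, a fix that shifts code will make an old finding look new, so treat the output as a starting point for the fresh-eyes read rather than a verdict.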
# Run the full test suite (pick the command for your stack)
cargo test --all
npm test
pytest -v
# Check for regressions
git diff HEAD~1 | head -200 # Review your changes holistically
# Look for:
# - Tests that started failing
# - New warnings in compilation
# - Behavior changes in dependent code
Focus: Do fixes interact badly? Did tests break? New warnings?
# Final clean scan
ubs . --fail-on-warning
# MUST exit 0. If not, back to Pass 1.
| Pass | What You Catch | Technique |
|------|----------------|-----------|
| 1 | Null derefs, missing await, resource leaks, injection | Automated scanners (ubs, clippy, eslint) |
| 2 | Logic errors, edge cases, incomplete error handling | Fresh eyes manual review |
| 3 | Regressions, integration issues, test failures | Test suite + diff review |
| 4 | Anything introduced in passes 1-3 | Final automated verification |
After each pass, record what you found:
## Pass N Findings
**Files touched:** src/foo.rs, src/bar.rs
**Bugs fixed:**
- src/foo.rs:42 — Null deref on empty input
- src/bar.rs:100 — Missing await on async call
**False positives suppressed:**
- src/foo.rs:55 — ubs:ignore, validated by caller
**Deferred to Pass N+1:**
- Check if similar pattern exists in src/baz.rs
**Confidence:** High / Medium / Low
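Filling in the template by hand after every pass is easy to skip, so a small helper can print the skeleton. This is only a sketch: `pass_report` is a hypothetical function, and it emits the headings from the template above with the file list filled in and everything else left blank.

```bash
# pass_report N FILE...
# Print an empty findings skeleton for pass N, listing the touched files.
pass_report() {
  local pass="$1"
  shift
  local files
  # Join the remaining arguments with ", " and drop the trailing separator.
  files=$(printf '%s, ' "$@")
  files=${files%, }
  cat <<EOF
## Pass $pass Findings
**Files touched:** $files
**Bugs fixed:**
**False positives suppressed:**
**Deferred to Pass $((pass + 1)):**
**Confidence:** High / Medium / Low
EOF
}
```

One possible use is `pass_report 1 $(git diff --name-only HEAD~1)`, appended to whatever notes file the hunt keeps. The unquoted command substitution splits on whitespace, so this form assumes file names without spaces.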
Stop when ALL of these are true:
- ubs . --fail-on-warning exits 0
- The test suite passes
- A fresh-eyes re-read turns up no new findings

When reviewing code written by other agents:
Review the code written by your fellow agents and check for any issues, bugs,
errors, problems, inefficiencies, security problems, reliability issues, etc.
and carefully diagnose their underlying root causes using first-principle
analysis and then fix or revise them if necessary.
Do not restrict yourself to the latest commits, cast a wider net and go super deep.
Why this matters:
# UBS (primary scanner)
ubs . --format=jsonl # Detailed findings
ubs . --comparison=baseline.json # New issues only
ubs --staged # Pre-commit check
# Clippy (Rust)
cargo clippy -- -D warnings 2>&1 | head -50
# ESLint (TypeScript)
npx eslint . --format=json
# Ruff (Python)
ruff check . --output-format=json
# ShellCheck (Bash)
shellcheck *.sh
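In a mixed-language repository it helps to decide up front which of the scanners above apply. The sketch below prints the matching commands without running them. The marker files it checks (`Cargo.toml`, `package.json`, `pyproject.toml`) are assumptions about how a project is laid out, and `suggest_scanners` is a hypothetical helper.

```bash
# suggest_scanners [DIR]
# Print (do not run) the scanner commands from this section that match DIR.
suggest_scanners() {
  local dir="${1:-.}"
  echo "ubs . --format=jsonl"
  if [ -f "$dir/Cargo.toml" ]; then
    echo "cargo clippy -- -D warnings"
  fi
  if [ -f "$dir/package.json" ]; then
    echo "npx eslint . --format=json"
  fi
  if [ -f "$dir/pyproject.toml" ]; then
    echo "ruff check . --output-format=json"
  fi
  # Any top-level shell script is enough to suggest ShellCheck.
  for script in "$dir"/*.sh; do
    if [ -e "$script" ]; then
      echo "shellcheck *.sh"
      break
    fi
  done
  return 0
}
```

Printing instead of executing keeps the helper safe to run anywhere and leaves the choice of flags to whoever runs the pass.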
| Don't | Do |
|-------|-----|
| Fix everything in one pass | Iterate: scan → fix → rescan |
| Skip rescan after fixes | Always rescan: fixes introduce bugs |
| Skip fresh eyes review | Pass 2 is where you catch logic errors |
| Suppress without justification | Every ubs:ignore needs a reason |
| Trust green tests alone | Tests + static analysis + manual review |
| Stop at "good enough" | Converge to actually clean |
| Review only latest commit | Cast a wider net, go deep |
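The rule that every `ubs:ignore` needs a reason can be spot-checked with `grep`. This sketch assumes the justification is written on the same line, after the marker; that convention is an assumption, not something this document specifies, and `unjustified_ignores` is a hypothetical helper.

```bash
# unjustified_ignores PATH...
# List ubs:ignore markers that have nothing but whitespace after them.
# Exits non-zero when no such marker is found (grep's usual behavior).
unjustified_ignores() {
  grep -rnE 'ubs:ignore[[:space:]]*$' "$@"
}
```

Running `unjustified_ignores src` before the final pass gives a quick list of suppressions to revisit. It is a heuristic: a marker inside a block comment that closes on the same line would not match.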
| Situation | Passes Needed |
|-----------|---------------|
| Quick pre-commit | 1 (ubs --staged) |
| Feature complete | 2-3 |
| Pre-release | 3-4 |
| Security audit | 4+ with security focus |
| Legacy code cleanup | 4+ until convergence |
| Reviewing agent code | 2-3 minimum, fresh eyes critical |
| Need | File |
|------|------|
| Triage decision tree | TRIAGE.md |
| Common bug patterns | PATTERNS.md |
| Tool-specific commands | TOOLS.md |