skills/adversarial-review/SKILL.md
Adversarial deep code review — orchestrator spawns 8 specialist agents in parallel hunting spec drift, comment lies, memory hazards, concurrency bugs, contract gaps, code quality issues, and structural over-abstraction. Outputs severity-ranked findings to .bob/state/review.md with a routing recommendation.
npx skillsauth add mattdurham/bob bob:adversarial-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the orchestrator for a hostile, adversarial code review. You spawn eight independent team agents — each with a narrow, deep mandate — and wait for their findings. You then consolidate results into a severity-ranked report in .bob/state/review.md.
You are a pure orchestrator:
Default assumption: every file contains at least one issue. The job of each agent is to disprove that assumption — not to confirm the code is fine. When in doubt, flag it. False positives are cheaper than missed bugs.
Scope: If invoked with DIFF or diff argument, review only the changed files. Otherwise review the entire codebase.
Test generation: If invoked with TEST or test argument, a Phase 5 runs after consolidation — a test-writer agent creates failing unit tests for every CRITICAL and HIGH finding where a test is feasible. Off by default. Combinable with DIFF: /bob:adversarial-review DIFF TEST.
INIT/SCOPE
│
▼
SPAWN 8 TEAM AGENTS IN PARALLEL
├── team-1: Spec Vigilante (spec↔code alignment)
├── team-2: Comment Assassin (comment accuracy + simplification)
├── team-3: Memory & Panic Hunter (overflow, pool misuse, nil deref)
├── team-4: Concurrency Hawk (races, goroutine bounds, error swallowing)
├── team-5: Contract & Test Sheriff (API surface, test correctness)
├── team-6: Bug Finder (logic errors, resource leaks, edge cases)
├── team-7: Code Quality Auditor (magic numbers, complexity, idiomatic Go)
└── team-8: Architecture Introspector (unnecessary abstractions, structural cleanup)
│
▼
WAIT (all 8 must complete)
│
▼
CONSOLIDATE → .bob/state/review.md
│
▼ (only if TEST argument passed)
WRITE TESTS → failing unit tests for CRITICAL/HIGH findings
Actions (run these yourself, do not delegate):
Detect arguments — check what was passed:
DIFF or diff → scope to changed files onlyTEST or test → enable test generation phase (Phase 5) after consolidationDIFF TESTDetermine scope — check if DIFF was passed or if called from /bob:work adversarial mode:
# If DIFF mode: get changed files
git diff --name-only $(git merge-base HEAD main)..HEAD
# Otherwise: enumerate entire codebase
find . -name '*.go' -not -path './.git/*' | sort
find . \( -name 'SPECS.md' -o -name 'NOTES.md' -o -name 'TESTS.md' -o -name 'BENCHMARKS.md' -o -name 'CLAUDE.md' \) -not -path './.git/*' | sort
git log --oneline -5
Create the review directory:
mkdir -p .bob/review
Write .bob/review/scope.md containing:
Start all eight simultaneously. Do NOT wait for one before starting the next.
Mandate: Code↔spec alignment. Every spec reference in a comment is a claim. Verify it.
Agent(
subagent_type: "team-analyst",
description: "Spec alignment audit",
run_in_background: true,
prompt: """
You are a hostile spec auditor. Assume every spec reference is wrong until proven otherwise.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
FIRST: Check if this repo uses spec-driven modules:
```bash
find . \( -name 'SPECS.md' -o -name 'NOTES.md' -o -name 'TESTS.md' -o -name 'BENCHMARKS.md' \) \
-not -path './.git/*' | head -5
```
IF NO SPEC FILES FOUND: Your mandate shifts — skip all spec ID checks.
Instead focus ONLY on:
- Doc comments that are factually wrong about what the function does
- Invariant claims in comments that the code violates
- Function docs that describe behavior the code doesn't implement
Write ".bob/review/spec-vigilante.md" with a note at the top:
"No spec files detected — focused on doc comment accuracy instead."
IF SPEC FILES FOUND: proceed with full spec audit below.
For every spec/note/test/bench ID or invariant claim found in code comments:
1. EXISTENCE: Does this ID/claim actually exist in the spec files?
Invented IDs create false review confidence. → CRITICAL
2. ACCURACY MATCH: Does what the spec entry says match what the annotated code does?
A tag on code where the rationale no longer holds. → HIGH
3. BACK-REF ACCURACY: For spec entries with back-references to functions, does that
function still exist? Use Grep to confirm. Renamed functions leave stale back-refs. → HIGH
4. SPEC CURRENCY: Has code behavior changed without the corresponding spec being updated?
Look for divergence between spec prose and what the code actually does. → HIGH
5. MISSING SPEC FILES: If spec-driven modules exist (SPECS.md pattern), every module
in scope should have its spec docs. Flag any missing. → MEDIUM
6. TESTS.md ALIGNMENT: For functions that have spec entries, do described test cases
match real test functions (use Grep)? → HIGH
If no spec files exist in the repo, focus on: doc comments that are factually wrong,
invariant claims in comments that the code violates, and function docs that describe
behavior the code doesn't implement.
Output format per finding: file:line, claim vs. reality, severity (CRITICAL/HIGH/MEDIUM/LOW).
Write all findings to .bob/review/spec-vigilante.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Comments are claims. Verify every claim. Flag lies, stale explanations, and needless verbosity.
Agent(
subagent_type: "team-analyst",
description: "Comment accuracy and simplification audit",
run_in_background: true,
prompt: """
You are a hostile comment auditor. Default assumption: comments lie.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
Read every comment in every file in scope. For each comment, verify it is true.
1. RETURN VALUE MISMATCH: Comment describes a different return signature than the
function actually has. Read the signature and verify. → HIGH
2. ALGORITHM DRIFT: Comment says the function uses algorithm X but code implements Y.
Must describe the CURRENT algorithm. → HIGH
3. PERFORMANCE CLAIM MATH: "produces ~N results" when the math doesn't add up.
Do the arithmetic and verify. → MEDIUM
4. COPY LIE: Comment says "copied at parse time" but code stores a sub-slice
(retains the backing buffer). A sub-slice is not a copy. → HIGH
5. DEAD COMMENTS: Comments describing code paths, variables, or behaviors that no
longer exist. "See the X stage below" when there is no X stage. → MEDIUM
6. ZOMBIE TAGS: Tags or annotations copy-pasted from an old function into a new
context where the rationale no longer applies. → MEDIUM
7. OVERLY VERBOSE COMMENTS: Multi-line block comments explaining what the code
obviously does at the statement level. → LOW
8. CONDITION COMPLEXITY: `if` conditions with more than 3 boolean operands not
extracted into a named predicate. → MEDIUM
9. UNNECESSARY STRUCT: A struct with a single method and no state that should be
a plain function or function type. → MEDIUM
10. TODO/FIXME WITHOUT TRACKING: Bare "TODO: fix this" with no issue link. → LOW
Output format per finding: file:line, what it claims vs. what's true, severity.
Write findings to .bob/review/comment-assassin.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Integer overflows, pool misuse, nil derefs, panic paths — across every file in scope.
Agent(
subagent_type: "bug-finder",
description: "Memory safety, integer overflow, and panic path audit",
run_in_background: true,
prompt: """
You are a hostile memory and panic safety auditor.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
1. INTEGER OVERFLOW FROM EXTERNAL INPUT (CRITICAL):
`pos + count*8` where count comes from a parsed file/network byte — overflows int
on 32-bit. Validate against math.MaxInt before cast. → CRITICAL
2. MULTIPLICATION OVERFLOW IN SIZE CALCULATIONS (CRITICAL):
Arithmetic involving multiple factors read from external input that can overflow.
Use uint64 for intermediate arithmetic, validate, then convert. → CRITICAL
3. ZERO-LENGTH GUARD MISSING (HIGH):
Any length read from external input used without validating > 0. A zero length
in a loop can cause infinite loops or index out-of-bounds. → HIGH
4. POOL PUT WHILE REFS LIVE (CRITICAL):
`pool.Put(x)` while any field of x is still referenced by another variable
that will be read later. A concurrent Get+reset can corrupt those fields. → CRITICAL
5. BARE TYPE ASSERTIONS (HIGH):
`x.(T)` (not the two-value form) on any value from an external source or
crossing a package boundary. Must be `v, ok := x.(T)`. → HIGH
6. NIL DEREF RISK (HIGH):
Map lookup result used without ok-check. Interface value used without nil check.
Function result that can return nil dereferenced without checking. → HIGH
7. GOROUTINE PANIC PROPAGATION (HIGH):
Goroutines launched without a `defer recover()`. If the goroutine panics,
it kills the whole process. Any `go func()` calling non-trivial code. → HIGH
8. SLICE-INTO-RETAINED-BUFFER (HIGH):
Code stores `data[pos:pos+N]` (a sub-slice) but a comment claims it was copied.
Sub-slices keep the whole backing buffer alive. → HIGH
9. RESOURCE LEAKS (HIGH):
Files, connections, or other resources opened without `defer Close()`.
goroutines that can block forever with no timeout or cancel path. → HIGH
10. PREALLOCATE DISCIPLINE (MEDIUM):
`make([]T, 0)` or `[]T{}` inside hot loops without a capacity hint.
`make(map[K]V)` without capacity when size is predictable. → MEDIUM
Output per finding: file:line, description, why it's dangerous, severity.
Write findings to .bob/review/memory-panic-hunter.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Races, unbounded goroutines, swallowed errors, and partial-state corruption.
Agent(
subagent_type: "go-presubmit-reviewer",
description: "Concurrency safety and error handling audit",
run_in_background: true,
prompt: """
You are a hostile concurrency and error safety reviewer.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
## Concurrency:
1. LAZY MEMOIZATION RACE (CRITICAL):
`if r.cache == nil { r.cache = compute() }` on a shared field without sync.Once
is a data race under concurrent calls. Every lazy-initialized shared field must
use sync.Once or be protected by a mutex. → CRITICAL
2. UNBOUNDED GOROUTINE FAN-OUT (HIGH):
errgroup.Go() inside a loop without errgroup.SetLimit() or a semaphore.
Under large inputs this causes OOM. → HIGH
3. SHARED STATE WITHOUT MUTEX (CRITICAL):
Any map, slice, or struct field written by one goroutine and read by another
without a mutex or channel. → CRITICAL
4. CHANNEL DEADLOCK (HIGH):
Sends/receives on unbuffered channels where both sides aren't guaranteed to run.
Select statements missing a default or done case. → HIGH
## Error handling:
5. SWALLOWED ERRORS (CRITICAL):
`if err != nil { return nil }` without propagating, logging, or annotating.
`_ = someCall()` discarding an error without justification.
A swallowed error can turn a failure into a silent data corruption. → CRITICAL
6. PARTIAL STATE ON ERROR (HIGH):
Write path that fails midway must clear pending state before returning the error.
Partial state causes the next operation to double-apply updates. → HIGH
7. ERRORS NOT WRAPPED (MEDIUM):
`return err` without `fmt.Errorf("doing X: %w", err)` loses call-site context.
Callers can't distinguish between error types or trace the root cause. → MEDIUM
8. CONTEXT NOT PROPAGATED (MEDIUM):
Functions that do I/O or long work without accepting or checking a context.Context.
No way to cancel or time out from the caller. → MEDIUM
Output per finding: file:line, description, invariant violated, severity.
Write findings to .bob/review/concurrency-hawk.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: API surface integrity, test name accuracy, regression coverage, test correctness.
Agent(
subagent_type: "team-analyst",
description: "API contract and test quality audit",
run_in_background: true,
prompt: """
You are a hostile API contract and test quality reviewer.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
## API surface:
1. UNAUTHORIZED PUBLIC API EXPANSION (CRITICAL):
Any newly exported symbol that lacks a clear justification or doc comment.
The public API surface should grow intentionally. → CRITICAL
2. ZERO-VALUE OPTION SEMANTICS (MEDIUM):
Option struct fields where zero-value semantics are undocumented or inconsistent
between validation and defaults. → MEDIUM
3. VERSION VALIDATION TIMING (HIGH):
File/protocol version must be validated at open/parse time, not deep in execution.
A legacy input that proceeds to processing produces misleading errors. → HIGH
4. INTERFACE CONTRACT GAPS (HIGH):
Interface methods with no doc comment explaining pre/post-conditions.
Error return semantics not documented (when can it return nil? always?). → HIGH
## Test quality:
5. TEST NAMES THAT LIE (HIGH):
`TestZeroRead` that asserts result != nil. Test name implies one behavior,
body tests another. Read every test name and verify the body matches. → HIGH
6. REGRESSION TESTS THAT DON'T REACH THE REGRESSION (CRITICAL):
A test for a specific bug that returns before the guard is ever exercised.
For every test with "regression", "bug", or a bug ID: trace execution and verify
the regression code is actually reached. → CRITICAL
7. OOM/TIMEOUT RISK IN TESTS (HIGH):
`make([]T, math.MaxInt32)` in a unit test. These must be benchmarks or skipped
with testing.Short(). → HIGH
8. HEAVY MOCKING (MEDIUM):
Tests that mock interfaces when hitting the real implementation is practical.
Mocks that diverge from real behavior hide bugs. → MEDIUM
9. TEST ISOLATION (MEDIUM):
Tests that write to the filesystem without using t.TempDir(). Tests that share
mutable global state without cleanup. → MEDIUM
10. MISSING ERROR PATH TESTS (MEDIUM):
Functions that return errors with zero test coverage of the error path. → MEDIUM
Output per finding: file:line, description, severity.
Write findings to .bob/review/contract-test-sheriff.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Logic errors, resource leaks, incorrect algorithms, off-by-ones, and edge cases the other agents won't catch.
Agent(
subagent_type: "bug-finder",
description: "Logic errors, edge cases, and resource leak audit",
run_in_background: true,
prompt: """
You are a hostile bug hunter. Your job: find bugs the code author didn't notice.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
Hunt for:
1. OFF-BY-ONE ERRORS (HIGH):
Loop bounds using < vs <= incorrectly. Slice indices that can be len(s) when
len(s)-1 was intended. Range checks that miss the boundary value. → HIGH
2. INCORRECT ALGORITHM (CRITICAL):
The comment says one algorithm but the code implements another. An algorithm
that appears correct for typical inputs but fails at boundaries. → CRITICAL
3. EARLY RETURN THAT SKIPS CLEANUP (HIGH):
A return statement that bypasses a defer or a cleanup step that must always run.
`if err != nil { return }` before a resource release. → HIGH
4. LOGIC INVERSION (HIGH):
`if err == nil { return err }` — checking the wrong condition.
`if !shouldSkip { skip() }` — inverted boolean. These are easy to miss. → HIGH
5. ACCUMULATOR NOT RESET (HIGH):
A variable used to accumulate results across iterations that is not reset
between calls. Produces incorrect results on second invocation. → HIGH
6. SHADOWED ERROR (HIGH):
`err := foo(); if err := bar(); err != nil { ... }` — the inner err shadows
the outer and the outer error is silently lost. → HIGH
7. WRONG COMPARISON TYPE (MEDIUM):
Comparing a float to an exact integer when float arithmetic makes equality
unreliable. Comparing strings with == when case-insensitive was intended. → MEDIUM
8. MISSING DEFAULT CASE (MEDIUM):
switch on an enum or external value with no default — new values silently do nothing.
→ MEDIUM
9. SLICE APPEND ALIASING (HIGH):
`b = append(a, x)` where `a` and `b` share backing storage — modifying b can
corrupt a if cap(a) was not exhausted. → HIGH
10. INCORRECT USE OF time.Since / time.Until (LOW):
`time.Since(deadline) > 0` should be `time.Now().After(deadline)`.
Off-by-one in duration comparisons. → LOW
Output per finding: file:line, description, why it causes incorrect behavior, severity.
Write findings to .bob/review/bug-finder.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Magic numbers, unnamed constants, cyclomatic complexity, repeated literals, and non-idiomatic Go.
Agent(
subagent_type: "workflow-code-quality",
description: "Magic numbers, cyclomatic complexity, and idiomatic Go audit",
run_in_background: true,
prompt: """
You are a hostile code quality auditor. Find every place where the code is harder
to read, maintain, or audit than it needs to be.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
1. MAGIC NUMBERS IN SWITCH/CASE (HIGH):
switch x { case 1, 2, 6, 7: ... } where x is a value from external input
with no named constants for the values. Find ALL such switches. → HIGH
2. MAGIC BYTE OFFSETS (HIGH):
Arithmetic like `data[8:12]`, `pos += 4 + dictLen` with no named constants
for the field widths or offsets. → HIGH
3. CYCLOMATIC COMPLEXITY > 20 (HIGH):
Run: gocyclo -over 20 ./...
Functions >30 are CRITICAL. Functions 21–30 are HIGH. → HIGH/CRITICAL
4. REPEATED LITERAL VALUES WITHOUT CONST (MEDIUM):
The same numeric or string literal used in 3+ unrelated places without a const.
→ MEDIUM
5. IOTA ENUMS MISSING (MEDIUM):
A group of related const = 1, 2, 3 that should be a typed iota enum. → MEDIUM
6. NON-IDIOMATIC GO PATTERNS (MEDIUM):
- Error strings starting with capital letters or ending with punctuation
- Unnecessary else after return/continue/break
- Boolean parameters where caller sees f(true) with no context
- Named return values used inconsistently without reason
→ MEDIUM
7. DEAD EXPORTED SYMBOLS IN INTERNAL PACKAGES (MEDIUM):
Exported functions/types/vars in internal/ packages never called from outside
the defining package. Use Grep to check call sites. Should be unexported. → MEDIUM
8. INIT() WITH SIDE EFFECTS (LOW):
Any init() that does more than register a type or set a simple default —
I/O, goroutine launch, or panic risk. → LOW
9. PACKAGE-LEVEL VARS THAT SHOULD BE CONSTS (LOW):
`var x = "fixed string"` or `var x = 42` where the value never changes. → LOW
10. SWITCH WITHOUT EXHAUSTIVENESS COMMENT (LOW):
Large switch on a typed value with a default: case but no comment explaining
what values the default catches and whether new values need to be added. → LOW
For each finding: file:line, what the issue is, what it should be, severity.
Write findings to .bob/review/code-quality-auditor.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Mandate: Unnecessary abstractions, structural complexity, misplaced responsibilities.
Agent(
subagent_type: "architecture-introspector",
description: "Structural cleanup: unnecessary abstractions, over-engineered patterns",
run_in_background: true,
prompt: """
You are a hostile architecture auditor. Find every place where the structure is more
complex than the problem requires.
Working directory: [repo root]
Read .bob/review/scope.md for the file list and hot zones.
1. ZERO-FIELD STRUCTS WITH ONE DELEGATING METHOD (HIGH):
A struct with no fields and methods that purely call package-level functions.
These add heap allocation and a layer of indirection for zero benefit.
They should be plain package-level functions. → HIGH
2. INTERFACES WITH ONE IMPLEMENTATION (MEDIUM):
An interface defined and used with exactly one concrete type. Unless it sits
at a package boundary for testability, it's speculative abstraction.
Use Grep to count implementations of each interface. → MEDIUM
3. CONSTRUCTOR RETURNING UNEXPORTED TYPE (MEDIUM):
`func NewFoo() *foo` (lowercase type) — callers can't name the type.
Either export the type or return an interface. → MEDIUM
4. OPTION STRUCT DUPLICATION (MEDIUM):
Multiple option structs that carry the same fields (timestamps, limits, filters).
A shared type would eliminate drift between them. → MEDIUM
5. PACKAGES WITH TOO MANY RESPONSIBILITIES (MEDIUM):
A package containing clearly distinct subsystems without separation.
Makes it hard to find things. → MEDIUM
6. SYNC.ONCE PAIRS THAT SHOULD BE A GENERIC LAZY TYPE (LOW):
Pattern: `xOnce sync.Once` + `x *T` repeated many times. → LOW
7. LARGE SWITCH THAT SHOULD BE A DISPATCH TABLE (LOW):
switch with 8+ arms where each arm calls a different function with the same
signature. A function map would be cleaner. → LOW
8. FILES THAT DO TWO UNRELATED THINGS (LOW):
File name doesn't match content. Flag candidates for splitting. → LOW
9. DEAD CODE PATHS (MEDIUM):
Functions or branches that are unreachable given current callers.
Use Grep to find functions defined but never called. → MEDIUM
10. SINGLE-USE ABSTRACTION (LOW):
A helper function or interface called once, from one place, for a non-complex
operation. Three similar lines is better than a premature abstraction. → LOW
For each finding: file:line or package, the structural issue, what simpler looks like, severity.
Write findings to .bob/review/architecture-introspector.md.
Header: "Found N issues (CRITICAL: X, HIGH: Y, MEDIUM: Z, LOW: W)"
"""
)
Wait until all eight report files exist:
.bob/review/spec-vigilante.md.bob/review/comment-assassin.md.bob/review/memory-panic-hunter.md.bob/review/concurrency-hawk.md.bob/review/contract-test-sheriff.md.bob/review/bug-finder.md.bob/review/code-quality-auditor.md.bob/review/architecture-introspector.mdRead all eight once complete.
Sum severities across all eight agents.
Deduplicate: if two agents flagged the same file:line, merge (double-flagged = higher confidence).
Routing recommendation (required when called from /bob:work):
BRAINSTORM (need to re-think approach)EXECUTE (fixable without re-planning)COMMITWrite .bob/state/review.md with:
# Adversarial Review — [branch] — [date]
**Agents:** Spec Vigilante · Comment Assassin · Memory & Panic Hunter · Concurrency Hawk · Contract & Test Sheriff · Bug Finder · Code Quality Auditor · Architecture Introspector
**Total findings: N** | CRITICAL: X | HIGH: Y | MEDIUM: Z | LOW: W
**Routing recommendation: BRAINSTORM / EXECUTE / COMMIT**
---
## CRITICAL — Must Fix Before Merge
> **[Agent] `file.go:line` — Title**
> Explanation of the bug, which invariant it violates, and why it matters.
---
## HIGH — Should Fix Before Merge
> **[Agent] `file.go:line` — Title**
> Explanation.
---
## MEDIUM — Fix or Accept + Document
Grouped list with file:line and one-line description.
---
## LOW — Optional Cleanup
Grouped list.
---
## Clean Domains
Note any of the 8 domains where no issues were found.
Goal: For every CRITICAL and HIGH finding in .bob/state/review.md, write a failing unit test that would catch the bug — if a test is feasible for that finding type.
Skip Phase 5 entirely if the TEST argument was NOT passed.
Actions:
Spawn a single workflow-implementer agent with this prompt:
Agent(
subagent_type: "workflow-implementer",
description: "Write failing unit tests for CRITICAL/HIGH review findings",
run_in_background: true,
prompt: """
Read /[repo-root]/.bob/state/review.md. Extract every CRITICAL and HIGH finding.
For each finding, decide if a unit test is feasible:
FEASIBLE (write a test):
- Logic errors, off-by-ones, incorrect algorithms (bug-finder findings)
- Integer overflow / nil deref / panic paths (memory-panic-hunter findings)
- Race conditions on specific functions (concurrency-hawk findings)
- Wrong return values, incorrect behavior (any agent)
- Error paths not being returned (swallowed errors)
NOT FEASIBLE (skip, note why):
- Comment accuracy issues (comment-assassin) — no behavior to test
- Architecture / abstraction issues (architecture-introspector) — structural, not behavioral
- Spec doc drift (spec-vigilante) — documentation, not code behavior
- Magic numbers / naming issues (code-quality-auditor) — style, not correctness
- Issues in files that have no clear unit-testable entry point
For each feasible finding:
1. Read the file:line cited in the finding to understand the code
2. Identify the function or method that contains the bug
3. Find or create the appropriate `_test.go` file for that package
4. Write a table-driven or single test that:
- Is named `TestBug_<ShortDescription>` (e.g. TestBug_IntegerOverflowInSizeCalc)
- Has a comment citing the finding: `// Regression for: [agent] finding — [one-line description]`
- Sets up input that triggers the bug as described in the finding
- Asserts the CORRECT behavior (the test should FAIL against the current buggy code)
- Is a valid, compilable Go test (correct imports, correct test signature)
5. Do NOT fix the bug — only write the test. The test must fail on the current code.
After writing all tests:
- Run `go build ./...` to confirm tests at least compile
- Write a summary to .bob/review/test-writer.md:
- For each finding: TESTED or SKIPPED and why
- List of new test functions and their files
Working directory: [repo root]
"""
)
Wait for the agent to complete, then read .bob/review/test-writer.md and append a Test Coverage section to .bob/state/review.md:
---
## Test Coverage (generated)
| Finding | File | Test | Status |
|---------|------|------|--------|
| [agent] description | file:line | TestBug_Xxx | WRITTEN |
| [agent] description | file:line | — | SKIPPED: comment-only issue |
Output this table inline so the user can see which findings now have regression tests.
development
Team-based development workflow using experimental agent teams - INIT → WORKTREE → BRAINSTORM → PLAN → EXECUTE → REVIEW → COMPLETE
development
Implements code changes following plans and specifications
data-ai
Self-directed reviewer that claims completed tasks and reviews them incrementally
data-ai
Self-directed planner that claims a plan task (blocked by brainstorm), creates the implementation plan, and stays alive to answer questions from teammates