skills/code-forge-cycle/SKILL.md
Post-POC build loop with 5 specialized builder agents and dual independent audit gates. Turns scaffolded code into production-quality software through iterative build-then-verify cycles.
npx skillsauth add ComputerConnection/zach-pack code-forge-cycle
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Turn scaffolded code into production-quality software: $ARGUMENTS
HARD GATE: This skill BUILDS code. It does not just audit or plan — it spawns specialized builder agents that make real changes, then validates with two independent auditors. Each cycle: Build → Build → Build → Verify → Verify.
/poc scaffolds the skeleton
/code-forge-cycle builds it into a real product
The current quality loop is: audit → find problems → fix problems → audit (find more problems)
This skill inverts it: build → build → build → verify (confirm quality)
Builders CREATE. Auditors FIND.
Builders are domain experts who improve entire systems.
Auditors are independent judges who validate the result.
Why this works when audit-first doesn't:
The audit-first loop (code-cycle) discovers ~12-15 issues per cycle and fixes ~12. Net velocity: ~0. Score oscillates.
The build-first loop has 5 specialized builders who each improve an entire domain (not individual findings). The dual audit at the end validates the improvement. Issues acknowledged across cycles don't reset the score. Convergence happens because builders address systems, not symptoms.
Each builder has a mandate (what they're responsible for), a constraint (what they must NOT touch), and a pre-fix protocol (mandatory search before any edit).
B1: FOUNDATION BUILDER — "Make the architecture real"
Mandate: Package boundaries, dependency direction, pattern consistency,
repository layer completeness, dead code removal
Constraint: Does NOT write tests, does NOT fix business logic
Runs: First (everything else builds on structure)
B2: SAFETY BUILDER — "Make it not break"
Mandate: Error handling, race conditions, atomic transactions,
input validation, cancellation handling, crash prevention
Constraint: Does NOT restructure modules, does NOT write tests
Runs: After B1 (fixes safety on restructured code)
B3: TEST BUILDER — "Prove it works"
Mandate: Write unit tests for every public API, edge cases,
concurrent access, error paths, integration smoke tests
Constraint: Does NOT modify source code, ONLY creates/modifies test files
Runs: After B2 (tests should verify safety fixes)
B4: INTEGRATION BUILDER — "Make the pieces fit"
Mandate: Cross-package data flow, DI container wiring, serialization
round-trips, end-to-end path verification, disconnected feature detection
Constraint: Minimal source changes (wiring fixes only), does NOT refactor
Runs: After B3 (verifies on safety-hardened code)
B5: HARDENING BUILDER — "Make it production-worthy"
Mandate: Security hardening, performance antipatterns, input sanitization,
secret protection, export scrubbing, production polish
Constraint: Does NOT restructure architecture (B1's job), does NOT write tests (B3's job)
Runs: Last (polishes everything others built)
Every builder agent's prompt includes this mandatory protocol:
BEFORE making ANY code change:
1. SEARCH for existing implementations of the same logic
→ Use Grep to find similar functions, patterns, utilities
→ If an implementation exists, USE IT — do not duplicate
2. SEARCH for existing patterns that match your approach
→ Use Glob + Read to find how similar problems are solved elsewhere
→ Follow the established pattern, don't invent a new one
3. CHECK if your change would introduce a known antipattern
→ N+1 queries (loop with individual DB calls)
→ Cross-boundary imports (importing DB framework in non-DB package)
→ Duplicate logic (same function in two places)
→ If it would: find a different approach BEFORE editing
4. CHECK what other builders have already done this cycle
→ Read the git log for recent commits by other builders
→ Don't undo or conflict with their work
Violation of this protocol creates findings in the next audit that
cancel out your improvements. The protocol IS the velocity fix.
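Step 1 of the protocol is carried out by the agent with its Grep tool. The search it performs can be sketched in plain Python, which may help when checking a builder's work by hand. The function name `find_existing_implementations`, the default suffix list, and the example pattern are illustrative assumptions and are not part of the skill.

```python
import re
from pathlib import Path


def find_existing_implementations(root, name_pattern, suffixes=(".swift", ".py", ".ts")):
    """Return (path, line_number, line) for every source line matching name_pattern.

    Hypothetical helper: a stand-in for the Grep step of the pre-fix protocol.
    """
    regex = re.compile(name_pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only scan regular files with a source-code suffix.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

If a call such as `find_existing_implementations(".", r"func\s+sanitize")` returns any hits, the protocol says to reuse that implementation instead of writing a second one.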
Builders run in 3 waves per cycle. Sequential waves ensure each builder sees prior work. Within Wave 2, builders CAN run in parallel (they touch different file types) but sequential is safer.
WAVE 1 (Sequential — B1 alone)
B1: Foundation Builder
→ Restructures code, creates missing repos, enforces boundaries
→ Must run first: everything else builds on architecture
→ Commits all changes atomically
WAVE 2 (Sequential — B2, B3, B4 in order)
B2: Safety Builder → fixes error handling, race conditions
B3: Test Builder → writes tests (test files only, no source changes)
B4: Integration Builder → verifies cross-package flow, small wiring fixes
→ Each commits separately after completing
WAVE 3 (Sequential — B5 alone)
B5: Hardening Builder
→ Security, performance, production polish
→ Runs last: polishes what all others built
→ Commits all changes atomically
→ DUAL AUDIT GATE (Audit A + Audit B in parallel)
Both auditors run simultaneously, independently
Neither sees the other's findings until the gate evaluation
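The wave order above can be written down as data plus a small driver loop. This is a minimal sketch, assuming the sequential variant of Wave 2. `run_builder` and `run_audits` are hypothetical callables standing in for the Agent tool calls, not APIs provided by the skill.

```python
# Builder IDs grouped by wave, in the order described above.
WAVES = [
    ("Wave 1", ["B1"]),
    ("Wave 2", ["B2", "B3", "B4"]),
    ("Wave 3", ["B5"]),
]


def run_cycle(run_builder, run_audits):
    """Run every builder in wave order, then hand the results to the audit gate.

    run_builder(builder_id, prior_summaries) -> summary   (hypothetical)
    run_audits(all_summaries) -> gate result              (hypothetical)
    """
    summaries = []
    for _wave_name, builders in WAVES:
        for builder_id in builders:
            # Each builder sees a copy of what earlier builders reported.
            summary = run_builder(builder_id, list(summaries))
            summaries.append((builder_id, summary))
    return run_audits(summaries)
```

Passing a copy of `summaries` keeps a builder from mutating the record that later builders rely on.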
If $ARGUMENTS is provided, parse for:
/poc scaffolding: activates aggressive first cycle
If no arguments: detect project in current directory.
if has_file(".code-forge-cycle-state.yaml"):
    mode = "RESUME"      # continuing previous cycle
elif has_file("README.md") and readme_contains("POC:"):
    mode = "POST-POC"    # fresh from /poc
elif has_quality_infrastructure():
    mode = "EXISTING"    # established project
else:
    mode = "UNKNOWN"     # ask user
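The detection pseudocode can be made runnable with the standard library. The sketch below follows the same branch order. The source does not define `has_quality_infrastructure()`, so the `QUALITY_MARKERS` list is purely an assumption chosen for illustration.

```python
from pathlib import Path

# Assumption: files whose presence suggests an established project.
# The skill does not specify these; adjust to the stack in use.
QUALITY_MARKERS = ("Makefile", ".swiftlint.yml", ".github/workflows")


def detect_mode(project_root):
    """Return RESUME, POST-POC, EXISTING, or UNKNOWN for a project directory."""
    root = Path(project_root)
    if (root / ".code-forge-cycle-state.yaml").is_file():
        return "RESUME"  # continuing previous cycle
    readme = root / "README.md"
    if readme.is_file() and "POC:" in readme.read_text(encoding="utf-8", errors="ignore"):
        return "POST-POC"  # fresh from /poc
    if any((root / marker).exists() for marker in QUALITY_MARKERS):
        return "EXISTING"  # established project
    return "UNKNOWN"  # caller should ask the user
```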
POST-POC Detection Signals:
- try! or force-unwrap patterns present
- print() used for error handling
- .gitignore or minimal one
Scan for:
Record as quality_commands for builder verification steps.
Check for .code-forge-cycle-state.yaml in project root.
If found:
Run available quality commands to capture the starting point:
BASELINE SNAPSHOT
━━━━━━━━━━━━━━━━
Project: {name}
Mode: {POST-POC / RESUME / EXISTING}
Tech stack: {stack}
Packages: {N} packages detected
Test files: {N} ({N} packages with zero tests)
Build: {PASS / FAIL / NO COMMAND}
Type check: {PASS / FAIL / NO COMMAND}
Lint: {PASS / FAIL / NO COMMAND}
Tests: {passed}/{total} pass, {coverage}% coverage
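The PASS / FAIL / NO COMMAND column of the snapshot can be produced by running each recorded quality command and checking its exit code. This is a sketch under the assumption that `quality_commands` is a mapping from a label to a shell command string. The function names are hypothetical.

```python
import subprocess


def run_quality_command(command, cwd="."):
    """Return "NO COMMAND", "PASS", or "FAIL" for one quality command."""
    if not command:
        return "NO COMMAND"
    # A zero exit code is treated as PASS; anything else as FAIL.
    result = subprocess.run(command, shell=True, cwd=cwd,
                            capture_output=True, text=True)
    return "PASS" if result.returncode == 0 else "FAIL"


def baseline_snapshot(quality_commands, cwd="."):
    """Build the Build / Type check / Lint rows of the baseline snapshot."""
    return {
        label: run_quality_command(quality_commands.get(label), cwd)
        for label in ("Build", "Type check", "Lint")
    }
```

The test row is left out because pass counts and coverage need framework-specific output parsing, which this sketch does not attempt.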
CODE FORGE CYCLE — CONFIGURATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Project: {name}
Mode: {mode}
Gate threshold: {N}/7 per dimension (default: 5)
Max cycles: {N} (default: 5)
Builder count: 5
Builder wave order:
Wave 1: B1 Foundation
Wave 2: B2 Safety → B3 Test → B4 Integration
Wave 3: B5 Hardening
Gate: Audit A (Architect) + Audit B (Adversary)
Proceed?
Ask for confirmation. Allow threshold and max cycle adjustments.
At the start of each cycle:
╔══════════════════════════════════════════════════════════════╗
║ FORGE CYCLE {N} of {max} ║
║ Project: {name} ║
║ Mode: {POST-POC / CONTINUING} ║
║ Previous gate: {PASS / FAIL / N/A} ║
║ Issue registry: {N} open, {N} acknowledged, {N} fixed ║
╚══════════════════════════════════════════════════════════════╝
Spawn via Agent tool, subagent_type: "general-purpose":
You are B1: THE FOUNDATION BUILDER. You make the architecture real.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
PACKAGE STRUCTURE: {package_list_with_descriptions}
MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE: none (you run first)
ISSUE REGISTRY (findings from previous cycles to address):
{issues with status IN_PROGRESS or NEW that fall under your mandate}
═══════════════════════════════════════════════════════
YOUR MANDATE: Make the architecture real
═══════════════════════════════════════════════════════
You are responsible for:
1. PACKAGE BOUNDARY ENFORCEMENT
- Scan every file for imports that violate the dependency graph
- The dependency direction is: {dependency_graph}
- Remove or replace any import that crosses boundaries
- Example: if package Dialog imports the DB framework directly,
route through a Repository in the DB package instead
2. PATTERN CONSISTENCY
- If the project uses a Repository pattern in SOME packages, ensure
ALL packages use it for DB access
- If the project has a protocol for something, ensure all conformances
exist
- Look for: repositories, factories, protocols, coordinators — whatever
patterns exist, make them consistent
3. DEAD CODE REMOVAL
- Find unused types, functions, imports, files
- Remove them (check with Grep that nothing references them first)
- Do NOT remove things marked with @available or #if
4. STRUCTURAL GAPS
- Missing protocol conformances
- Missing repository types
- Orphaned files not wired into the build
- Public API surface that exposes internal types
═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════
BEFORE making ANY code change:
1. Grep for existing implementations of the same logic
2. Glob + Read to find how similar problems are solved elsewhere
3. Check if your change introduces: N+1 queries, cross-boundary imports,
duplicate logic. If so, find a different approach.
4. Read git log for any recent commits that affect the same files.
═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════
- Do NOT write tests (B3's job)
- Do NOT fix business logic bugs (B2/B4's job)
- Do NOT add error handling (B2's job)
- ONLY touch architectural structure
═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════
1. Use Edit/Write tools to make ALL changes directly to the codebase
2. After all changes, run the build command: {build_command}
- If build fails: fix the build errors (your changes caused them)
- Iterate until build passes
3. List every file you modified with a 1-line summary of what changed
4. List every issue from the registry you addressed (by ID)
5. List any NEW issues you discovered while working (use ID prefix BF-B1-)
End with:
CHANGES SUMMARY:
Files modified: {N}
Files created: {N}
Files deleted: {N}
Registry issues addressed: {list of IDs}
New issues found: {list with severity and summary}
Build status: {PASS/FAIL}
SHARPEST INSIGHT about the architecture's current state.
After B1 completes:
{build_command} via Bash
git add -A && git commit -m "forge(B1): enforce architecture boundaries and pattern consistency
Cycle {N}, Wave 1: Foundation Builder
Co-Authored-By: Claude Code <[email protected]>"
Spawn via Agent tool, subagent_type: "general-purpose":
You are B2: THE SAFETY BUILDER. You make the code not break.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
B1: {summary of B1's changes}
ISSUE REGISTRY (your mandate):
{issues with status IN_PROGRESS or NEW related to safety/error handling}
═══════════════════════════════════════════════════════
YOUR MANDATE: Make it not break
═══════════════════════════════════════════════════════
You are responsible for:
1. ERROR HANDLING
- Find every try!/force-unwrap/implicitly-unwrapped-optional
- Replace with proper do/catch or guard/let with meaningful error handling
- Find every print() used for error reporting
- Replace with proper error propagation or structured logging
- Ensure errors are typed and specific, not generic strings
2. RACE CONDITIONS & ATOMICITY
- Find every read-then-write pattern across separate transactions
- Merge into single atomic transactions
- Find every multi-step mutation that should be atomic
- Wrap in single transaction or make idempotent
- Check for shared mutable state accessed from multiple contexts
3. INPUT VALIDATION
- Find every public function that accepts external input
- Add validation at the boundary (not deep in the call stack)
- Sanitize strings before SQL/FTS/regex operations
- Validate IDs, ranges, and enums before use
4. CANCELLATION & CLEANUP
- Find every long-running async operation
- Ensure it checks for cancellation
- Ensure resources are cleaned up on cancellation (files, connections, streams)
5. CRASH PREVENTION
- Find every array subscript without bounds checking
- Find every forced cast (as!)
- Find every divide operation without zero check
- Replace with safe alternatives
═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════
BEFORE making ANY code change:
1. Grep for existing error types, validation utilities, sanitization functions
2. If an error type exists, USE IT — don't create a new one
3. If a sanitization function exists, CALL IT — don't inline the logic
4. Check B1's recent commits for structural changes that affect your targets
═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════
- Do NOT restructure modules or move files (B1's job, already done)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior — only add safety around it
- Preserve the existing API contracts (same function signatures)
═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════
Same as B1. After all changes:
1. Run build command — fix any build errors your changes caused
2. Run test command if tests exist — ensure no regressions
3. List all changes, registry issues addressed, new issues found (prefix BF-B2-)
SHARPEST INSIGHT about the codebase's safety posture.
After B2 completes: Verify build, commit with forge(B2): prefix, record state.
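Item 2 of B2's mandate, merging a read-then-write pattern into a single atomic transaction, can be illustrated with Python's built-in `sqlite3` module. The target project may well use another language and database layer; the `nodes` table, the `link_count` column, and the function name are hypothetical and serve only to show the shape of the fix.

```python
import sqlite3


def increment_link_count(conn, node_id):
    """Bump a counter in one atomic statement instead of SELECT followed by UPDATE.

    The racy version reads link_count, adds 1 in application code, and writes
    it back in a second transaction, so two concurrent callers can lose an update.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "UPDATE nodes SET link_count = link_count + 1 WHERE id = ?",
            (node_id,),
        )
        if cursor.rowcount == 0:
            # Typed, specific error rather than a silent no-op.
            raise KeyError(f"unknown node id: {node_id}")
```

The function signature carries no business logic change: callers still ask for one increment, and only the safety around it differs.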
Spawn via Agent tool, subagent_type: "general-purpose":
You are B3: THE TEST BUILDER. You prove the code works.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
TEST FRAMEWORK: {test_framework}
TEST COMMAND: {test_command}
MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
B1: {summary}
B2: {summary}
ISSUE REGISTRY (your mandate):
{issues related to test coverage, test quality}
CURRENT TEST STATE:
Test files: {N} across {N} packages
Packages with zero tests: {list}
Coverage: {N}% (if measurable)
═══════════════════════════════════════════════════════
YOUR MANDATE: Prove it works
═══════════════════════════════════════════════════════
You are responsible for:
1. COVERAGE EXPANSION
- For every package with zero tests: create a test file with at least
3 meaningful tests covering the package's primary public API
- For packages with existing tests: add tests for untested public methods
- Priority: test the functions that B2 added safety handling to
(those are the most risk-prone)
2. TEST QUALITY
- Every test must assert BEHAVIOR, not implementation
- Bad: "test that function calls dependency X"
- Good: "test that function returns correct result given input Y"
- Every test must have a descriptive name that explains what it proves
- Every test must be independent (no shared mutable state between tests)
3. EDGE CASE COVERAGE
- For every tested function, add at least one edge case:
empty input, nil/null, duplicate, boundary value, concurrent access
- For every error path B2 added: write a test that triggers it
- For every validation B2 added: write a test with invalid input
4. INTEGRATION SMOKE TESTS
- For each major data flow (e.g., ingest → graph → dialog → outline),
write one smoke test that exercises the full path
- These can use mocks/stubs for external services but should exercise
real internal code paths
═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════
BEFORE writing ANY test:
1. Read the source file being tested — understand the ACTUAL behavior
2. Read existing tests (if any) — don't duplicate coverage
3. Read B2's safety changes — prioritize testing those code paths
4. Check for test utilities, mocks, fixtures already in the project
═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════
- Do NOT modify source code. ONLY create/modify test files.
- If a test reveals a bug, DOCUMENT it as a finding (BF-B3-), don't fix the source
- Tests must compile and run (even if some fail due to bugs)
- Use the project's existing test framework and conventions
═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════
After all changes:
1. Run test command: {test_command}
2. Report: {passed}/{total} tests pass
3. If tests FAIL: determine if failure is because of a genuine bug
(document as finding) or because your test is wrong (fix the test)
4. List all test files created/modified
5. List coverage improvement if measurable
6. List any bugs discovered via testing (prefix BF-B3-)
SHARPEST INSIGHT about the codebase's test coverage landscape.
After B3 completes: Run tests, commit with forge(B3): prefix, record state.
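B3's test-quality rules (assert behavior, use descriptive names, keep tests independent, add an edge case per function) look roughly like the following in Python's `unittest`. `normalize_tags` is a made-up function under test, included only so the example runs; a real project would use its own test framework and conventions.

```python
import unittest


def normalize_tags(tags):
    """Hypothetical function under test: trim, lowercase, de-duplicate, keep order."""
    seen = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return seen


class NormalizeTagsTests(unittest.TestCase):
    # Each test asserts on the returned value (behavior) and shares no state.

    def test_returns_lowercased_trimmed_tags_in_original_order(self):
        self.assertEqual(normalize_tags([" Swift", "API "]), ["swift", "api"])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(normalize_tags([]), [])

    def test_duplicates_differing_only_by_case_collapse_to_one(self):
        self.assertEqual(normalize_tags(["Go", "go", "GO"]), ["go"])
```

None of these tests checks which helper the function calls internally, so they keep passing if the implementation is refactored without changing results.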
Spawn via Agent tool, subagent_type: "general-purpose":
You are B4: THE INTEGRATION BUILDER. You make the pieces fit.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
B1: {summary}
B2: {summary}
B3: {summary}
ISSUE REGISTRY (your mandate):
{issues related to integration, wiring, data flow}
═══════════════════════════════════════════════════════
YOUR MANDATE: Make the pieces fit
═══════════════════════════════════════════════════════
You are responsible for:
1. DEPENDENCY INJECTION VERIFICATION
- Read the composition root / DI container
- Verify every engine/service is properly wired
- Check that optional dependencies are handled (nil checks, fallbacks)
- Ensure lifecycle is correct (what's created once vs per-use)
2. DATA FLOW VERIFICATION
- Trace the primary data flows end-to-end:
{list of primary flows, e.g., "ingest → graph → dialog → outline"}
- At each boundary: verify the data model transforms correctly
- Check for: missing fields, type mismatches, lost context
- Verify serialization round-trips (encode → decode → same data)
3. DISCONNECTED FEATURE DETECTION
- Find UI elements/views that aren't connected to data
- Find functions that are never called
- Find protocols with no conforming types
- Find registered routes/commands with no handlers
4. CROSS-PACKAGE CONTRACT VERIFICATION
- For each protocol: verify all implementations satisfy the contract
- For each public API: verify callers pass valid arguments
- For each callback/delegate: verify it's actually set and called
═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════
BEFORE making ANY wiring fix:
1. Understand WHY it's disconnected — it may be intentional (future feature)
2. Check if B1 restructured the module — the wiring may need updating
3. Verify the fix doesn't create a circular dependency
4. Check that your wiring matches the dependency direction graph
═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════
- MINIMAL source changes — only fix wiring, don't refactor
- Do NOT restructure modules (B1's job)
- Do NOT add error handling (B2's job)
- Do NOT write tests (B3's job)
- If you find a bug in business logic, DOCUMENT it (BF-B4-), don't fix
═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════
After all changes:
1. Run build + tests — ensure no regressions
2. Produce a DATA FLOW MAP (text diagram) showing verified paths
3. List disconnected features found
4. List wiring fixes applied
5. List bugs/issues found (prefix BF-B4-)
SHARPEST INSIGHT about the codebase's integration health.
After B4 completes: Verify build + tests, commit with forge(B4): prefix, record state.
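The serialization round-trip check in B4's mandate (encode, decode, compare) reduces to a few lines. The `OutlineNode` type and its fields below are hypothetical; the point is only the equality check at a package boundary.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OutlineNode:
    """Hypothetical model passed across a package boundary."""
    id: str
    title: str
    child_ids: tuple


def encode(node):
    """Serialize a node to JSON (tuples become JSON arrays)."""
    return json.dumps(asdict(node))


def decode(payload):
    """Rebuild a node from JSON, restoring the tuple type lost in transit."""
    raw = json.loads(payload)
    return OutlineNode(id=raw["id"], title=raw["title"],
                       child_ids=tuple(raw["child_ids"]))


def round_trips(node):
    """True when encode followed by decode yields an equal value."""
    return decode(encode(node)) == node
```

If `decode` forgot to convert the list back to a tuple, `round_trips` would return False, which is the kind of type mismatch the mandate asks B4 to catch.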
Spawn via Agent tool, subagent_type: "general-purpose":
You are B5: THE HARDENING BUILDER. You make it production-worthy.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
B1: {summary}
B2: {summary}
B3: {summary}
B4: {summary}
ISSUE REGISTRY (your mandate):
{issues related to security, performance, production readiness}
═══════════════════════════════════════════════════════
YOUR MANDATE: Make it production-worthy
═══════════════════════════════════════════════════════
You are responsible for:
1. SECURITY HARDENING
- Sanitize all user-facing inputs before DB/FTS/regex operations
- Protect secrets: ensure API keys, tokens, credentials are never
logged, exported, or included in error messages
- Scrub exports: any data export (JSON, PDF, etc.) must not include
internal paths, raw document text, or sensitive metadata
- Keychain/credential storage: use platform-appropriate secure storage
- Validate that debug/trace logging is disabled in release builds
2. PERFORMANCE ANTIPATTERNS
- Find N+1 query patterns (loop with individual DB calls)
- Replace with batch queries or pre-fetching
- Find unbounded memory loads (loading ALL records into memory)
- Add pagination or streaming
- Find synchronous work on main/UI thread
- Move to background with proper dispatch
3. PRODUCTION POLISH
- Remove all TODO/HACK/FIXME comments that reference POC/scaffold
- Replace placeholder error messages with user-facing text
- Ensure all user-visible strings are appropriate (no debug text)
- Verify all file operations handle missing directories gracefully
- Check that temporary files are cleaned up
4. RESILIENCE
- Add retry logic for network operations (with backoff)
- Add graceful degradation when optional services are unavailable
- Ensure the app can start even if the database migration fails
(fallback to in-memory or read-only mode)
═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════
BEFORE making ANY hardening change:
1. Check if a sanitization/validation function already exists (B2 may have added one)
2. Check if B1 restructured the module — use the new structure
3. For performance fixes: verify the antipattern actually causes a problem
(don't optimize code that runs once at startup)
4. Don't duplicate security logic that exists elsewhere in the codebase
═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════
- Do NOT restructure architecture (B1's job)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior
- Do NOT add features — only harden existing ones
═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════
After all changes:
1. Run build + tests — ensure no regressions
2. List all security hardening applied
3. List all performance fixes
4. List all production polish changes
5. List any remaining concerns (prefix BF-B5-)
SHARPEST INSIGHT about the codebase's production readiness.
After B5 completes: Verify build + tests, commit with forge(B5): prefix, record state.
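The N+1 antipattern named in B5's mandate, and its batched replacement, can be shown side by side with `sqlite3`. The `documents` table and both function names are assumptions made for the example, not part of any project the skill targets.

```python
import sqlite3


def titles_n_plus_one(conn, ids):
    """Antipattern: one query per id inside a loop."""
    titles = {}
    for doc_id in ids:
        row = conn.execute(
            "SELECT title FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        if row is not None:
            titles[doc_id] = row[0]
    return titles


def titles_batched(conn, ids):
    """Replacement: a single query with an IN clause covering all ids."""
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping; the batched one issues one statement regardless of how many ids are requested. For very long id lists the query would need chunking, since databases cap the number of bound parameters.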
After all 5 builders complete:
BUILDER WAVE COMPLETE — CYCLE {N}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Builder | Files Modified | Files Created | Build | Tests | Issues Found
──────────────────────────────────────────────────────────────────────────────
B1 Found. | {N} | {N} | PASS | N/A | {N}
B2 Safety | {N} | {N} | PASS | {p}/{t} | {N}
B3 Test | {N} | {N} | PASS | {p}/{t} | {N}
B4 Integ. | {N} | {N} | PASS | {p}/{t} | {N}
B5 Harden | {N} | {N} | PASS | {p}/{t} | {N}
Total commits: 5
Total files changed: {N}
Total new issues discovered by builders: {N}
Registry issues addressed this cycle: {N}
Remaining open issues: {N}
Proceeding to Dual Audit Gate...
Two independent auditors run simultaneously. Neither sees the other's findings, scores, or insights. They audit the SAME codebase state (after all builders have committed).
Spawn via Agent tool, subagent_type: "general-purpose":
You are AUDIT A: THE ARCHITECT — an independent auditor assessing structural quality.
You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}
KNOWN ISSUE REGISTRY (for deduplication — do NOT re-report these):
{all issues with status ACKNOWLEDGED or FIXED}
IN-PROGRESS ISSUES (may re-report if still present):
{issues with status IN_PROGRESS}
═══════════════════════════════════════════════════════
YOUR LENS: Structure, patterns, boundaries, maintainability
═══════════════════════════════════════════════════════
Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files — do not rely on descriptions.
SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):
1. ARCHITECTURE (0-7)
- Module boundaries respected? (no cross-layer imports)
- Dependency direction correct? (lower layers don't know upper layers)
- Pattern consistency? (same pattern used everywhere, not half-applied)
- Package API surface clean? (internal types not leaked)
- Dead code eliminated?
0 = No coherent architecture
3 = Architecture exists but inconsistently applied
5 = Clean architecture with minor boundary violations
7 = Pristine boundaries, consistent patterns, clean APIs
2. MAINTAINABILITY (0-7)
- Naming clarity? (can you understand a function from its name?)
- Cognitive load? (can you understand a module without reading 5 others?)
- DRY? (same logic not duplicated across files?)
- Documentation accuracy? (comments match actual behavior?)
- Complexity hotspots? (functions over 50 lines, deeply nested logic?)
0 = Incomprehensible without full-codebase knowledge
3 = Understandable with effort, some confusing areas
5 = Clear and well-organized with minor issues
7 = Any developer could understand and safely modify any module
3. TEST QUALITY (0-7)
- Coverage breadth? (what % of packages have tests?)
- Test meaningfulness? (do tests assert behavior, not implementation?)
- Edge case coverage? (empty, nil, duplicate, boundary, concurrent?)
- Test independence? (no shared mutable state between tests?)
- Test naming? (can you understand what a test proves from its name?)
0 = No tests
2 = Tests exist but are trivial or meaningless
4 = Good tests for some modules, gaps in others
5 = Solid coverage with meaningful assertions
7 = Comprehensive, behavior-driven, edge-case-aware tests everywhere
ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FA (Forge Audit A)
For each issue:
ID: FA-{NNN}
Severity: CRITICAL / MAJOR / MINOR / NITPICK
File(s): file:line references
Problem: 1-2 sentences
Evidence: quote the actual code
Impact: what breaks or degrades because of this
Fix Approach: how to fix (for the next builder cycle)
DEDUPLICATION RULES:
- Do NOT report issues already in the KNOWN ISSUE REGISTRY
- If an IN-PROGRESS issue is STILL present, report it as "PERSISTS: {original ID}"
- If an IN-PROGRESS issue is FIXED, note "RESOLVED: {original ID}"
End with your SHARPEST SINGLE INSIGHT about the codebase's structural quality.
Spawn via Agent tool, subagent_type: "general-purpose":
You are AUDIT B: THE ADVERSARY — an independent auditor hunting for correctness
and security flaws.
You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.
PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}
KNOWN ISSUE REGISTRY (for deduplication):
{same registry as Audit A}
═══════════════════════════════════════════════════════
YOUR LENS: Correctness, security, edge cases, exploitability
YOUR HOSTILITY LEVEL: HIGH
═══════════════════════════════════════════════════════
Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files. Trace data flows. Find bugs.
SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):
1. CORRECTNESS (0-7)
- Business logic accurate? (does the code do what it claims?)
- Edge cases handled? (empty input, nil, duplicates, boundaries?)
- State management sound? (no inconsistent state possible?)
- Race conditions eliminated? (no TOCTOU, no non-atomic mutations?)
- Error paths correct? (errors don't leave state corrupted?)
0 = Fundamental logic errors, data corruption likely
3 = Works for happy path, breaks on edge cases
5 = Correct with minor edge case gaps
7 = Provably correct for all documented inputs
2. SECURITY (0-7)
- Input sanitization? (all external inputs validated before use?)
- Secret protection? (API keys, tokens never logged or exported?)
- Auth/authz correct? (all sensitive operations protected?)
- Injection prevention? (SQL, FTS, command, path traversal?)
- Data exposure? (exports don't leak internal data?)
0 = Exploitable vulnerabilities present
3 = Basic security but gaps in coverage
5 = Solid security with minor hardening gaps
7 = Defense in depth, no exploitable paths found
3. TEST QUALITY (0-7)
- Same criteria as Audit A (scored independently!)
- Focus especially on: do tests cover the SECURITY-CRITICAL paths?
- Do tests verify ERROR PATHS, not just happy paths?
- Would a broken security fix be caught by existing tests?
Same scale as Audit A.
ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FB (Forge Audit B)
HIGH HOSTILITY: Only report genuinely impactful issues. No nitpicks.
For each issue:
ID: FB-{NNN}
Severity: CRITICAL / MAJOR / MINOR
File(s): file:line references
Problem: 1-2 sentences
Evidence: quote the actual code
Exploit scenario: how an attacker or bad input could trigger this
Fix Approach: how to fix
DEDUPLICATION: Same rules as Audit A.
End with your SHARPEST SINGLE INSIGHT about the codebase's correctness and security.
Launch Audit A and Audit B in parallel via two simultaneous Agent tool calls.
After both audits complete, evaluate the gate:
DUAL AUDIT RESULTS
━━━━━━━━━━━━━━━━━━
Dimension | Audit A | Audit B | Agreement
─────────────────────────────────────────────────
Architecture | {N}/7 | — | (A only)
Maintainability | {N}/7 | — | (A only)
Correctness | — | {N}/7 | (B only)
Security | — | {N}/7 | (B only)
Test Quality | {N}/7 | {N}/7 | delta: {|A-B|}
GATE EVALUATION
━━━━━━━━━━━━━━━
Check 1: Audit A dimensions >= {threshold}/7
Architecture: {N}/7 {PASS/FAIL}
Maintainability: {N}/7 {PASS/FAIL}
Test Quality: {N}/7 {PASS/FAIL}
Check 2: Audit B dimensions >= {threshold}/7
Correctness: {N}/7 {PASS/FAIL}
Security: {N}/7 {PASS/FAIL}
Test Quality: {N}/7 {PASS/FAIL}
Check 3: Test Quality agreement
|{A_test} - {B_test}| = {delta} {PASS if <= 1 / FAIL if > 1}
Check 4: Zero CRITICAL findings
Audit A criticals: {N} {PASS if 0 / FAIL}
Audit B criticals: {N} {PASS if 0 / FAIL}
Check 5: Combined MAJOR findings <= 3
Total MAJORs: {N} {PASS if <= 3 / FAIL}
━━━━━━━━━━━━━━━
GATE RESULT: {PASS / FAIL}
━━━━━━━━━━━━━━━
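The five checks can be expressed as one function. This sketch assumes the gate passes only when every check passes, which the template implies but does not state outright. The dictionary keys and the function name are hypothetical.

```python
def evaluate_gate(audit_a, audit_b, criticals, majors, threshold=5):
    """Return (passed, checks) for the dual audit gate.

    audit_a / audit_b map dimension names to 0-7 scores and must both
    contain a "test_quality" entry. criticals and majors are the combined
    finding counts from both auditors.
    """
    checks = {
        # Checks 1 and 2: every dimension meets the threshold.
        "audit_a_dimensions": all(s >= threshold for s in audit_a.values()),
        "audit_b_dimensions": all(s >= threshold for s in audit_b.values()),
        # Check 3: the two Test Quality scores differ by at most 1.
        "test_quality_agreement":
            abs(audit_a["test_quality"] - audit_b["test_quality"]) <= 1,
        # Check 4: zero CRITICAL findings.
        "zero_criticals": criticals == 0,
        # Check 5: at most 3 MAJOR findings combined.
        "majors_within_budget": majors <= 3,
    }
    return all(checks.values()), checks
```

Returning the per-check results alongside the verdict makes it easy to print the PASS/FAIL column of the template.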
If Check 3 fails (|A_test - B_test| > 1):
TEST QUALITY DISAGREEMENT DETECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Audit A scored Test Quality: {N}/7
Audit B scored Test Quality: {N}/7
Delta: {N} (threshold: 1)
This means the auditors fundamentally disagree on whether
tests prove correctness. Common causes:
IF Audit A > Audit B:
Tests LOOK good structurally but don't actually catch bugs.
→ B3 needs to focus on behavior verification and edge cases.
IF Audit B > Audit A:
Tests catch bugs but are poorly organized/named.
→ B3 needs to improve test structure and naming.
Action: B3 Test Builder gets priority mandate next cycle
with specific guidance from the lower-scoring auditor.
After each gate evaluation, update the issue registry:
issue_registry:
  # Issues from builders (BF-B{N}-NNN)
  - id: "BF-B3-001"
    source: "B3 Test Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Function X has incorrect edge case behavior"
    status: NEW  # → becomes IN_PROGRESS when a builder addresses it
    score_impact: -4

  # Issues from auditors (FA-NNN, FB-NNN)
  - id: "FA-001"
    source: "Audit A"
    cycle_discovered: 1
    severity: MINOR
    summary: "DRY violation in ExportEngine"
    status: NEW
    score_impact: -1

  # Carried forward from previous cycles
  - id: "FA-003"
    source: "Audit A, Cycle 1"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Brute-force vector search"
    status: ACKNOWLEDGED  # user said "acceptable for MVP scale"
    acknowledged_at: "2026-03-30"
    reason: "Acceptable for MVP scale (<10K embeddings)"
    score_impact: 0  # ← NO PENALTY for acknowledged issues

  # Fixed issues
  - id: "BF-B2-001"
    source: "B2 Safety Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "TOCTOU race in AutoLinker"
    status: FIXED
    fixed_by: "B2"
    fixed_cycle: 1
    score_impact: 0
NEW → IN_PROGRESS (builder is working on it)
NEW → ACKNOWLEDGED (user accepts, no score penalty)
NEW → FIXED (builder resolved it)
IN_PROGRESS → FIXED (builder resolved it)
IN_PROGRESS → ACKNOWLEDGED (user decides to accept)
FIXED → REGRESSION (auditor found it re-broken) → treated as CRITICAL
ACKNOWLEDGED → IN_PROGRESS (user changes mind, wants it fixed)
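The status transitions listed above form a small state machine. The sketch below encodes them as a lookup table; the `transition` helper is a hypothetical name. Raising the severity on a regression is one way to read "treated as CRITICAL", and treating REGRESSION as having no outgoing transitions is an assumption, since the source lists none.

```python
# Allowed status transitions, taken from the list above.
ALLOWED_TRANSITIONS = {
    "NEW": {"IN_PROGRESS", "ACKNOWLEDGED", "FIXED"},
    "IN_PROGRESS": {"FIXED", "ACKNOWLEDGED"},
    "FIXED": {"REGRESSION"},
    "ACKNOWLEDGED": {"IN_PROGRESS"},
    "REGRESSION": set(),  # assumption: the source lists no transition out of REGRESSION
}


def transition(issue: dict, new_status: str) -> dict:
    """Return a copy of the issue with its status changed, or raise on an illegal move."""
    current = issue["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    updated = {**issue, "status": new_status}
    if new_status == "REGRESSION":
        # A re-broken fix is treated as CRITICAL regardless of its original severity.
        updated["severity"] = "CRITICAL"
    return updated
```

Returning a copy rather than mutating the issue keeps the previous registry state available for cycle-to-cycle comparison.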
FORGE SCORE
━━━━━━━━━━━
Dimension Score (0-100):
Architecture: {A_arch}/7 → {pct}%
Maintainability: {A_maint}/7 → {pct}%
Correctness: {B_corr}/7 → {pct}%
Security: {B_sec}/7 → {pct}%
Test Quality: avg(A,B)/7 → {pct}%
Dimension Average: {avg}/7 → {pct}%
Issue Penalty:
Unfixed CRITICALs: {N} × 10 = -{N}
Unfixed MAJORs: {N} × 3 = -{N}
(ACKNOWLEDGED issues: 0 penalty)
(FIXED issues: 0 penalty)
Total penalty: -{N}
Forge Score: max(0, {dim_pct} - {penalty}) = {N}%
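As a worked sketch of the formula above: the two Test Quality scores are averaged, the five dimension scores are averaged and converted to a percentage of 7, and the issue penalty is subtracted. The function names are hypothetical, and the label bands are the SCORE LABELS from this section. How fractional scores between bands (for example 89.5) are labeled is an assumption here; this sketch rounds down to the lower band.

```python
def forge_score(arch: float, maint: float, corr: float, sec: float,
                test_a: float, test_b: float,
                unfixed_criticals: int = 0, unfixed_majors: int = 0) -> float:
    """Compute the Forge Score (0-100) from dimension scores out of 7."""
    test_quality = (test_a + test_b) / 2            # avg(A, B)
    dim_avg = (arch + maint + corr + sec + test_quality) / 5
    dim_pct = dim_avg / 7 * 100
    # ACKNOWLEDGED and FIXED issues carry no penalty, so only unfixed ones are counted.
    penalty = unfixed_criticals * 10 + unfixed_majors * 3
    return max(0.0, dim_pct - penalty)


def score_label(score: float) -> str:
    """Map a Forge Score to its label."""
    if score >= 90:
        return "Ship-Ready"
    if score >= 75:
        return "Strong"
    if score >= 60:
        return "Developing"
    if score >= 45:
        return "Building"
    return "Foundation"
```

For example, dimension scores of 6, 6, 6, 6 with Test Quality 5 from both auditors average 5.8/7, roughly 82.9%; one unfixed MAJOR brings that to roughly 79.9%, which is still labeled Strong.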
SCORE LABELS:
90-100: Ship-Ready
75-89: Strong
60-74: Developing
45-59: Building
< 45: Foundation
if gate == "PASS":
    classification = "CONVERGED"
    action = "Exit loop → ship"
elif cycle == 1:
    classification = "BASELINE"
    action = "Continue (expected: first cycle rarely passes)"
elif forge_score > previous_forge_score + 5:
    classification = "PROGRESSING"
    action = "Continue (gaining momentum)"
elif forge_score > previous_forge_score:
    classification = "SLOW_PROGRESS"
    action = "Continue with strategy adjustment"
# REGRESSION must be tested before PLATEAU, otherwise it is unreachable.
elif forge_score < previous_forge_score - 5:
    classification = "REGRESSION"
    action = "Investigate and potentially revert"
else:  # forge_score <= previous_forge_score, drop of at most 5 points
    classification = "PLATEAU"
    action = "Strategy intervention required"
╔══════════════════════════════════════════════════════════╗
║ GATE PASSED — PRODUCTION QUALITY ACHIEVED ║
║ ║
║ Forge Score: {N}% ║
║ Cycles: {N} ║
║ Gate checks: 5/5 PASS ║
║ Journey: {start}% → {final}% ║
║ ║
║ All dimensions >= {threshold}/7 ║
║ Zero CRITICAL findings ║
║ Test Quality agreement: {A} vs {B} (delta: {d}) ║
╚══════════════════════════════════════════════════════════╝
→ Route to /code-audit THOROUGH for final independent verification, then /ship
Analyze which dimensions are the bottleneck:
PLATEAU ANALYSIS
━━━━━━━━━━━━━━━━
Bottleneck dimensions (below threshold):
{dimension}: {score}/7 — held back by: {analysis}
Builder effectiveness:
B1: {N} issues addressed, {M} new found → net: {N-M}
B2: {N} issues addressed, {M} new found → net: {N-M}
B3: {N} tests written, coverage: {before}% → {after}%
B4: {N} wiring fixes
B5: {N} hardening changes
Net issue velocity: {total_fixed - total_discovered}
If net velocity <= 0 for 2 cycles:
/code-audit THOROUGH for deeper analysis before continuing
if current_cycle > max_cycles:
"Maximum cycles ({max}) reached. Forge Score: {N}%."
→ Offer: extend / stop / lower threshold
After every builder, verify:
If any regress:
if last 2 cycles had same gate failures on same checks:
"Same gate checks failing for 2 cycles. Forcing strategy change."
→ Builder mandates get adjusted based on failing checks
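The "same gate checks failing for 2 cycles" rule can be sketched as a comparison of the failing check names from the last two cycles. This assumes each cycle's gate record uses the `check_*` keys with "PASS"/"FAIL" values as in the state file; the helper names are hypothetical.

```python
def failing_checks(gate: dict) -> frozenset:
    """Names of the gate checks that failed in one cycle."""
    return frozenset(
        name for name, value in gate.items()
        if name.startswith("check_") and value == "FAIL"
    )


def same_failures_repeated(gate_history: list) -> bool:
    """True when the last two cycles failed exactly the same non-empty set of checks."""
    if len(gate_history) < 2:
        return False
    previous = failing_checks(gate_history[-2])
    latest = failing_checks(gate_history[-1])
    return bool(latest) and previous == latest
```

When this returns True, the failing check names indicate which builder mandates to adjust.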
After Wave 2 (B2 + B3 + B4), check for conflicts:
# Code Forge Cycle Report — {Project Name}
Generated: {YYYY-MM-DD HH:MM}
## Outcome: {CONVERGED / STOPPED / PLATEAU / MAX_CYCLES}
## Forge Score Journey
╔══════════════════════════════════════════════════╗
║ {start}% → {final}% ║
║ Cycles: {N} | Builders invoked: {N} ║
║ Gate: {PASS/FAIL} ║
╚══════════════════════════════════════════════════╝
## Cycle History
| Cycle | B1 | B2 | B3 | B4 | B5 | Audit A | Audit B | Gate | Score |
|-------|----|----|----|----|----|---------|---------|------|-------|
| 1 | {files} | {files} | {tests} | {fixes} | {fixes} | {dims} | {dims} | {P/F} | {N}% |
| ... |
## Dimension Journey
| Dimension | Cycle 1 | Cycle 2 | ... | Final | Threshold |
|-----------|---------|---------|-----|-------|-----------|
| Architecture | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Maintainability | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Correctness | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Security | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (A) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (B) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
## Issue Registry
### Open Issues
| ID | Severity | Source | Summary | Status |
|----|----------|--------|---------|--------|
{open issues}
### Acknowledged Issues (no score penalty)
| ID | Severity | Reason | Acknowledged |
|----|----------|--------|-------------|
{acknowledged issues}
### Fixed Issues
| ID | Severity | Fixed By | Cycle |
|----|----------|----------|-------|
{fixed issues}
## Builder Performance
### B1 Foundation Builder
{summary of all structural changes across all cycles}
### B2 Safety Builder
{summary of all safety fixes across all cycles}
### B3 Test Builder
{test coverage progression, total tests written}
### B4 Integration Builder
{data flow maps, wiring fixes}
### B5 Hardening Builder
{security hardening, performance fixes}
## Dual Audit History
| Cycle | Auditor | Arch | Maint | Correct | Security | TestQ | Issues |
|-------|---------|------|-------|---------|----------|-------|--------|
| 1 | A | {N} | {N} | — | — | {N} | {N} |
| 1 | B | — | — | {N} | {N} | {N} | {N} |
| ... |
## Sharpest Insights (Final Cycle)
- **B1 Foundation**: {insight}
- **B2 Safety**: {insight}
- **B3 Test**: {insight}
- **B4 Integration**: {insight}
- **B5 Hardening**: {insight}
- **Audit A**: {insight}
- **Audit B**: {insight}
## Git Log
{git log --oneline showing all forge commits}
Write .code-forge-cycle-state.yaml:
version: 1
skill: code-forge-cycle
created: "{ISO 8601}"
updated: "{ISO 8601}"

project:
  path: "{absolute path}"
  tech_stack: "{stack}"
  mode: "{POST-POC / EXISTING}"

config:
  gate_threshold: 5
  max_cycles: 5

baseline:
  build: "{PASS/FAIL}"
  test_total: {N}
  test_passed: {N}
  coverage: {N}
  packages_with_tests: {N}
  packages_total: {N}

cycles:
  - cycle: 1
    started_at: "{ISO 8601}"
    completed_at: "{ISO 8601}"
    builders:
      b1_foundation:
        status: "completed"
        files_modified: {N}
        files_created: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b2_safety:
        status: "completed"
        files_modified: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b3_test:
        status: "completed"
        test_files_created: {N}
        tests_written: {N}
        tests_passing: {N}
        coverage_before: {N}
        coverage_after: {N}
        bugs_found: ["{IDs}"]
        commit_hash: "{hash}"
      b4_integration:
        status: "completed"
        wiring_fixes: {N}
        disconnected_features: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b5_hardening:
        status: "completed"
        security_fixes: {N}
        performance_fixes: {N}
        polish_changes: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
    audit_a:
      architecture: {N}
      maintainability: {N}
      test_quality: {N}
      issues: ["{IDs}"]
    audit_b:
      correctness: {N}
      security: {N}
      test_quality: {N}
      issues: ["{IDs}"]
    gate:
      check_1_audit_a_dims: "{PASS/FAIL}"
      check_2_audit_b_dims: "{PASS/FAIL}"
      check_3_test_agreement: "{PASS/FAIL}"
      check_4_zero_criticals: "{PASS/FAIL}"
      check_5_major_limit: "{PASS/FAIL}"
      result: "{PASS/FAIL}"
    forge_score: {N}
    classification: "{BASELINE/PROGRESSING/SLOW_PROGRESS/PLATEAU/REGRESSION/CONVERGED}"

issue_registry:
  - id: "{ID}"
    source: "{source}"
    cycle_discovered: {N}
    severity: "{CRITICAL/MAJOR/MINOR/NITPICK}"
    summary: "{summary}"
    status: "{NEW/IN_PROGRESS/FIXED/ACKNOWLEDGED/REGRESSION}"
    score_impact: {N}
    acknowledged_reason: "{reason or null}"
    fixed_by: "{builder or null}"
    fixed_cycle: {N or null}

progress:
  current_cycle: {N}
  last_completed_step: "{step}"
  next_step: "{description}"
Append to ~/.gstack/reviews/review-log.jsonl:
{
"skill": "code-forge-cycle",
"timestamp": "{ISO 8601}",
"project": "{project path}",
"mode": "{POST-POC/EXISTING}",
"outcome": "{CONVERGED/STOPPED/PLATEAU/MAX_CYCLES}",
"cycles_completed": 2,
"max_cycles": 5,
"forge_score_start": 25,
"forge_score_final": 82,
"gate_result": "PASS",
"gate_threshold": 5,
"dimensions_final": {
"architecture": 6.0,
"maintainability": 5.5,
"correctness": 6.0,
"security": 5.5,
"test_quality_a": 5.0,
"test_quality_b": 5.0
},
"builders_invoked": 10,
"total_files_modified": 47,
"total_tests_written": 38,
"issues": {
"total_discovered": 23,
"fixed": 18,
"acknowledged": 3,
"open": 2
},
"report_path": "{path}"
}
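Appending to a JSONL log means writing one JSON object per line. The sketch below shows one way to do that with the standard library. The `append_review_log` name is hypothetical, and creating the parent directory when it is missing is an assumption about the desired behavior.

```python
import json
from pathlib import Path

# Default location taken from the path named above.
DEFAULT_LOG = Path.home() / ".gstack" / "reviews" / "review-log.jsonl"


def append_review_log(entry: dict, log_path: Path = DEFAULT_LOG) -> None:
    """Append one review entry as a single JSON line."""
    # Assumption: create the reviews directory if it does not exist yet.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        # json.dumps emits no raw newlines, so each entry stays on one line.
        f.write(json.dumps(entry) + "\n")
```

Opening in append mode leaves earlier entries untouched, so the log accumulates one line per completed forge run.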
Based on outcome:
CONVERGED (Gate Passed):
"Gate passed. All dimensions >= {threshold}/7. Zero CRITICALs. Test quality agreement."
→ Recommend: /code-audit THOROUGH for independent verification → /ship
STOPPED BY USER:
"Stopped at cycle {N}. Forge Score: {N}%."
→ Recommend: Resume with /code-forge-cycle CONTINUE when ready
PLATEAU:
"Score plateaued at {N}%. Bottleneck: {dimension}."
→ Recommend: Focused sprint on bottleneck dimension, or acknowledge remaining issues
MAX CYCLES:
"Reached {N} cycles. Forge Score: {N}%. Gate: {failing checks}."
→ Recommend: Lower threshold, acknowledge issues, or investigate bottleneck
When MODE is POST-POC, the first cycle has special builder instructions:
Additional mandate:
// TODO and // placeholder comments left by POC scaffolding
Additional mandate:
try! → proper do/catch (POC used force-try everywhere)
print() error → proper error propagation
Additional mandate:
Additional mandate:
Additional mandate: