Code Forge Cycle — Post-POC Build Loop

Turn scaffolded code into production-quality software: $ARGUMENTS

HARD GATE: This skill BUILDS code. It does not just audit or plan: it spawns specialized builder agents that make real changes, then validates with two independent auditors. Each cycle runs three builder waves, then a dual audit: Build → Build → Build → Verify → Verify.

PHILOSOPHY

/poc scaffolds the skeleton
/code-forge-cycle builds it into a real product

The current quality loop is: audit → find problems → fix problems → audit (find more problems)
This skill inverts it:         build → build → build → verify (confirm quality)

Builders CREATE. Auditors FIND.
Builders are domain experts who improve entire systems.
Auditors are independent judges who validate the result.

Why this works when audit-first doesn't:

The audit-first loop (code-cycle) discovers ~12-15 issues per cycle and fixes ~12. Net velocity: ~0. Score oscillates.

The build-first loop has 5 specialized builders who each improve an entire domain (not individual findings). The dual audit at the end validates the improvement. Issues already marked ACKNOWLEDGED or FIXED in the issue registry are not re-reported in later cycles, so they don't reset the score. Convergence happens because builders address systems, not symptoms.

THE 5 BUILDERS

Each builder has a mandate (what they're responsible for), a constraint (what they must NOT touch), and a pre-fix protocol (mandatory search before any edit).

B1: FOUNDATION BUILDER — "Make the architecture real"
    Mandate:    Package boundaries, dependency direction, pattern consistency,
                repository layer completeness, dead code removal
    Constraint: Does NOT write tests, does NOT fix business logic
    Runs:       First (everything else builds on structure)

B2: SAFETY BUILDER — "Make it not break"
    Mandate:    Error handling, race conditions, atomic transactions,
                input validation, cancellation handling, crash prevention
    Constraint: Does NOT restructure modules, does NOT write tests
    Runs:       After B1 (fixes safety on restructured code)

B3: TEST BUILDER — "Prove it works"
    Mandate:    Write unit tests for every public API, edge cases,
                concurrent access, error paths, integration smoke tests
    Constraint: Does NOT modify source code, ONLY creates/modifies test files
    Runs:       After B2 (tests should verify safety fixes)

B4: INTEGRATION BUILDER — "Make the pieces fit"
    Mandate:    Cross-package data flow, DI container wiring, serialization
                round-trips, end-to-end path verification, disconnected feature detection
    Constraint: Minimal source changes (wiring fixes only), does NOT refactor
    Runs:       After B3 (Wave 2 order is B2 → B3 → B4; verifies on safety-hardened code)

B5: HARDENING BUILDER — "Make it production-worthy"
    Mandate:    Security hardening, performance antipatterns, input sanitization,
                secret protection, export scrubbing, production polish
    Constraint: Does NOT restructure architecture (B1's job), does NOT write tests (B3's job)
    Runs:       Last (polishes everything others built)

Pre-Fix Search Protocol (ALL BUILDERS)

Every builder agent's prompt includes this mandatory protocol:

BEFORE making ANY code change:

1. SEARCH for existing implementations of the same logic
   → Use Grep to find similar functions, patterns, utilities
   → If an implementation exists, USE IT — do not duplicate

2. SEARCH for existing patterns that match your approach
   → Use Glob + Read to find how similar problems are solved elsewhere
   → Follow the established pattern, don't invent a new one

3. CHECK if your change would introduce a known antipattern
   → N+1 queries (loop with individual DB calls)
   → Cross-boundary imports (importing DB framework in non-DB package)
   → Duplicate logic (same function in two places)
   → If it would: find a different approach BEFORE editing

4. CHECK what other builders have already done this cycle
   → Read the git log for recent commits by other builders
   → Don't undo or conflict with their work

Violation of this protocol creates findings in the next audit that
cancel out your improvements. The protocol IS the velocity fix.
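Check 3 is easiest to see in code. The following is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical `authors`/`books` schema: the first function issues one query per author (the N+1 shape the protocol forbids), the second returns the same data from a single parameterized query.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema, used only for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Letters'), (3, 2, 'Manual');
    """)
    return conn


def titles_n_plus_one(conn: sqlite3.Connection, author_ids: list[int]) -> dict[int, list[str]]:
    # Antipattern: one query per author, so N authors cost N round-trips.
    result = {}
    for author_id in author_ids:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        result[author_id] = [title for (title,) in rows]
    return result


def titles_batched(conn: sqlite3.Connection, author_ids: list[int]) -> dict[int, list[str]]:
    # Fix: one query with an IN list, grouped in Python.
    ids = list(author_ids)
    result: dict[int, list[str]] = {author_id: [] for author_id in ids}
    if not ids:
        return result
    # Only the "?" placeholders are formatted into the SQL; the values stay bound parameters.
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT author_id, title FROM books WHERE author_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    for author_id, title in rows:
        result[author_id].append(title)
    return result
```

Very long ID lists would need chunking, because SQLite caps the number of bound parameters per statement.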

WAVE STRUCTURE

Builders run in 3 waves per cycle. Sequential waves ensure each builder sees prior work. Within Wave 2, builders CAN run in parallel, but sequential is safer: B2 and B4 both edit source files, and B3 prioritizes testing the code paths B2 just changed.

WAVE 1 (Sequential — B1 alone)
  B1: Foundation Builder
  → Restructures code, creates missing repos, enforces boundaries
  → Must run first: everything else builds on architecture
  → Commits all changes atomically

WAVE 2 (Sequential — B2, B3, B4 in order)
  B2: Safety Builder → fixes error handling, race conditions
  B3: Test Builder → writes tests (test files only, no source changes)
  B4: Integration Builder → verifies cross-package flow, small wiring fixes
  → Each commits separately after completing

WAVE 3 (Sequential — B5 alone)
  B5: Hardening Builder
  → Security, performance, production polish
  → Runs last: polishes what all others built
  → Commits all changes atomically

→ DUAL AUDIT GATE (Audit A + Audit B in parallel)
  Both auditors run simultaneously, independently
  Neither sees the other's findings until the gate evaluation
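The ordering above can be sketched as a small driver. This is a hypothetical model, not the skill's implementation: `run_builder` and `run_audit` stand in for Agent tool calls, the waves stay strictly sequential, and a thread pool models the two audits running concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Wave order from the WAVE STRUCTURE section; each inner list runs in sequence.
WAVES = [["B1"], ["B2", "B3", "B4"], ["B5"]]


def run_cycle(
    run_builder: Callable[[str, list[str]], str],
    run_audit: Callable[[str], dict],
) -> tuple[list[str], dict, dict]:
    """Run one forge cycle: five builders in wave order, then both audits."""
    summaries: list[str] = []
    for wave in WAVES:
        for builder in wave:
            # Each builder receives the summaries of every builder that ran before it.
            summaries.append(run_builder(builder, list(summaries)))
    # Audits start only after every builder has finished, and run side by side
    # without access to each other's results.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(run_audit, "A")
        future_b = pool.submit(run_audit, "B")
        return summaries, future_a.result(), future_b.result()
```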

PHASE 0: INITIALIZATION

Step 0a: Parse Arguments

If $ARGUMENTS is provided, parse for:

POST-POC: Fresh from /poc scaffolding — activates aggressive first cycle
CONTINUE: Resume a previous cycle from state file
Project path: Path to the project to improve
Target threshold: Minimum dimension score to pass gate (default: 5/7)

If no arguments: detect project in current directory.

Step 0b: Detect Project State

if has_file(".code-forge-cycle-state.yaml"):
    mode = "RESUME"  # continuing previous cycle
elif has_file("README.md") and readme_contains("POC:"):
    mode = "POST-POC"  # fresh from /poc
elif has_quality_infrastructure():
    mode = "EXISTING"  # established project
else:
    mode = "UNKNOWN"  # ask user

POST-POC Detection Signals:

README.md contains "POC:" or "What This Proves"
No test files exist anywhere
try! or force-unwrap patterns present
print() used for error handling
No .gitignore or minimal one
Comments like "// TODO", "// HACK", "// placeholder"
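The Step 0b pseudocode and the signal list above can be combined into a runnable sketch. It assumes a Swift project (`*.swift` files), treats any source file with "Test" in its name as a test file, and skips the print()-for-errors signal, which needs real code reading; all of these are simplifying assumptions.

```python
import re
from pathlib import Path

STATE_FILE = ".code-forge-cycle-state.yaml"
TODO_MARKER = re.compile(r"//\s*(TODO|HACK|placeholder)")


def post_poc_signals(root: Path) -> list[str]:
    """Return which of the POST-POC signals listed above are present."""
    signals = []
    readme = root / "README.md"
    text = readme.read_text(encoding="utf-8") if readme.exists() else ""
    if "POC:" in text or "What This Proves" in text:
        signals.append("readme-marker")
    sources = list(root.rglob("*.swift"))
    if not any("Test" in path.name for path in sources):  # crude heuristic
        signals.append("no-tests")
    bodies = [path.read_text(encoding="utf-8", errors="ignore") for path in sources]
    if any("try!" in body for body in bodies):
        signals.append("force-try")
    if any(TODO_MARKER.search(body) for body in bodies):
        signals.append("todo-markers")
    if not (root / ".gitignore").exists():
        signals.append("no-gitignore")
    return signals


def detect_mode(root: Path, has_quality_infrastructure: bool) -> str:
    # Same precedence as the pseudocode: state file, then POC README, then existing project.
    if (root / STATE_FILE).exists():
        return "RESUME"
    if "readme-marker" in post_poc_signals(root):
        return "POST-POC"
    if has_quality_infrastructure:
        return "EXISTING"
    return "UNKNOWN"
```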

Step 0c: Detect Tech Stack & Quality Infrastructure

Scan for:

Package manager (SPM, npm, cargo, pip, etc.)
Build command (swift build, npm run build, cargo build, etc.)
Test command (swift test, npm test, cargo test, etc.)
Lint command (swiftlint, eslint, clippy, etc.)
Type check command (swift build, tsc, mypy, etc.)

Record as quality_commands for builder verification steps.
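One way to build `quality_commands` is a manifest lookup table. The sketch below is hypothetical: the table covers only the stacks named above, the exact commands (for example `pytest` or `npx tsc --noEmit`) are assumptions a real project may override, and `None` stands for "no standard command", which the baseline reports as NO COMMAND.

```python
from pathlib import Path

# Hypothetical manifest -> command table; extend or override per project.
STACKS = {
    "Package.swift": {"stack": "SPM", "build": "swift build", "test": "swift test",
                      "lint": "swiftlint", "typecheck": "swift build"},
    "package.json": {"stack": "npm", "build": "npm run build", "test": "npm test",
                     "lint": "npx eslint .", "typecheck": "npx tsc --noEmit"},
    "Cargo.toml": {"stack": "cargo", "build": "cargo build", "test": "cargo test",
                   "lint": "cargo clippy", "typecheck": "cargo check"},
    "pyproject.toml": {"stack": "pip", "build": None, "test": "pytest",
                       "lint": None, "typecheck": "mypy ."},
}

UNKNOWN = {"stack": "unknown", "build": None, "test": None, "lint": None, "typecheck": None}


def detect_quality_commands(root: Path) -> dict:
    """Return quality_commands for the first known manifest in the project root."""
    for manifest, commands in STACKS.items():
        if (root / manifest).exists():
            return dict(commands)  # copy so callers can edit it safely
    return dict(UNKNOWN)
```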

Step 0d: Check Existing State

Check for .code-forge-cycle-state.yaml in project root.

If found:

Display cycle history summary
Show last completed wave and gate result
Offer: Resume cycle {N+1} or Start fresh
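The state file's schema is not specified here. One possible in-memory model of it, sketched below, uses the four issue statuses this skill refers to (NEW, IN_PROGRESS, ACKNOWLEDGED, FIXED); the `owner` field, which maps each issue to the builder whose mandate covers it, is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    IN_PROGRESS = "IN_PROGRESS"
    ACKNOWLEDGED = "ACKNOWLEDGED"
    FIXED = "FIXED"


@dataclass
class Issue:
    id: str        # e.g. "FA-003" or "BF-B2-001"
    owner: str     # builder whose mandate covers it, e.g. "B2" (assumed field)
    severity: str
    summary: str
    status: Status = Status.NEW


@dataclass
class CycleState:
    cycle: int = 0
    last_gate: str = "N/A"
    issues: list[Issue] = field(default_factory=list)

    def for_builder(self, builder: str) -> list[Issue]:
        # Builder prompts list open issues under that builder's mandate only.
        return [i for i in self.issues
                if i.owner == builder and i.status in (Status.NEW, Status.IN_PROGRESS)]

    def known_for_audit(self) -> list[Issue]:
        # Auditors must not re-report these.
        return [i for i in self.issues if i.status in (Status.ACKNOWLEDGED, Status.FIXED)]

    def in_progress(self) -> list[Issue]:
        return [i for i in self.issues if i.status is Status.IN_PROGRESS]

    def counts(self) -> dict[str, int]:
        # Feeds the "Issue registry: N open, N acknowledged, N fixed" line.
        return {
            "open": sum(i.status in (Status.NEW, Status.IN_PROGRESS) for i in self.issues),
            "acknowledged": sum(i.status is Status.ACKNOWLEDGED for i in self.issues),
            "fixed": sum(i.status is Status.FIXED for i in self.issues),
        }
```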

Step 0e: Establish Baseline

Run available quality commands to capture the starting point:

BASELINE SNAPSHOT
━━━━━━━━━━━━━━━━
Project:         {name}
Mode:            {POST-POC / RESUME / EXISTING}
Tech stack:      {stack}
Packages:        {N} packages detected
Test files:      {N} ({N} packages with zero tests)
Build:           {PASS / FAIL / NO COMMAND}
Type check:      {PASS / FAIL / NO COMMAND}
Lint:            {PASS / FAIL / NO COMMAND}
Tests:           {passed}/{total} pass, {coverage}% coverage
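The Build, Type check, and Lint rows can be filled by running each command and mapping its exit code. A minimal sketch, assuming `quality_commands` is a dict of command strings (or `None`) as recorded in Step 0c; parsing test counts and coverage is stack-specific and left out.

```python
import subprocess
from typing import Optional


def run_check(command: Optional[str], cwd: str = ".") -> str:
    """Run one quality command and map the result to a baseline status."""
    if not command:
        return "NO COMMAND"
    # shell=True because commands such as "npm run build" are stored as plain strings;
    # they come from the project's own configuration, not from user input.
    completed = subprocess.run(command, shell=True, cwd=cwd,
                               capture_output=True, text=True)
    return "PASS" if completed.returncode == 0 else "FAIL"


def baseline(quality_commands: dict, cwd: str = ".") -> dict:
    return {name: run_check(quality_commands.get(name), cwd)
            for name in ("build", "typecheck", "lint")}
```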

Step 0f: Configure Cycle

CODE FORGE CYCLE — CONFIGURATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Project:         {name}
Mode:            {mode}
Gate threshold:  {N}/7 per dimension (default: 5)
Max cycles:      {N} (default: 5)
Builder count:   5

Builder wave order:
  Wave 1: B1 Foundation
  Wave 2: B2 Safety → B3 Test → B4 Integration
  Wave 3: B5 Hardening
  Gate:   Audit A (Architect) + Audit B (Adversary)

Proceed?

Ask for confirmation. Allow threshold and max cycle adjustments.

PHASE 1: BUILDER WAVE

Cycle Frame

At the start of each cycle:

╔══════════════════════════════════════════════════════════════╗
║  FORGE CYCLE {N} of {max}                                    ║
║  Project: {name}                                             ║
║  Mode: {POST-POC / CONTINUING}                               ║
║  Previous gate: {PASS / FAIL / N/A}                          ║
║  Issue registry: {N} open, {N} acknowledged, {N} fixed       ║
╚══════════════════════════════════════════════════════════════╝

WAVE 1: B1 Foundation Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B1: THE FOUNDATION BUILDER. You make the architecture real.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
PACKAGE STRUCTURE: {package_list_with_descriptions}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE: none (you run first)

ISSUE REGISTRY (findings from previous cycles to address):
{issues with status IN_PROGRESS or NEW that fall under your mandate}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make the architecture real
═══════════════════════════════════════════════════════

You are responsible for:
1. PACKAGE BOUNDARY ENFORCEMENT
   - Scan every file for imports that violate the dependency graph
   - The dependency direction is: {dependency_graph}
   - Remove or replace any import that crosses boundaries
   - Example: if package Dialog imports the DB framework directly,
     route through a Repository in the DB package instead

2. PATTERN CONSISTENCY
   - If the project uses a Repository pattern in SOME packages, ensure
     ALL packages use it for DB access
   - If the project has a protocol for something, ensure all conformances
     exist
   - Look for: repositories, factories, protocols, coordinators — whatever
     patterns exist, make them consistent

3. DEAD CODE REMOVAL
   - Find unused types, functions, imports, files
   - Remove them (check with Grep that nothing references them first)
   - Do NOT remove things marked with @available or #if

4. STRUCTURAL GAPS
   - Missing protocol conformances
   - Missing repository types
   - Orphaned files not wired into the build
   - Public API surface that exposes internal types

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY code change:
1. Grep for existing implementations of the same logic
2. Glob + Read to find how similar problems are solved elsewhere
3. Check if your change introduces: N+1 queries, cross-boundary imports,
   duplicate logic. If so, find a different approach.
4. Read git log for any recent commits that affect the same files.

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT write tests (B3's job)
- Do NOT fix business logic bugs; document them as findings (prefix BF-B1-)
- Do NOT add error handling (B2's job)
- ONLY touch architectural structure

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

1. Use Edit/Write tools to make ALL changes directly to the codebase
2. After all changes, run the build command: {build_command}
   - If build fails: fix the build errors (your changes caused them)
   - Iterate until build passes
3. List every file you modified with a 1-line summary of what changed
4. List every issue from the registry you addressed (by ID)
5. List any NEW issues you discovered while working (use ID prefix BF-B1-)

End with:
CHANGES SUMMARY:
  Files modified: {N}
  Files created: {N}
  Files deleted: {N}
  Registry issues addressed: {list of IDs}
  New issues found: {list with severity and summary}
  Build status: {PASS/FAIL}

SHARPEST INSIGHT about the architecture's current state.
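B1's boundary-enforcement item amounts to checking every import against an allowed-dependency graph. A minimal sketch, assuming the per-package imports have already been collected (for example with Grep); the package names and the `DBFramework` module in the usage are hypothetical.

```python
def boundary_violations(
    allowed: dict[str, set[str]],
    guarded: set[str],
    imports: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """List (package, module) pairs where a package imports a guarded module
    that the dependency graph does not permit.

    allowed: package -> modules it may import
    guarded: modules whose use is restricted (project packages, the DB framework, ...)
    imports: package -> modules it actually imports
    """
    violations = []
    for package, used in sorted(imports.items()):
        permitted = allowed.get(package, set())
        for module in sorted(used):
            # Unguarded modules (e.g. the standard library) are always fine.
            if module in guarded and module != package and module not in permitted:
                violations.append((package, module))
    return violations
```

With `allowed = {"Dialog": {"Core", "Persistence"}, "Persistence": {"Core", "DBFramework"}, "Core": set()}`, a `Dialog` package that imports `DBFramework` directly is reported as `("Dialog", "DBFramework")`, which is exactly the case the mandate says to route through a Repository.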

After B1 completes:

Verify build passes: run {build_command} via Bash
If build fails: re-invoke B1 with the error output to fix

Commit B1's changes:

git add -A && git commit -m "forge(B1): enforce architecture boundaries and pattern consistency

Cycle {N}, Wave 1: Foundation Builder
Co-Authored-By: Claude Code <noreply@anthropic.com>"

Record B1's changes and discovered issues in state
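The commit step can be wrapped so that every builder uses the same `forge(Bn):` message shape and a builder that changed nothing does not break the cycle (`git commit` exits non-zero when nothing is staged). A minimal sketch; the function names are hypothetical.

```python
import subprocess


def forge_commit_message(builder: str, cycle: int, wave: int, role: str, subject: str) -> str:
    # Mirrors the B1 template: "forge(Bn): <subject>", a blank line, then the cycle line.
    return f"forge({builder}): {subject}\n\nCycle {cycle}, Wave {wave}: {role}"


def commit_builder(message: str, cwd: str = ".") -> bool:
    """Stage everything and commit; return False if the builder changed nothing."""
    subprocess.run(["git", "add", "-A"], cwd=cwd, check=True)
    # `git diff --cached --quiet` exits 0 when nothing is staged.
    if subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=cwd).returncode == 0:
        return False
    subprocess.run(["git", "commit", "-m", message], cwd=cwd, check=True)
    return True
```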

WAVE 2: B2 Safety Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B2: THE SAFETY BUILDER. You make the code not break.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary of B1's changes}

ISSUE REGISTRY (your mandate):
{issues with status IN_PROGRESS or NEW related to safety/error handling}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make it not break
═══════════════════════════════════════════════════════

You are responsible for:
1. ERROR HANDLING
   - Find every try!/force-unwrap/implicitly-unwrapped-optional
   - Replace with proper do/catch or guard/let with meaningful error handling
   - Find every print() used for error reporting
   - Replace with proper error propagation or structured logging
   - Ensure errors are typed and specific, not generic strings

2. RACE CONDITIONS & ATOMICITY
   - Find every read-then-write pattern across separate transactions
   - Merge into single atomic transactions
   - Find every multi-step mutation that should be atomic
   - Wrap in single transaction or make idempotent
   - Check for shared mutable state accessed from multiple contexts

3. INPUT VALIDATION
   - Find every public function that accepts external input
   - Add validation at the boundary (not deep in the call stack)
   - Pass external strings to SQL only as bound parameters; escape them before FTS/regex operations
   - Validate IDs, ranges, and enums before use

4. CANCELLATION & CLEANUP
   - Find every long-running async operation
   - Ensure it checks for cancellation
   - Ensure resources are cleaned up on cancellation (files, connections, streams)

5. CRASH PREVENTION
   - Find every array subscript without bounds checking
   - Find every forced cast (as!)
   - Find every divide operation without zero check
   - Replace with safe alternatives

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY code change:
1. Grep for existing error types, validation utilities, sanitization functions
2. If an error type exists, USE IT — don't create a new one
3. If a sanitization function exists, CALL IT — don't inline the logic
4. Check B1's recent commits for structural changes that affect your targets

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT restructure modules or move files (B1's job, already done)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior — only add safety around it
- Preserve the existing API contracts (same function signatures)

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

Same requirements as B1: make ALL changes directly with Edit/Write tools and end with a CHANGES SUMMARY block. After all changes:
1. Run build command — fix any build errors your changes caused
2. Run test command if tests exist — ensure no regressions
3. List all changes, registry issues addressed, new issues found (prefix BF-B2-)

SHARPEST INSIGHT about the codebase's safety posture.
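The race-condition item in B2's mandate ("merge read-then-write into a single atomic transaction") looks like this in a minimal sketch with Python's `sqlite3` and a hypothetical `accounts` table. The race itself cannot be triggered deterministically in a single-threaded example; the sketch shows the shape of the antipattern and of the fix.

```python
import sqlite3


def make_accounts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.commit()
    return conn


def withdraw_racy(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    # Antipattern: read, decide in application code, then write separately.
    # Another writer can change the balance between the SELECT and the UPDATE.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)).fetchone()
    if balance < amount:
        return False
    conn.execute("UPDATE accounts SET balance = ? WHERE id = ?", (balance - amount, account_id))
    return True


def withdraw_atomic(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    # Fix: the check and the write are one conditional UPDATE.
    # `with conn` commits on success and rolls back if an exception escapes.
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cur.rowcount == 1
```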

After B2 completes: Verify build, commit with forge(B2): prefix, record state.
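For B2's input-validation item, the right treatment differs by target. A minimal sketch in Python, assuming SQLite with FTS5 and a hypothetical `notes` table: SQL gets bound parameters, FTS5 input is wrapped as a quoted string (FTS5 escapes an embedded double quote by doubling it), and regex input goes through `re.escape`.

```python
import re
import sqlite3


def find_notes(conn: sqlite3.Connection, title: str) -> list[str]:
    # SQL: bind user input as a parameter; never format it into the query string.
    rows = conn.execute("SELECT body FROM notes WHERE title = ?", (title,)).fetchall()
    return [body for (body,) in rows]


def fts5_phrase(user_text: str) -> str:
    # FTS5 query syntax: a double-quoted string is matched literally, so operators
    # such as NEAR or OR lose their meaning. The result should itself be bound
    # as the MATCH parameter.
    return '"' + user_text.replace('"', '""') + '"'


def literal_pattern(user_text: str) -> re.Pattern:
    # Regex: escape metacharacters so the text is matched literally.
    return re.compile(re.escape(user_text))
```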

WAVE 2 (continued): B3 Test Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B3: THE TEST BUILDER. You prove the code works.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
TEST FRAMEWORK: {test_framework}
TEST COMMAND: {test_command}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}

ISSUE REGISTRY (your mandate):
{issues related to test coverage, test quality}

CURRENT TEST STATE:
  Test files: {N} across {N} packages
  Packages with zero tests: {list}
  Coverage: {N}% (if measurable)

═══════════════════════════════════════════════════════
YOUR MANDATE: Prove it works
═══════════════════════════════════════════════════════

You are responsible for:
1. COVERAGE EXPANSION
   - For every package with zero tests: create a test file with at least
     3 meaningful tests covering the package's primary public API
   - For packages with existing tests: add tests for untested public methods
   - Priority: test the functions that B2 added safety handling to
     (those are the most risk-prone)

2. TEST QUALITY
   - Every test must assert BEHAVIOR, not implementation
   - Bad: "test that function calls dependency X"
   - Good: "test that function returns correct result given input Y"
   - Every test must have a descriptive name that explains what it proves
   - Every test must be independent (no shared mutable state between tests)

3. EDGE CASE COVERAGE
   - For every tested function, add at least one edge case:
     empty input, nil/null, duplicate, boundary value, concurrent access
   - For every error path B2 added: write a test that triggers it
   - For every validation B2 added: write a test with invalid input

4. INTEGRATION SMOKE TESTS
   - For each major data flow (e.g., ingest → graph → dialog → outline),
     write one smoke test that exercises the full path
   - These can use mocks/stubs for external services but should exercise
     real internal code paths

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE writing ANY test:
1. Read the source file being tested — understand the ACTUAL behavior
2. Read existing tests (if any) — don't duplicate coverage
3. Read B2's safety changes — prioritize testing those code paths
4. Check for test utilities, mocks, fixtures already in the project

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT modify source code. ONLY create/modify test files.
- If a test reveals a bug, DOCUMENT it as a finding (BF-B3-), don't fix the source
- Tests must compile and run (even if some fail due to bugs)
- Use the project's existing test framework and conventions

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run test command: {test_command}
2. Report: {passed}/{total} tests pass
3. If tests FAIL: determine if failure is because of a genuine bug
   (document as finding) or because your test is wrong (fix the test)
4. List all test files created/modified
5. List coverage improvement if measurable
6. List any bugs discovered via testing (prefix BF-B3-)

SHARPEST INSIGHT about the codebase's test coverage landscape.

After B3 completes: Run tests, commit with forge(B3): prefix, record state.
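B3's test-quality rules are easier to judge against a concrete example. The sketch below uses Python's `unittest` and a hypothetical `normalize_tags` function; a real project would use its own framework (XCTest in a Swift project). Each test asserts observable behavior, has a name that states what it proves, builds its own input, and covers one edge case.

```python
import unittest


def normalize_tags(raw: str) -> list[str]:
    """Hypothetical function under test: split, trim, lowercase, dedupe (order kept)."""
    seen: list[str] = []
    for part in raw.split(","):
        tag = part.strip().lower()
        if tag and tag not in seen:
            seen.append(tag)
    return seen


class NormalizeTagsTests(unittest.TestCase):
    # Every test creates its own input, so no mutable state is shared between tests.

    def test_splits_trims_and_lowercases_tags(self):
        self.assertEqual(normalize_tags(" Swift, iOS "), ["swift", "ios"])

    def test_empty_input_returns_no_tags(self):
        self.assertEqual(normalize_tags(""), [])

    def test_duplicates_differing_only_in_case_are_collapsed(self):
        self.assertEqual(normalize_tags("a,A, a"), ["a"])

    def test_blank_entries_between_commas_are_ignored(self):
        self.assertEqual(normalize_tags("x,,  ,y"), ["x", "y"])
```

Run it with `python -m unittest <module>`. None of the tests inspects how `normalize_tags` works internally, so a rewrite that keeps the behavior keeps them green.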

WAVE 2 (continued): B4 Integration Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B4: THE INTEGRATION BUILDER. You make the pieces fit.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}
  B3: {summary}

ISSUE REGISTRY (your mandate):
{issues related to integration, wiring, data flow}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make the pieces fit
═══════════════════════════════════════════════════════

You are responsible for:
1. DEPENDENCY INJECTION VERIFICATION
   - Read the composition root / DI container
   - Verify every engine/service is properly wired
   - Check that optional dependencies are handled (nil checks, fallbacks)
   - Ensure lifecycle is correct (what's created once vs per-use)

2. DATA FLOW VERIFICATION
   - Trace the primary data flows end-to-end:
     {list of primary flows, e.g., "ingest → graph → dialog → outline"}
   - At each boundary: verify the data model transforms correctly
   - Check for: missing fields, type mismatches, lost context
   - Verify serialization round-trips (encode → decode → same data)

3. DISCONNECTED FEATURE DETECTION
   - Find UI elements/views that aren't connected to data
   - Find functions that are never called
   - Find protocols with no conforming types
   - Find registered routes/commands with no handlers

4. CROSS-PACKAGE CONTRACT VERIFICATION
   - For each protocol: verify all implementations satisfy the contract
   - For each public API: verify callers pass valid arguments
   - For each callback/delegate: verify it's actually set and called

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY wiring fix:
1. Understand WHY it's disconnected — it may be intentional (future feature)
2. Check if B1 restructured the module — the wiring may need updating
3. Verify the fix doesn't create a circular dependency
4. Check that your wiring matches the dependency direction graph

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- MINIMAL source changes — only fix wiring, don't refactor
- Do NOT restructure modules (B1's job)
- Do NOT add error handling (B2's job)
- Do NOT write tests (B3's job)
- If you find a bug in business logic, DOCUMENT it (BF-B4-), don't fix

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run build + tests — ensure no regressions
2. Produce a DATA FLOW MAP (text diagram) showing verified paths
3. List disconnected features found
4. List wiring fixes applied
5. List bugs/issues found (prefix BF-B4-)

SHARPEST INSIGHT about the codebase's integration health.

After B4 completes: Verify build + tests, commit with forge(B4): prefix, record state.
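B4's serialization check (encode → decode → same data) can be expressed as a reusable predicate. A minimal sketch, assuming a hypothetical frozen `OutlineNode` model that crosses a package boundary as JSON; the predicate accepts any encoder and decoder, so a lossy one (dropping `source_ids`, say) is caught immediately.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OutlineNode:
    # Hypothetical model crossing a package boundary.
    title: str
    source_ids: tuple[str, ...] = ()
    children: tuple["OutlineNode", ...] = ()


def encode(node: OutlineNode) -> str:
    # asdict recurses into nested nodes; json turns the tuples into lists.
    return json.dumps(asdict(node))


def _from_dict(d: dict) -> OutlineNode:
    return OutlineNode(
        title=d["title"],
        source_ids=tuple(d["source_ids"]),
        children=tuple(_from_dict(child) for child in d["children"]),
    )


def decode(text: str) -> OutlineNode:
    return _from_dict(json.loads(text))


def round_trips(node: OutlineNode, enc=encode, dec=decode) -> bool:
    """True if encoding then decoding yields a value equal to the original."""
    return dec(enc(node)) == node
```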

WAVE 3: B5 Hardening Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B5: THE HARDENING BUILDER. You make it production-worthy.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}
  B3: {summary}
  B4: {summary}

ISSUE REGISTRY (your mandate):
{issues related to security, performance, production readiness}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make it production-worthy
═══════════════════════════════════════════════════════

You are responsible for:
1. SECURITY HARDENING
   - Sanitize all user-facing inputs before FTS/regex operations, and pass them to DB queries only as bound parameters
   - Protect secrets: ensure API keys, tokens, credentials are never
     logged, exported, or included in error messages
   - Scrub exports: any data export (JSON, PDF, etc.) must not include
     internal paths, raw document text, or sensitive metadata
   - Keychain/credential storage: use platform-appropriate secure storage
   - Validate that debug/trace logging is disabled in release builds

2. PERFORMANCE ANTIPATTERNS
   - Find N+1 query patterns (loop with individual DB calls)
   - Replace with batch queries or pre-fetching
   - Find unbounded memory loads (loading ALL records into memory)
   - Add pagination or streaming
   - Find synchronous work on main/UI thread
   - Move to background with proper dispatch

3. PRODUCTION POLISH
   - Remove all TODO/HACK/FIXME comments that reference POC/scaffold
   - Replace placeholder error messages with user-facing text
   - Ensure all user-visible strings are appropriate (no debug text)
   - Verify all file operations handle missing directories gracefully
   - Check that temporary files are cleaned up

4. RESILIENCE
   - Add retry logic for network operations (with backoff)
   - Add graceful degradation when optional services are unavailable
   - Ensure the app can start even if the database migration fails
     (fallback to in-memory or read-only mode)

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY hardening change:
1. Check if a sanitization/validation function already exists (B2 may have added one)
2. Check if B1 restructured the module — use the new structure
3. For performance fixes: verify the antipattern actually causes a problem
   (don't optimize code that runs once at startup)
4. Don't duplicate security logic that exists elsewhere in the codebase

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT restructure architecture (B1's job)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior
- Do NOT add features — only harden existing ones

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run build + tests — ensure no regressions
2. List all security hardening applied
3. List all performance fixes
4. List all production polish changes
5. List any remaining concerns (prefix BF-B5-)

SHARPEST INSIGHT about the codebase's production readiness.

After B5 completes: Verify build + tests, commit with forge(B5): prefix, record state.
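B5's resilience item ("retry logic for network operations, with backoff") can follow this minimal sketch. The defaults, the choice of `ConnectionError`/`TimeoutError` as the transient errors, and the full-jitter strategy are all assumptions; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying transient failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return operation()
        except retry_on:
            # Full jitter: wait a random time up to the exponential cap.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    return operation()  # final attempt: any error propagates to the caller
```

Errors outside `retry_on` (a validation error, for instance) propagate on the first attempt, so only failures that might succeed on retry are retried.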

Builder Wave Checkpoint

After all 5 builders complete:

BUILDER WAVE COMPLETE — CYCLE {N}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Builder   | Files Modified | Files Created | Build | Tests    | Issues Found
──────────────────────────────────────────────────────────────────────────────
B1 Found. | {N}            | {N}           | PASS  | N/A      | {N}
B2 Safety | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B3 Test   | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B4 Integ. | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B5 Harden | {N}            | {N}           | PASS  | {p}/{t}  | {N}

Total commits: 5
Total files changed: {N}
Total new issues discovered by builders: {N}

Registry issues addressed this cycle: {N}
Remaining open issues: {N}

Proceeding to Dual Audit Gate...

PHASE 2: DUAL AUDIT GATE

Two independent auditors run simultaneously. Neither sees the other's findings, scores, or insights. They audit the SAME codebase state (after all builders have committed).

Audit A: The Architect

Spawn via Agent tool, subagent_type: "general-purpose":

You are AUDIT A: THE ARCHITECT — an independent auditor assessing structural quality.

You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}

KNOWN ISSUE REGISTRY (for deduplication — do NOT re-report these):
{all issues with status ACKNOWLEDGED or FIXED}

IN-PROGRESS ISSUES (may re-report if still present):
{issues with status IN_PROGRESS}

═══════════════════════════════════════════════════════
YOUR LENS: Structure, patterns, boundaries, maintainability
═══════════════════════════════════════════════════════

Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files — do not rely on descriptions.

SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):

1. ARCHITECTURE (0-7)
   - Module boundaries respected? (no cross-layer imports)
   - Dependency direction correct? (lower layers don't know upper layers)
   - Pattern consistency? (same pattern used everywhere, not half-applied)
   - Package API surface clean? (internal types not leaked)
   - Dead code eliminated?

   0 = No coherent architecture
   3 = Architecture exists but inconsistently applied
   5 = Clean architecture with minor boundary violations
   7 = Pristine boundaries, consistent patterns, clean APIs

2. MAINTAINABILITY (0-7)
   - Naming clarity? (can you understand a function from its name?)
   - Cognitive load? (can you understand a module without reading 5 others?)
   - DRY? (same logic not duplicated across files?)
   - Documentation accuracy? (comments match actual behavior?)
   - Complexity hotspots? (functions over 50 lines, deeply nested logic?)

   0 = Incomprehensible without full-codebase knowledge
   3 = Understandable with effort, some confusing areas
   5 = Clear and well-organized with minor issues
   7 = Any developer could understand and safely modify any module

3. TEST QUALITY (0-7)
   - Coverage breadth? (what % of packages have tests?)
   - Test meaningfulness? (do tests assert behavior, not implementation?)
   - Edge case coverage? (empty, nil, duplicate, boundary, concurrent?)
   - Test independence? (no shared mutable state between tests?)
   - Test naming? (can you understand what a test proves from its name?)

   0 = No tests
   2 = Tests exist but are trivial or meaningless
   4 = Good tests for some modules, gaps in others
   5 = Solid coverage with meaningful assertions
   7 = Comprehensive, behavior-driven, edge-case-aware tests everywhere

ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FA (Forge Audit A)

For each issue:
  ID: FA-{NNN}
  Severity: CRITICAL / MAJOR / MINOR / NITPICK
  File(s): file:line references
  Problem: 1-2 sentences
  Evidence: quote the actual code
  Impact: what breaks or degrades because of this
  Fix Approach: how to fix (for the next builder cycle)

DEDUPLICATION RULES:
- Do NOT report issues already in the KNOWN ISSUE REGISTRY
- If an IN-PROGRESS issue is STILL present, report it as "PERSISTS: {original ID}"
- If an IN-PROGRESS issue is FIXED, note "RESOLVED: {original ID}"

End with your SHARPEST SINGLE INSIGHT about the codebase's structural quality.
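After an audit, its deduplication markers have to be folded back into the registry. The status transitions are not spelled out above, so the sketch below assumes them: RESOLVED moves an IN_PROGRESS issue to FIXED, PERSISTS leaves it IN_PROGRESS, and each new finding enters as NEW without overwriting an issue that is already tracked. The registry is modeled as a plain ID → status mapping.

```python
import re

MARKER = re.compile(r"^(PERSISTS|RESOLVED):\s*(\S+)")


def apply_audit(registry: dict[str, str], report_lines: list[str],
                new_ids: list[str]) -> dict[str, str]:
    """Return an updated copy of registry (issue ID -> status) after one audit."""
    updated = dict(registry)
    for line in report_lines:
        match = MARKER.match(line.strip())
        if not match:
            continue
        verdict, issue_id = match.groups()
        # Only IN_PROGRESS issues can be confirmed or resolved by an audit.
        if updated.get(issue_id) == "IN_PROGRESS":
            updated[issue_id] = "FIXED" if verdict == "RESOLVED" else "IN_PROGRESS"
    for issue_id in new_ids:
        # Never let an audit reopen or overwrite an issue that is already tracked.
        updated.setdefault(issue_id, "NEW")
    return updated
```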

Audit B: The Adversary

Spawn via Agent tool, subagent_type: "general-purpose":

You are AUDIT B: THE ADVERSARY — an independent auditor hunting for correctness
and security flaws.

You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}

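KNOWN ISSUE REGISTRY (for deduplication; do NOT re-report these):
{all issues with status ACKNOWLEDGED or FIXED, same list as Audit A}

IN-PROGRESS ISSUES (may re-report if still present):
{issues with status IN_PROGRESS}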

═══════════════════════════════════════════════════════
YOUR LENS: Correctness, security, edge cases, exploitability
YOUR HOSTILITY LEVEL: HIGH
═══════════════════════════════════════════════════════

Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files. Trace data flows. Find bugs.

SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):

1. CORRECTNESS (0-7)
   - Business logic accurate? (does the code do what it claims?)
   - Edge cases handled? (empty input, nil, duplicates, boundaries?)
   - State management sound? (no inconsistent state possible?)
   - Race conditions eliminated? (no TOCTOU, no non-atomic mutations?)
   - Error paths correct? (errors don't leave state corrupted?)

   0 = Fundamental logic errors, data corruption likely
   3 = Works for happy path, breaks on edge cases
   5 = Correct with minor edge case gaps
   7 = Provably correct for all documented inputs

2. SECURITY (0-7)
   - Input sanitization? (all external inputs validated before use?)
   - Secret protection? (API keys, tokens never logged or exported?)
   - Auth/authz correct? (all sensitive operations protected?)
   - Injection prevention? (SQL, FTS, command, path traversal?)
   - Data exposure? (exports don't leak internal data?)

   0 = Exploitable vulnerabilities present
   3 = Basic security but gaps in coverage
   5 = Solid security with minor hardening gaps
   7 = Defense in depth, no exploitable paths found

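3. TEST QUALITY (0-7)
   - Same criteria as Audit A, scored independently: coverage breadth,
     meaningful behavior assertions, edge cases, test independence, test naming
   - Focus especially on: do tests cover the SECURITY-CRITICAL paths?
   - Do tests verify ERROR PATHS, not just happy paths?
   - Would a broken security fix be caught by existing tests?

   0 = No tests
   2 = Tests exist but are trivial or meaningless
   4 = Good tests for some modules, gaps in others
   5 = Solid coverage with meaningful assertions
   7 = Comprehensive, behavior-driven, edge-case-aware tests everywhere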

ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FB (Forge Audit B)
HIGH HOSTILITY: Only report genuinely impactful issues. No nitpicks.

For each issue:
  ID: FB-{NNN}
  Severity: CRITICAL / MAJOR / MINOR
  File(s): file:line references
  Problem: 1-2 sentences
  Evidence: quote the actual code
  Exploit scenario: how an attacker or bad input could trigger this
  Fix Approach: how to fix

DEDUPLICATION: Same rules as Audit A.

End with your SHARPEST SINGLE INSIGHT about the codebase's correctness and security.

Launch Audit A and Audit B in parallel via two simultaneous Agent tool calls.

Gate Evaluation

After both audits complete, evaluate the gate:

Step 1: Score Comparison

DUAL AUDIT RESULTS
━━━━━━━━━━━━━━━━━━

Dimension        | Audit A | Audit B | Agreement
─────────────────────────────────────────────────
Architecture     | {N}/7   | —       | (A only)
Maintainability  | {N}/7   | —       | (A only)
Correctness      | —       | {N}/7   | (B only)
Security         | —       | {N}/7   | (B only)
Test Quality     | {N}/7   | {N}/7   | delta: {|A-B|}

Step 2: Gate Logic

GATE EVALUATION
━━━━━━━━━━━━━━━

Check 1: Audit A dimensions >= {threshold}/7
  Architecture:    {N}/7  {PASS/FAIL}
  Maintainability: {N}/7  {PASS/FAIL}
  Test Quality:    {N}/7  {PASS/FAIL}

Check 2: Audit B dimensions >= {threshold}/7
  Correctness:     {N}/7  {PASS/FAIL}
  Security:        {N}/7  {PASS/FAIL}
  Test Quality:    {N}/7  {PASS/FAIL}

Check 3: Test Quality agreement
  |{A_test} - {B_test}| = {delta}  {PASS if <= 1 / FAIL if > 1}

Check 4: Zero CRITICAL findings
  Audit A criticals: {N}  {PASS if 0 / FAIL}
  Audit B criticals: {N}  {PASS if 0 / FAIL}

Check 5: Combined MAJOR findings <= 3
  Total MAJORs: {N}  {PASS if <= 3 / FAIL}

━━━━━━━━━━━━━━━
GATE RESULT: {PASS / FAIL}
━━━━━━━━━━━━━━━
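A minimal Python sketch of the five gate checks, assuming each auditor's scores arrive as a dict of integer dimension scores plus CRITICAL and MAJOR counts. The names AuditScores and evaluate_gate are hypothetical; the check keys mirror the gate block of the state file.

```python
from dataclasses import dataclass


@dataclass
class AuditScores:
    # Audit A dims: architecture, maintainability, test_quality
    # Audit B dims: correctness, security, test_quality
    dims: dict[str, int]
    criticals: int
    majors: int


def evaluate_gate(a: AuditScores, b: AuditScores, threshold: int = 5) -> dict[str, bool]:
    """Return PASS (True) / FAIL (False) for each check plus the overall result."""
    checks = {
        "check_1_audit_a_dims": all(v >= threshold for v in a.dims.values()),
        "check_2_audit_b_dims": all(v >= threshold for v in b.dims.values()),
        "check_3_test_agreement": abs(a.dims["test_quality"] - b.dims["test_quality"]) <= 1,
        "check_4_zero_criticals": a.criticals == 0 and b.criticals == 0,
        "check_5_major_limit": a.majors + b.majors <= 3,
    }
    # The gate is binary: every check must pass
    checks["result"] = all(checks.values())
    return checks
```

Returning one boolean per check keeps the output easy to map onto the PASS/FAIL strings recorded for each cycle.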

Step 3: Test Quality Disagreement Protocol

If Check 3 fails (|A_test - B_test| > 1):

TEST QUALITY DISAGREEMENT DETECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Audit A scored Test Quality: {N}/7
Audit B scored Test Quality: {N}/7
Delta: {N} (threshold: 1)

This means the auditors fundamentally disagree on whether
tests prove correctness. Common causes:

IF Audit A > Audit B:
  Tests LOOK good structurally but don't actually catch bugs.
  → B3 needs to focus on behavior verification and edge cases.

IF Audit B > Audit A:
  Tests catch bugs but are poorly organized/named.
  → B3 needs to improve test structure and naming.

Action: B3 Test Builder gets priority mandate next cycle
with specific guidance from the lower-scoring auditor.
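The disagreement routing can be sketched as a small Python function; the function name and the returned focus strings are illustrative assumptions, not part of the skill's defined interface.

```python
from typing import Optional


def b3_priority_mandate(a_test: int, b_test: int) -> Optional[str]:
    """Map a Check 3 disagreement to B3's priority focus for the next cycle."""
    if abs(a_test - b_test) <= 1:
        return None  # Check 3 passed: no priority mandate
    if a_test > b_test:
        # Tests look good structurally but don't catch bugs; Audit B scored lower
        return "behavior verification and edge cases (guidance from Audit B)"
    # Tests catch bugs but are poorly organized/named; Audit A scored lower
    return "test structure and naming (guidance from Audit A)"
```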

PHASE 3: ISSUE REGISTRY & CYCLE DECISION

Issue Registry Update

After each gate evaluation, update the issue registry:

issue_registry:
  # Issues from builders (BF-B{N}-NNN)
  - id: "BF-B3-001"
    source: "B3 Test Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Function X has incorrect edge case behavior"
    status: NEW  # → becomes IN_PROGRESS when a builder addresses it
    score_impact: -3

  # Issues from auditors (FA-NNN, FB-NNN)
  - id: "FA-001"
    source: "Audit A"
    cycle_discovered: 1
    severity: MINOR
    summary: "DRY violation in ExportEngine"
    status: NEW
    score_impact: -1

  # Carried forward from previous cycles
  - id: "FA-003"
    source: "Audit A, Cycle 1"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Brute-force vector search"
    status: ACKNOWLEDGED  # user said "acceptable for MVP scale"
    acknowledged_at: "2026-03-30"
    acknowledged_reason: "Acceptable for MVP scale (<10K embeddings)"
    score_impact: 0  # ← NO PENALTY for acknowledged issues

  # Fixed issues
  - id: "BF-B2-001"
    source: "B2 Safety Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "TOCTOU race in AutoLinker"
    status: FIXED
    fixed_by: "B2"
    fixed_cycle: 1
    score_impact: 0

Issue Status Transitions

NEW → IN_PROGRESS (builder is working on it)
NEW → ACKNOWLEDGED (user accepts, no score penalty)
NEW → FIXED (builder resolved it)
IN_PROGRESS → FIXED (builder resolved it)
IN_PROGRESS → ACKNOWLEDGED (user decides to accept)
FIXED → REGRESSION (auditor found it re-broken) → treated as CRITICAL
ACKNOWLEDGED → IN_PROGRESS (user changes mind, wants it fixed)
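One way to enforce these transitions is a lookup table of allowed moves. This Python sketch assumes registry entries are plain dicts shaped like the YAML above; the source lists no transitions out of REGRESSION, so the table leaves it empty as an assumption.

```python
ALLOWED_TRANSITIONS = {
    "NEW": {"IN_PROGRESS", "ACKNOWLEDGED", "FIXED"},
    "IN_PROGRESS": {"FIXED", "ACKNOWLEDGED"},
    "FIXED": {"REGRESSION"},
    "ACKNOWLEDGED": {"IN_PROGRESS"},
    "REGRESSION": set(),  # assumption: outgoing transitions are not specified
}


def transition(issue: dict, new_status: str) -> dict:
    """Return a copy of the issue with its new status, rejecting illegal moves."""
    current = issue["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new_status}")
    updated = dict(issue, status=new_status)
    if new_status == "REGRESSION":
        updated["severity"] = "CRITICAL"  # a re-broken fix is treated as CRITICAL
    return updated
```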

Forge Score Calculation

FORGE SCORE
━━━━━━━━━━━

Dimension Score (0-100):
  Architecture:    {A_arch}/7   → {pct}%
  Maintainability: {A_maint}/7  → {pct}%
  Correctness:     {B_corr}/7   → {pct}%
  Security:        {B_sec}/7    → {pct}%
  Test Quality:    avg(A,B)/7   → {pct}%

  Dimension Average: {avg}/7 → {pct}%

Issue Penalty:
  Unfixed CRITICALs: {N} × 10 = -{N×10}
  Unfixed MAJORs:    {N} × 3  = -{N×3}
  (REGRESSION issues count as unfixed CRITICALs)
  (ACKNOWLEDGED issues: 0 penalty)
  (FIXED issues: 0 penalty)
  Total penalty: -{N}

  Forge Score: max(0, {dim_pct} - {penalty}) = {N}%

SCORE LABELS:
  90-100: Ship-Ready
  75-89:  Strong
  60-74:  Developing
  45-59:  Building
  < 45:   Foundation
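A minimal Python sketch of the Forge Score and its label, assuming audit scores are dicts keyed by dimension and the registry is a list of dicts with status and severity. MINOR and NITPICK findings add no penalty, matching the formula above; the function names are hypothetical.

```python
PENALTY = {"CRITICAL": 10, "MAJOR": 3}
NO_PENALTY = {"FIXED", "ACKNOWLEDGED"}


def forge_score(a: dict, b: dict, registry: list[dict]) -> float:
    dims = [
        a["architecture"], a["maintainability"],
        b["correctness"], b["security"],
        (a["test_quality"] + b["test_quality"]) / 2,  # Test Quality: avg(A, B)
    ]
    dim_pct = 100 * sum(dims) / (7 * len(dims))  # dimension average as a percentage
    penalty = 0
    for issue in registry:
        if issue["status"] in NO_PENALTY:
            continue
        # A regressed fix is penalized as CRITICAL regardless of its original severity
        severity = "CRITICAL" if issue["status"] == "REGRESSION" else issue["severity"]
        penalty += PENALTY.get(severity, 0)
    return max(0.0, dim_pct - penalty)


def score_label(score: float) -> str:
    for floor, label in ((90, "Ship-Ready"), (75, "Strong"), (60, "Developing"), (45, "Building")):
        if score >= floor:
            return label
    return "Foundation"
```

For example, dimension scores of 6, 5.5, 6, 5.5 and a Test Quality average of 5 give 80%; one open MAJOR brings the Forge Score to 77% ("Strong").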

Cycle Decision

if gate == PASS:
    classification = "CONVERGED"
    action = "Exit loop → ship"

elif cycle == 1:
    classification = "BASELINE"
    action = "Continue (expected: first cycle rarely passes)"

elif forge_score > previous_forge_score + 5:
    classification = "PROGRESSING"
    action = "Continue (gaining momentum)"

elif forge_score > previous_forge_score:
    classification = "SLOW_PROGRESS"
    action = "Continue with strategy adjustment"

elif forge_score < previous_forge_score - 5:
    classification = "REGRESSION"
    action = "Investigate and potentially revert"

else:  # no gain, and a drop of at most 5 points
    classification = "PLATEAU"
    action = "Strategy intervention required"
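A runnable Python sketch of the cycle decision. It tests the larger drop (REGRESSION) before the PLATEAU fallback so both classifications are reachable; treating a missing previous score as BASELINE is an assumption.

```python
from typing import Optional


def classify_cycle(gate_passed: bool, cycle: int, score: float,
                   previous: Optional[float]) -> str:
    if gate_passed:
        return "CONVERGED"
    if cycle == 1 or previous is None:
        return "BASELINE"
    if score > previous + 5:
        return "PROGRESSING"
    if score > previous:
        return "SLOW_PROGRESS"
    # Check the large drop before the catch-all, otherwise it can never fire
    if score < previous - 5:
        return "REGRESSION"
    return "PLATEAU"
```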

CONVERGED (Gate Passed)

╔══════════════════════════════════════════════════════════╗
║  GATE PASSED — PRODUCTION QUALITY ACHIEVED               ║
║                                                          ║
║  Forge Score:   {N}%                                     ║
║  Cycles:        {N}                                      ║
║  Gate checks:   5/5 PASS                                 ║
║  Journey:       {start}% → {final}%                      ║
║                                                          ║
║  All dimensions >= {threshold}/7                         ║
║  Zero CRITICAL findings                                  ║
║  Test Quality agreement: {A} vs {B} (delta: {d})         ║
╚══════════════════════════════════════════════════════════╝

→ Route to /code-audit THOROUGH for final independent verification, then /ship

PLATEAU (Score Not Improving)

Analyze which dimensions are the bottleneck:

PLATEAU ANALYSIS
━━━━━━━━━━━━━━━━

Bottleneck dimensions (below threshold):
  {dimension}: {score}/7 — held back by: {analysis}

Builder effectiveness:
  B1: {N} issues addressed, {M} new found  → net: {N-M}
  B2: {N} issues addressed, {M} new found  → net: {N-M}
  B3: {N} tests written, coverage: {before}% → {after}%
  B4: {N} wiring fixes
  B5: {N} hardening changes

Net issue velocity: {total_fixed - total_discovered}

If net velocity <= 0 for 2 cycles:

Display: "Builders are discovering issues as fast as they fix them."
Offer strategy interventions:
1. Focused sprint: Run ONLY the builder for the bottleneck dimension (e.g., only B3 for test quality)
2. Acknowledge and move on: Mark remaining issues as ACKNOWLEDGED (removes score penalty)
3. Escalate: Run /code-audit THOROUGH for deeper analysis before continuing
4. Accept current quality: Gate threshold may be too high for this codebase's maturity
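The "net velocity <= 0 for 2 cycles" trigger can be sketched in Python, assuming each cycle records how many issues were fixed and how many were newly discovered; the function name and history format are hypothetical.

```python
def needs_intervention(history: list[tuple[int, int]], window: int = 2) -> bool:
    """history holds (fixed, discovered) per cycle, oldest first."""
    if len(history) < window:
        return False  # not enough cycles to judge a trend
    # Net issue velocity = fixed - discovered; stalled when <= 0 in every recent cycle
    return all(fixed - found <= 0 for fixed, found in history[-window:])
```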

SAFETY RAILS

Rail 1: Max Cycles Guard

if current_cycle > max_cycles:
    "Maximum cycles ({max}) reached. Forge Score: {N}%."
    → Offer: extend / stop / lower threshold

Rail 2: Build Regression Detection

After every builder, verify:

Build still passes
Existing tests still pass
No new type errors

If any regress:

Identify which builder's commit caused it
Revert that commit
Re-invoke the builder with the error context
If the second attempt also fails: revert it and skip that builder this cycle
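The retry-then-skip policy can be sketched in Python with the builder, verification, and revert steps passed in as callables. This is a sketch under assumptions: the callable shapes are hypothetical, and a real revert might wrap `git revert --no-edit <hash>`.

```python
from typing import Callable, Optional


def run_with_regression_guard(
    run_builder: Callable[[Optional[str]], str],  # error context in, commit hash out
    verify: Callable[[], Optional[str]],          # None if build/tests/type check pass, else error output
    revert: Callable[[str], None],                # undo one commit
    max_attempts: int = 2,
) -> Optional[str]:
    """Return the commit that survived verification, or None if the builder is skipped."""
    error_context: Optional[str] = None
    for _ in range(max_attempts):
        commit = run_builder(error_context)
        error_context = verify()
        if error_context is None:
            return commit
        revert(commit)  # revert only this builder's commit, then retry with the error
    return None  # second failure: skip this builder this cycle
```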

Rail 3: Infinite Loop Detection

if the same gate checks failed in both of the last 2 cycles:
    "Same gate checks failing for 2 cycles. Forcing strategy change."
    → Builder mandates get adjusted based on failing checks
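A Python sketch of the repeat-failure check, assuming each cycle's gate is stored like the gate block of the state file (check names mapped to "PASS"/"FAIL"); the function names are hypothetical.

```python
def failing_checks(gate: dict) -> frozenset:
    return frozenset(k for k, v in gate.items() if k != "result" and v == "FAIL")


def stuck_on_same_checks(gates: list[dict]) -> bool:
    """True when the last two cycles failed exactly the same non-empty set of checks."""
    if len(gates) < 2:
        return False
    last, prev = failing_checks(gates[-1]), failing_checks(gates[-2])
    return bool(last) and last == prev
```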

Rail 4: Builder Conflict Detection

After Wave 2 (B2 + B3 + B4), check for conflicts:

Files modified by multiple builders → review for logical conflicts
If B4 reports a bug in code B2 just changed → flag for review

PHASE 4: FINAL REPORT

Report Structure

# Code Forge Cycle Report — {Project Name}
Generated: {YYYY-MM-DD HH:MM}

## Outcome: {CONVERGED / STOPPED / PLATEAU / MAX_CYCLES}

## Forge Score Journey

╔══════════════════════════════════════════════════╗
║  {start}% → {final}%                             ║
║  Cycles: {N} | Builders invoked: {N}              ║
║  Gate: {PASS/FAIL}                                ║
╚══════════════════════════════════════════════════╝

## Cycle History

| Cycle | B1 | B2 | B3 | B4 | B5 | Audit A | Audit B | Gate | Score |
|-------|----|----|----|----|----|---------|---------|------|-------|
| 1     | {files} | {files} | {tests} | {fixes} | {fixes} | {dims} | {dims} | {P/F} | {N}% |
| ...   |

## Dimension Journey

| Dimension | Cycle 1 | Cycle 2 | ... | Final | Threshold |
|-----------|---------|---------|-----|-------|-----------|
| Architecture | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Maintainability | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Correctness | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Security | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (A) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (B) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |

## Issue Registry

### Open Issues
| ID | Severity | Source | Summary | Status |
|----|----------|--------|---------|--------|
{open issues}

### Acknowledged Issues (no score penalty)
| ID | Severity | Reason | Acknowledged |
|----|----------|--------|-------------|
{acknowledged issues}

### Fixed Issues
| ID | Severity | Fixed By | Cycle |
|----|----------|----------|-------|
{fixed issues}

## Builder Performance

### B1 Foundation Builder
{summary of all structural changes across all cycles}

### B2 Safety Builder
{summary of all safety fixes across all cycles}

### B3 Test Builder
{test coverage progression, total tests written}

### B4 Integration Builder
{data flow maps, wiring fixes}

### B5 Hardening Builder
{security hardening, performance fixes}

## Dual Audit History

| Cycle | Auditor | Arch | Maint | Correct | Security | TestQ | Issues |
|-------|---------|------|-------|---------|----------|-------|--------|
| 1 | A | {N} | {N} | — | — | {N} | {N} |
| 1 | B | — | — | {N} | {N} | {N} | {N} |
| ... |

## Sharpest Insights (Final Cycle)
- **B1 Foundation**: {insight}
- **B2 Safety**: {insight}
- **B3 Test**: {insight}
- **B4 Integration**: {insight}
- **B5 Hardening**: {insight}
- **Audit A**: {insight}
- **Audit B**: {insight}

## Git Log
{git log --oneline showing all forge commits}

STATE FILE

Write .code-forge-cycle-state.yaml in the project root:

version: 1
skill: code-forge-cycle
created: "{ISO 8601}"
updated: "{ISO 8601}"

project:
  path: "{absolute path}"
  tech_stack: "{stack}"
  mode: "{POST-POC / EXISTING}"

config:
  gate_threshold: 5
  max_cycles: 5

baseline:
  build: "{PASS/FAIL}"
  test_total: {N}
  test_passed: {N}
  coverage: {N}
  packages_with_tests: {N}
  packages_total: {N}

cycles:
  - cycle: 1
    started_at: "{ISO 8601}"
    completed_at: "{ISO 8601}"

    builders:
      b1_foundation:
        status: "completed"
        files_modified: {N}
        files_created: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b2_safety:
        status: "completed"
        files_modified: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b3_test:
        status: "completed"
        test_files_created: {N}
        tests_written: {N}
        tests_passing: {N}
        coverage_before: {N}
        coverage_after: {N}
        bugs_found: ["{IDs}"]
        commit_hash: "{hash}"
      b4_integration:
        status: "completed"
        wiring_fixes: {N}
        disconnected_features: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b5_hardening:
        status: "completed"
        security_fixes: {N}
        performance_fixes: {N}
        polish_changes: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"

    audit_a:
      architecture: {N}
      maintainability: {N}
      test_quality: {N}
      issues: ["{IDs}"]
    audit_b:
      correctness: {N}
      security: {N}
      test_quality: {N}
      issues: ["{IDs}"]

    gate:
      check_1_audit_a_dims: "{PASS/FAIL}"
      check_2_audit_b_dims: "{PASS/FAIL}"
      check_3_test_agreement: "{PASS/FAIL}"
      check_4_zero_criticals: "{PASS/FAIL}"
      check_5_major_limit: "{PASS/FAIL}"
      result: "{PASS/FAIL}"

    forge_score: {N}
    classification: "{BASELINE/PROGRESSING/SLOW_PROGRESS/PLATEAU/REGRESSION/CONVERGED}"

issue_registry:
  - id: "{ID}"
    source: "{source}"
    cycle_discovered: {N}
    severity: "{CRITICAL/MAJOR/MINOR/NITPICK}"
    summary: "{summary}"
    status: "{NEW/IN_PROGRESS/FIXED/ACKNOWLEDGED/REGRESSION}"
    score_impact: {N}
    acknowledged_reason: "{reason or null}"
    fixed_by: "{builder or null}"
    fixed_cycle: {N or null}

progress:
  current_cycle: {N}
  last_completed_step: "{step}"
  next_step: "{description}"
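Resuming means finding the first unfinished step in the latest cycle. This Python sketch works on the already-parsed state (for example from PyYAML's `yaml.safe_load`); the step names and the "skipped" builder status (for builders dropped by Rail 2) are assumptions.

```python
BUILDER_ORDER = ["b1_foundation", "b2_safety", "b3_test", "b4_integration", "b5_hardening"]
DONE = {"completed", "skipped"}  # assumption: Rail 2 marks skipped builders as "skipped"


def resume_point(state: dict) -> tuple[int, str]:
    """Return (cycle, next step) from a parsed .code-forge-cycle-state.yaml."""
    cycles = state.get("cycles", [])
    if not cycles:
        return 1, BUILDER_ORDER[0]
    current = cycles[-1]
    builders = current.get("builders", {})
    for name in BUILDER_ORDER:  # waves run in a fixed order
        if builders.get(name, {}).get("status") not in DONE:
            return current["cycle"], name
    gate = current.get("gate")
    if gate is None:
        return current["cycle"], "dual_audit"
    if gate.get("result") == "PASS":
        return current["cycle"], "final_report"
    return current["cycle"] + 1, BUILDER_ORDER[0]
```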

REVIEW LOG

Append one line to ~/.gstack/reviews/review-log.jsonl containing the object below serialized as compact, single-line JSON:

{
  "skill": "code-forge-cycle",
  "timestamp": "{ISO 8601}",
  "project": "{project path}",
  "mode": "{POST-POC/EXISTING}",
  "outcome": "{CONVERGED/STOPPED/PLATEAU/MAX_CYCLES}",
  "cycles_completed": 2,
  "max_cycles": 5,
  "forge_score_start": 25,
  "forge_score_final": 82,
  "gate_result": "PASS",
  "gate_threshold": 5,
  "dimensions_final": {
    "architecture": 6.0,
    "maintainability": 5.5,
    "correctness": 6.0,
    "security": 5.5,
    "test_quality_a": 5.0,
    "test_quality_b": 5.0
  },
  "builders_invoked": 10,
  "total_files_modified": 47,
  "total_tests_written": 38,
  "issues": {
    "total_discovered": 23,
    "fixed": 18,
    "acknowledged": 3,
    "open": 2
  },
  "report_path": "{path}"
}
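A minimal Python sketch of the append, writing each record as a single JSON line so the file stays valid JSONL. The function name is hypothetical, and the timestamp is generated in UTC as an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

DEFAULT_LOG = Path.home() / ".gstack" / "reviews" / "review-log.jsonl"


def append_review_log(entry: dict, log_path: Path = DEFAULT_LOG) -> None:
    """Append one review record as a single JSON line."""
    record = {
        "skill": "code-forge-cycle",
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        **entry,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")  # no embedded newlines
```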

POST-CYCLE ROUTING

Based on outcome:

CONVERGED (Gate Passed): "Gate passed. All dimensions >= {threshold}/7. Zero CRITICALs. Test quality agreement." → Recommend: /code-audit THOROUGH for independent verification → /ship
STOPPED BY USER: "Stopped at cycle {N}. Forge Score: {N}%." → Recommend: Resume with /code-forge-cycle CONTINUE when ready
PLATEAU: "Score plateaued at {N}%. Bottleneck: {dimension}." → Recommend: Focused sprint on bottleneck dimension, or acknowledge remaining issues
MAX CYCLES: "Reached {N} cycles. Forge Score: {N}%. Gate: {failing checks}." → Recommend: Lower threshold, acknowledge issues, or investigate bottleneck

POST-POC FIRST CYCLE ADJUSTMENTS

When MODE is POST-POC, the first cycle has special builder instructions:

B1 First Cycle (POST-POC)

Additional mandate:

Convert ALL inline code to proper abstractions (POC inlined everything)
Create missing repository/service types for every data access pattern
Remove all // TODO and // placeholder comments left by POC scaffolding

B2 First Cycle (POST-POC)

Additional mandate:

EVERY try! → proper do/catch (POC used force-try everywhere)
EVERY print() used for error reporting → proper error propagation
POC skipped ALL error handling — this is a full pass, not incremental
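To size this full pass, B2 could start from an inventory of POC shortcuts. This Python sketch assumes a Swift codebase and does a plain textual scan (it also matches commented-out code); every print() hit is only a candidate, since not all print() calls report errors.

```python
import re
from pathlib import Path

# Force-try, force-cast, and print() calls: the POC shortcuts B2 replaces on the first cycle
PATTERNS = {
    "force_try": re.compile(r"\btry!"),
    "force_cast": re.compile(r"\bas!"),
    "print_call": re.compile(r"\bprint\("),
}


def poc_shortcuts(root: Path) -> dict[str, list[str]]:
    """Return file:line locations for each shortcut pattern under root."""
    hits: dict[str, list[str]] = {name: [] for name in PATTERNS}
    for path in sorted(root.rglob("*.swift")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), 1):
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits[name].append(f"{path.relative_to(root).as_posix()}:{lineno}")
    return hits
```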

B3 First Cycle (POST-POC)

Additional mandate:

Start from ZERO tests (POC philosophy: "no tests")
Create test infrastructure: test targets, test utilities, mock helpers
Target: at least 1 test file per package with 3+ meaningful tests

B4 First Cycle (POST-POC)

Additional mandate:

Verify the DI container actually wires everything (POC may have stubs)
Check every view/screen is connected to real data (not placeholder data)
Check every button/action triggers real behavior (not no-ops)

B5 First Cycle (POST-POC)

Additional mandate:

POC hardcoded everything — find and extract configuration
POC had no security — add ALL input validation from scratch
POC ignored production concerns — add basic logging infrastructure

OPERATING PRINCIPLES

Builders CREATE, Auditors FIND. Builders improve entire systems, not individual lines. Auditors validate independently.
Two judges must agree on tests. If Audit A says tests are 6/7 but Audit B says 3/7, the tests look good but don't prove correctness; if B scores higher, the tests catch bugs but are poorly organized or named. Either disagreement triggers focused improvement.
Acknowledged issues don't penalize. The plateau-killer. If you accept a finding ("brute-force search is fine at MVP scale"), it stops tanking your score every cycle.
Pre-fix search is mandatory. The single biggest source of score regression is fixes that introduce new issues (duplicate logic, N+1 queries, cross-boundary imports). The search protocol prevents this.
Builders have constraints. B3 ONLY writes tests. B1 ONLY touches structure. This prevents builders from stepping on each other's work and ensures domain expertise.
Gate is binary, not scored. The gate either passes or fails. No partial credit. This forces genuine quality, not score gaming.
First cycle after POC is special. POC deliberately skipped quality (no tests, no error handling, no validation). The first forge cycle is a full reconstruction, not incremental improvement.
Net issue velocity matters. If builders discover as many issues as they fix, the system isn't converging. Strategy intervention triggers automatically.
Regression is CRITICAL. A fixed issue that reappears is treated as CRITICAL severity. This prevents the oscillation pattern.
Resume from any point. State file tracks every builder's status, every audit score, every issue. Pick up where you left off.

Code Forge Cycle — Post-POC Build Loop

Turn scaffolded code into production-quality software: $ARGUMENTS

PHILOSOPHY

/poc scaffolds the skeleton
/code-forge-cycle builds it into a real product

The current quality loop is: audit → find problems → fix problems → audit (find more problems)
This skill inverts it:         build → build → build → verify (confirm quality)

Builders CREATE. Auditors FIND.
Builders are domain experts who improve entire systems.
Auditors are independent judges who validate the result.

Why this works when audit-first doesn't:

The audit-first loop (code-cycle) discovers ~12-15 issues per cycle and fixes ~12. Net velocity: ~0. Score oscillates.

THE 5 BUILDERS

Each builder has a mandate (what they're responsible for), a constraint (what they must NOT touch), and a pre-fix protocol (mandatory search before any edit).

B1: FOUNDATION BUILDER — "Make the architecture real"
    Mandate:    Package boundaries, dependency direction, pattern consistency,
                repository layer completeness, dead code removal
    Constraint: Does NOT write tests, does NOT fix business logic
    Runs:       First (everything else builds on structure)

B2: SAFETY BUILDER — "Make it not break"
    Mandate:    Error handling, race conditions, atomic transactions,
                input validation, cancellation handling, crash prevention
    Constraint: Does NOT restructure modules, does NOT write tests
    Runs:       After B1 (fixes safety on restructured code)

B3: TEST BUILDER — "Prove it works"
    Mandate:    Write unit tests for every public API, edge cases,
                concurrent access, error paths, integration smoke tests
    Constraint: Does NOT modify source code, ONLY creates/modifies test files
    Runs:       After B2 (tests should verify safety fixes)

B4: INTEGRATION BUILDER — "Make the pieces fit"
    Mandate:    Cross-package data flow, DI container wiring, serialization
                round-trips, end-to-end path verification, disconnected feature detection
    Constraint: Minimal source changes (wiring fixes only), does NOT refactor
    Runs:       After B2 (verifies on safety-hardened code)

B5: HARDENING BUILDER — "Make it production-worthy"
    Mandate:    Security hardening, performance antipatterns, input sanitization,
                secret protection, export scrubbing, production polish
    Constraint: Does NOT restructure architecture (B1's job), does NOT write tests (B3's job)
    Runs:       Last (polishes everything others built)

Pre-Fix Search Protocol (ALL BUILDERS)

Every builder agent's prompt includes this mandatory protocol:

BEFORE making ANY code change:

1. SEARCH for existing implementations of the same logic
   → Use Grep to find similar functions, patterns, utilities
   → If an implementation exists, USE IT — do not duplicate

2. SEARCH for existing patterns that match your approach
   → Use Glob + Read to find how similar problems are solved elsewhere
   → Follow the established pattern, don't invent a new one

3. CHECK if your change would introduce a known antipattern
   → N+1 queries (loop with individual DB calls)
   → Cross-boundary imports (importing DB framework in non-DB package)
   → Duplicate logic (same function in two places)
   → If it would: find a different approach BEFORE editing

4. CHECK what other builders have already done this cycle
   → Read the git log for recent commits by other builders
   → Don't undo or conflict with their work

Violation of this protocol creates findings in the next audit that
cancel out your improvements. The protocol IS the velocity fix.

WAVE STRUCTURE

Builders run in 3 waves per cycle. Sequential waves ensure each builder sees prior work. Within Wave 2, builders CAN run in parallel (they touch different file types) but sequential is safer.

WAVE 1 (Sequential — B1 alone)
  B1: Foundation Builder
  → Restructures code, creates missing repos, enforces boundaries
  → Must run first: everything else builds on architecture
  → Commits all changes atomically

WAVE 2 (Sequential — B2, B3, B4 in order)
  B2: Safety Builder → fixes error handling, race conditions
  B3: Test Builder → writes tests (NEW files only, no source changes)
  B4: Integration Builder → verifies cross-package flow, small wiring fixes
  → Each commits separately after completing

WAVE 3 (Sequential — B5 alone)
  B5: Hardening Builder
  → Security, performance, production polish
  → Runs last: polishes what all others built
  → Commits all changes atomically

→ DUAL AUDIT GATE (Audit A + Audit B in parallel)
  Both auditors run simultaneously, independently
  Neither sees the other's findings until the gate evaluation

PHASE 0: INITIALIZATION

Step 0a: Parse Arguments

If $ARGUMENTS is provided, parse for:

POST-POC: Fresh from /poc scaffolding — activates aggressive first cycle
CONTINUE: Resume a previous cycle from state file
Project path: Path to the project to improve
Target threshold: Minimum dimension score to pass gate (default: 5/7)

If no arguments: detect project in current directory.

Step 0b: Detect Project State

if has_file(".code-forge-cycle-state.yaml"):
    mode = "RESUME"  # continuing previous cycle
elif has_file("README.md") and readme_contains("POC:"):
    mode = "POST-POC"  # fresh from /poc
elif has_quality_infrastructure():
    mode = "EXISTING"  # established project
else:
    mode = "UNKNOWN"  # ask user

POST-POC Detection Signals:

README.md contains "POC:" or "What This Proves"
No test files exist anywhere
try! or force-unwrap patterns present
print() used for error handling
No .gitignore or minimal one
Comments like "// TODO", "// HACK", "// placeholder"

Step 0c: Detect Tech Stack & Quality Infrastructure

Scan for:

Package manager (SPM, npm, cargo, pip, etc.)
Build command (swift build, npm run build, cargo build, etc.)
Test command (swift test, npm test, cargo test, etc.)
Lint command (swiftlint, eslint, clippy, etc.)
Type check command (swift build, tsc, mypy, etc.)

Record as quality_commands for builder verification steps.

Step 0d: Check Existing State

Check for .code-forge-cycle-state.yaml in project root.

If found:

Display cycle history summary
Show last completed wave and gate result
Offer: Resume cycle {N+1} or Start fresh

Step 0e: Establish Baseline

Run available quality commands to capture the starting point:

BASELINE SNAPSHOT
━━━━━━━━━━━━━━━━
Project:         {name}
Mode:            {POST-POC / RESUME / EXISTING}
Tech stack:      {stack}
Packages:        {N} packages detected
Test files:      {N} ({N} packages with zero tests)
Build:           {PASS / FAIL / NO COMMAND}
Type check:      {PASS / FAIL / NO COMMAND}
Lint:            {PASS / FAIL / NO COMMAND}
Tests:           {passed}/{total} pass, {coverage}% coverage

Step 0f: Configure Cycle

CODE FORGE CYCLE — CONFIGURATION
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Project:         {name}
Mode:            {mode}
Gate threshold:  {N}/7 per dimension (default: 5)
Max cycles:      {N} (default: 5)
Builder count:   5

Builder wave order:
  Wave 1: B1 Foundation
  Wave 2: B2 Safety → B3 Test → B4 Integration
  Wave 3: B5 Hardening
  Gate:   Audit A (Architect) + Audit B (Adversary)

Proceed?

Ask for confirmation. Allow threshold and max cycle adjustments.

PHASE 1: BUILDER WAVE

Cycle Frame

At the start of each cycle:

╔══════════════════════════════════════════════════════════════╗
║  FORGE CYCLE {N} of {max}                                    ║
║  Project: {name}                                             ║
║  Mode: {POST-POC / CONTINUING}                               ║
║  Previous gate: {PASS / FAIL / N/A}                          ║
║  Issue registry: {N} open, {N} acknowledged, {N} fixed       ║
╚══════════════════════════════════════════════════════════════╝

WAVE 1: B1 Foundation Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B1: THE FOUNDATION BUILDER. You make the architecture real.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
PACKAGE STRUCTURE: {package_list_with_descriptions}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE: none (you run first)

ISSUE REGISTRY (findings from previous cycles to address):
{issues with status IN_PROGRESS or NEW that fall under your mandate}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make the architecture real
═══════════════════════════════════════════════════════

You are responsible for:
1. PACKAGE BOUNDARY ENFORCEMENT
   - Scan every file for imports that violate the dependency graph
   - The dependency direction is: {dependency_graph}
   - Remove or replace any import that crosses boundaries
   - Example: if package Dialog imports the DB framework directly,
     route through a Repository in the DB package instead

2. PATTERN CONSISTENCY
   - If the project uses a Repository pattern in SOME packages, ensure
     ALL packages use it for DB access
   - If the project has a protocol for something, ensure all conformances
     exist
   - Look for: repositories, factories, protocols, coordinators — whatever
     patterns exist, make them consistent

3. DEAD CODE REMOVAL
   - Find unused types, functions, imports, files
   - Remove them (check with Grep that nothing references them first)
   - Do NOT remove things marked with @available or #if

4. STRUCTURAL GAPS
   - Missing protocol conformances
   - Missing repository types
   - Orphaned files not wired into the build
   - Public API surface that exposes internal types

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY code change:
1. Grep for existing implementations of the same logic
2. Glob + Read to find how similar problems are solved elsewhere
3. Check if your change introduces: N+1 queries, cross-boundary imports,
   duplicate logic. If so, find a different approach.
4. Read git log for any recent commits that affect the same files.

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT write tests (B3's job)
- Do NOT fix business logic bugs (B2/B4's job)
- Do NOT add error handling (B2's job)
- ONLY touch architectural structure

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

1. Use Edit/Write tools to make ALL changes directly to the codebase
2. After all changes, run the build command: {build_command}
   - If build fails: fix the build errors (your changes caused them)
   - Iterate until build passes
3. List every file you modified with a 1-line summary of what changed
4. List every issue from the registry you addressed (by ID)
5. List any NEW issues you discovered while working (use ID prefix BF-B1-)

End with:
CHANGES SUMMARY:
  Files modified: {N}
  Files created: {N}
  Files deleted: {N}
  Registry issues addressed: {list of IDs}
  New issues found: {list with severity and summary}
  Build status: {PASS/FAIL}

SHARPEST INSIGHT about the architecture's current state.

After B1 completes:

Verify build passes: run {build_command} via Bash
If build fails: re-invoke B1 with the error output to fix

Commit B1's changes:

git add -A && git commit -m "forge(B1): enforce architecture boundaries and pattern consistency

Cycle {N}, Wave 1: Foundation Builder
Co-Authored-By: Claude Code <[email protected]>"

Record B1's changes and discovered issues in state

WAVE 2: B2 Safety Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B2: THE SAFETY BUILDER. You make the code not break.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary of B1's changes}

ISSUE REGISTRY (your mandate):
{issues with status IN_PROGRESS or NEW related to safety/error handling}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make it not break
═══════════════════════════════════════════════════════

You are responsible for:
1. ERROR HANDLING
   - Find every try!/force-unwrap/implicitly-unwrapped-optional
   - Replace with proper do/catch or guard/let with meaningful error handling
   - Find every print() used for error reporting
   - Replace with proper error propagation or structured logging
   - Ensure errors are typed and specific, not generic strings

2. RACE CONDITIONS & ATOMICITY
   - Find every read-then-write pattern across separate transactions
   - Merge into single atomic transactions
   - Find every multi-step mutation that should be atomic
   - Wrap in single transaction or make idempotent
   - Check for shared mutable state accessed from multiple contexts

3. INPUT VALIDATION
   - Find every public function that accepts external input
   - Add validation at the boundary (not deep in the call stack)
   - Sanitize strings before SQL/FTS/regex operations
   - Validate IDs, ranges, and enums before use

4. CANCELLATION & CLEANUP
   - Find every long-running async operation
   - Ensure it checks for cancellation
   - Ensure resources are cleaned up on cancellation (files, connections, streams)

5. CRASH PREVENTION
   - Find every array subscript without bounds checking
   - Find every forced cast (as!)
   - Find every divide operation without zero check
   - Replace with safe alternatives

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY code change:
1. Grep for existing error types, validation utilities, sanitization functions
2. If an error type exists, USE IT — don't create a new one
3. If a sanitization function exists, CALL IT — don't inline the logic
4. Check B1's recent commits for structural changes that affect your targets

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT restructure modules or move files (B1's job, already done)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior — only add safety around it
- Preserve the existing API contracts (same function signatures)

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

Same as B1. After all changes:
1. Run build command — fix any build errors your changes caused
2. Run test command if tests exist — ensure no regressions
3. List all changes, registry issues addressed, new issues found (prefix BF-B2-)

SHARPEST INSIGHT about the codebase's safety posture.

After B2 completes: Verify build, commit with forge(B2): prefix, record state.

WAVE 2 (continued): B3 Test Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B3: THE TEST BUILDER. You prove the code works.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
TEST FRAMEWORK: {test_framework}
TEST COMMAND: {test_command}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}

ISSUE REGISTRY (your mandate):
{issues related to test coverage, test quality}

CURRENT TEST STATE:
  Test files: {N} across {N} packages
  Packages with zero tests: {list}
  Coverage: {N}% (if measurable)

═══════════════════════════════════════════════════════
YOUR MANDATE: Prove it works
═══════════════════════════════════════════════════════

You are responsible for:
1. COVERAGE EXPANSION
   - For every package with zero tests: create a test file with at least
     3 meaningful tests covering the package's primary public API
   - For packages with existing tests: add tests for untested public methods
   - Priority: test the functions that B2 added safety handling to
     (those are the most risk-prone)

2. TEST QUALITY
   - Every test must assert BEHAVIOR, not implementation
   - Bad: "test that function calls dependency X"
   - Good: "test that function returns correct result given input Y"
   - Every test must have a descriptive name that explains what it proves
   - Every test must be independent (no shared mutable state between tests)

3. EDGE CASE COVERAGE
   - For every tested function, add at least one edge case:
     empty input, nil/null, duplicate, boundary value, concurrent access
   - For every error path B2 added: write a test that triggers it
   - For every validation B2 added: write a test with invalid input

4. INTEGRATION SMOKE TESTS
   - For each major data flow (e.g., ingest → graph → dialog → outline),
     write one smoke test that exercises the full path
   - These can use mocks/stubs for external services but should exercise
     real internal code paths
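A minimal sketch of items 2 and 3 using Python's built-in `unittest`. `normalize_tags` is a hypothetical function under test; what matters is the shape: each name states the behavior proven, assertions check outputs rather than internal calls, each test builds its own input, and the error path gets its own test.

```python
import unittest


def normalize_tags(raw):
    """Hypothetical function under test: trim, lowercase, dedupe, keep first-seen order."""
    if raw is None:
        raise ValueError("tags must not be None")
    seen, result = set(), []
    for tag in raw:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


class NormalizeTagsTests(unittest.TestCase):
    # Each test builds its own input: no shared mutable state between tests.

    def test_returns_lowercased_trimmed_tags_in_original_order(self):
        self.assertEqual(normalize_tags([" Swift", "iOS "]), ["swift", "ios"])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(normalize_tags([]), [])

    def test_duplicates_differing_only_in_case_are_collapsed(self):
        self.assertEqual(normalize_tags(["Go", "go", "GO"]), ["go"])

    def test_blank_tags_are_dropped(self):
        self.assertEqual(normalize_tags(["  ", "rust"]), ["rust"])

    def test_none_input_raises_value_error(self):
        # Error path: exercises the boundary validation, not just the happy path.
        with self.assertRaises(ValueError):
            normalize_tags(None)
```

Run it with `python -m unittest <module>` or the project's own test command.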

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE writing ANY test:
1. Read the source file being tested — understand the ACTUAL behavior
2. Read existing tests (if any) — don't duplicate coverage
3. Read B2's safety changes — prioritize testing those code paths
4. Check for test utilities, mocks, fixtures already in the project

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT modify source code. ONLY create/modify test files.
- If a test reveals a bug, DOCUMENT it as a finding (BF-B3-), don't fix the source
- Tests must compile and run (even if some fail due to bugs)
- Use the project's existing test framework and conventions

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run test command: {test_command}
2. Report: {passed}/{total} tests pass
3. If tests FAIL: decide whether the failure reveals a genuine bug
   (document it as a finding) or a mistake in your test (fix the test)
4. List all test files created/modified
5. List coverage improvement if measurable
6. List any bugs discovered via testing (prefix BF-B3-)

SHARPEST INSIGHT about the codebase's test coverage landscape.

After B3 completes: Run tests, commit with forge(B3): prefix, record state.

WAVE 2 (continued): B4 Integration Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B4: THE INTEGRATION BUILDER. You make the pieces fit.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}
  B3: {summary}

ISSUE REGISTRY (your mandate):
{issues related to integration, wiring, data flow}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make the pieces fit
═══════════════════════════════════════════════════════

You are responsible for:
1. DEPENDENCY INJECTION VERIFICATION
   - Read the composition root / DI container
   - Verify every engine/service is properly wired
   - Check that optional dependencies are handled (nil checks, fallbacks)
   - Ensure lifecycle is correct (what's created once vs per-use)

2. DATA FLOW VERIFICATION
   - Trace the primary data flows end-to-end:
     {list of primary flows, e.g., "ingest → graph → dialog → outline"}
   - At each boundary: verify the data model transforms correctly
   - Check for: missing fields, type mismatches, lost context
   - Verify serialization round-trips (encode → decode → same data)

3. DISCONNECTED FEATURE DETECTION
   - Find UI elements/views that aren't connected to data
   - Find functions that are never called
   - Find protocols with no conforming types
   - Find registered routes/commands with no handlers

4. CROSS-PACKAGE CONTRACT VERIFICATION
   - For each protocol: verify all implementations satisfy the contract
   - For each public API: verify callers pass valid arguments
   - For each callback/delegate: verify it's actually set and called
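The round-trip check in item 2 can be one assertion per model. A minimal sketch, assuming models are Python dataclasses serialized as JSON; `OutlineNode`, `encode`, `decode`, and `assert_round_trip` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class OutlineNode:  # hypothetical model that crosses a package boundary
    id: str
    title: str
    parent_id: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def encode(node: OutlineNode) -> str:
    return json.dumps(asdict(node), sort_keys=True)


def decode(payload: str) -> OutlineNode:
    return OutlineNode(**json.loads(payload))


def assert_round_trip(node: OutlineNode) -> None:
    """encode -> decode must return equal data; a dropped or renamed field fails here."""
    restored = decode(encode(node))
    assert restored == node, f"round-trip mismatch: {node!r} != {restored!r}"
```

If `encode` ever stopped emitting `tags`, `decode` would fall back to the default empty list and the equality check would fail, which is the "lost context" class of bug this step targets.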

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY wiring fix:
1. Understand WHY it's disconnected — it may be intentional (future feature)
2. Check if B1 restructured the module — the wiring may need updating
3. Verify the fix doesn't create a circular dependency
4. Check that your wiring matches the dependency direction graph
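Step 3 is a cycle check on the module dependency graph. A minimal sketch, assuming the graph has already been extracted (for example from import statements) into an adjacency map; `find_cycle` and the module names in the usage are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a path (first node repeated at the end), or None.

    Depth-first search with three colors: WHITE = unvisited,
    GRAY = on the current DFS path, BLACK = fully explored.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is already on the current path
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Run it on the graph with the proposed wiring edge added; a non-`None` result names the loop to break before committing.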

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- MINIMAL source changes — only fix wiring, don't refactor
- Do NOT restructure modules (B1's job)
- Do NOT add error handling (B2's job)
- Do NOT write tests (B3's job)
- If you find a bug in business logic, DOCUMENT it (BF-B4-), don't fix

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run build + tests — ensure no regressions
2. Produce a DATA FLOW MAP (text diagram) showing verified paths
3. List disconnected features found
4. List wiring fixes applied
5. List bugs/issues found (prefix BF-B4-)

SHARPEST INSIGHT about the codebase's integration health.

After B4 completes: Verify build + tests, commit with forge(B4): prefix, record state.

WAVE 3: B5 Hardening Builder

Spawn via Agent tool, subagent_type: "general-purpose":

You are B5: THE HARDENING BUILDER. You make it production-worthy.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}

MODE: {POST-POC / CONTINUING}
CYCLE: {N}
PREVIOUS BUILDER COMMITS THIS CYCLE:
  B1: {summary}
  B2: {summary}
  B3: {summary}
  B4: {summary}

ISSUE REGISTRY (your mandate):
{issues related to security, performance, production readiness}

═══════════════════════════════════════════════════════
YOUR MANDATE: Make it production-worthy
═══════════════════════════════════════════════════════

You are responsible for:
1. SECURITY HARDENING
   - Sanitize all user-facing inputs before DB/FTS/regex operations
   - Protect secrets: ensure API keys, tokens, credentials are never
     logged, exported, or included in error messages
   - Scrub exports: any data export (JSON, PDF, etc.) must not include
     internal paths, raw document text, or sensitive metadata
   - Keychain/credential storage: use platform-appropriate secure storage
   - Validate that debug/trace logging is disabled in release builds

2. PERFORMANCE ANTIPATTERNS
   - Find N+1 query patterns (loop with individual DB calls)
   - Replace with batch queries or pre-fetching
   - Find unbounded memory loads (loading ALL records into memory)
   - Add pagination or streaming
   - Find synchronous work on main/UI thread
   - Move to background with proper dispatch

3. PRODUCTION POLISH
   - Remove all TODO/HACK/FIXME comments that reference POC/scaffold
   - Replace placeholder error messages with user-facing text
   - Ensure all user-visible strings are appropriate (no debug text)
   - Verify all file operations handle missing directories gracefully
   - Check that temporary files are cleaned up

4. RESILIENCE
   - Add retry logic for network operations (with backoff)
   - Add graceful degradation when optional services are unavailable
   - Ensure the app can start even if the database migration fails
     (fall back to in-memory or read-only mode)
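A minimal sketch of the N+1 fix in item 2, using Python's built-in `sqlite3`; the `tags` table and both function names are hypothetical. Both functions return the same mapping, but the batched one issues a single query.

```python
import sqlite3
from typing import Dict, List, Sequence


def tags_n_plus_one(conn: sqlite3.Connection, doc_ids: Sequence[int]) -> Dict[int, List[str]]:
    # Antipattern: one query per document (N documents means N round trips).
    return {
        doc_id: [row[0] for row in conn.execute(
            "SELECT name FROM tags WHERE doc_id = ? ORDER BY name", (doc_id,))]
        for doc_id in doc_ids
    }


def tags_batched(conn: sqlite3.Connection, doc_ids: Sequence[int]) -> Dict[int, List[str]]:
    # Fix: one IN (...) query for the whole batch, grouped in memory.
    result: Dict[int, List[str]] = {doc_id: [] for doc_id in doc_ids}
    if not doc_ids:
        return result
    placeholders = ",".join("?" * len(doc_ids))  # "?,?,?"; values stay bound, never interpolated
    rows = conn.execute(
        f"SELECT doc_id, name FROM tags WHERE doc_id IN ({placeholders}) ORDER BY doc_id, name",
        list(doc_ids),
    )
    for doc_id, name in rows:
        result[doc_id].append(name)
    return result
```

SQLite limits the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`), so very large ID lists should be split into chunks.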

═══════════════════════════════════════════════════════
PRE-FIX SEARCH PROTOCOL (MANDATORY)
═══════════════════════════════════════════════════════

BEFORE making ANY hardening change:
1. Check if a sanitization/validation function already exists (B2 may have added one)
2. Check if B1 restructured the module — use the new structure
3. For performance fixes: verify the antipattern actually causes a problem
   (don't optimize code that runs once at startup)
4. Don't duplicate security logic that exists elsewhere in the codebase

═══════════════════════════════════════════════════════
CONSTRAINTS
═══════════════════════════════════════════════════════

- Do NOT restructure architecture (B1's job)
- Do NOT write tests (B3's job)
- Do NOT change business logic behavior
- Do NOT add features — only harden existing ones

═══════════════════════════════════════════════════════
OUTPUT REQUIREMENTS
═══════════════════════════════════════════════════════

After all changes:
1. Run build + tests — ensure no regressions
2. List all security hardening applied
3. List all performance fixes
4. List all production polish changes
5. List any remaining concerns (prefix BF-B5-)

SHARPEST INSIGHT about the codebase's production readiness.

After B5 completes: Verify build + tests, commit with forge(B5): prefix, record state.

Builder Wave Checkpoint

After all 5 builders complete:

BUILDER WAVE COMPLETE — CYCLE {N}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Builder   | Files Modified | Files Created | Build | Tests    | Issues Found
──────────────────────────────────────────────────────────────────────────────
B1 Found. | {N}            | {N}           | PASS  | N/A      | {N}
B2 Safety | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B3 Test   | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B4 Integ. | {N}            | {N}           | PASS  | {p}/{t}  | {N}
B5 Harden | {N}            | {N}           | PASS  | {p}/{t}  | {N}

Total commits: {N} (5 unless Rail 2 skipped a builder)
Total files changed: {N}
Total new issues discovered by builders: {N}

Registry issues addressed this cycle: {N}
Remaining open issues: {N}

Proceeding to Dual Audit Gate...

PHASE 2: DUAL AUDIT GATE

Two independent auditors run simultaneously. Neither sees the other's findings, scores, or insights. They audit the SAME codebase state (after all builders have committed).

Audit A: The Architect

Spawn via Agent tool, subagent_type: "general-purpose":

You are AUDIT A: THE ARCHITECT — an independent auditor assessing structural quality.

You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}

KNOWN ISSUE REGISTRY (for deduplication — do NOT re-report these):
{all issues with status ACKNOWLEDGED or FIXED}

IN-PROGRESS ISSUES (may re-report if still present):
{issues with status IN_PROGRESS}

═══════════════════════════════════════════════════════
YOUR LENS: Structure, patterns, boundaries, maintainability
═══════════════════════════════════════════════════════

Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files — do not rely on descriptions.

SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):

1. ARCHITECTURE (0-7)
   - Module boundaries respected? (no cross-layer imports)
   - Dependency direction correct? (lower layers don't know upper layers)
   - Pattern consistency? (same pattern used everywhere, not half-applied)
   - Package API surface clean? (internal types not leaked)
   - Dead code eliminated?

   0 = No coherent architecture
   3 = Architecture exists but inconsistently applied
   5 = Clean architecture with minor boundary violations
   7 = Pristine boundaries, consistent patterns, clean APIs

2. MAINTAINABILITY (0-7)
   - Naming clarity? (can you understand a function from its name?)
   - Cognitive load? (can you understand a module without reading 5 others?)
   - DRY? (same logic not duplicated across files?)
   - Documentation accuracy? (comments match actual behavior?)
   - Complexity hotspots? (functions over 50 lines, deeply nested logic?)

   0 = Incomprehensible without full-codebase knowledge
   3 = Understandable with effort, some confusing areas
   5 = Clear and well-organized with minor issues
   7 = Any developer could understand and safely modify any module

3. TEST QUALITY (0-7)
   - Coverage breadth? (what % of packages have tests?)
   - Test meaningfulness? (do tests assert behavior, not implementation?)
   - Edge case coverage? (empty, nil, duplicate, boundary, concurrent?)
   - Test independence? (no shared mutable state between tests?)
   - Test naming? (can you understand what a test proves from its name?)

   0 = No tests
   2 = Tests exist but are trivial or meaningless
   4 = Good tests for some modules, gaps in others
   5 = Solid coverage with meaningful assertions
   7 = Comprehensive, behavior-driven, edge-case-aware tests everywhere

ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FA (Forge Audit A)

For each issue:
  ID: FA-{NNN}
  Severity: CRITICAL / MAJOR / MINOR / NITPICK
  File(s): file:line references
  Problem: 1-2 sentences
  Evidence: quote the actual code
  Impact: what breaks or degrades because of this
  Fix Approach: how to fix (for the next builder cycle)

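DEDUPLICATION RULES:
- Do NOT report issues already in the KNOWN ISSUE REGISTRY
- Exception: if a FIXED issue from the registry is broken again, report it as "REGRESSION: {original ID}" (treated as CRITICAL)
- If an IN-PROGRESS issue is STILL present, report it as "PERSISTS: {original ID}"
- If an IN-PROGRESS issue is FIXED, note "RESOLVED: {original ID}"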

End with your SHARPEST SINGLE INSIGHT about the codebase's structural quality.

Audit B: The Adversary

Spawn via Agent tool, subagent_type: "general-purpose":

You are AUDIT B: THE ADVERSARY — an independent auditor hunting for correctness
and security flaws.

You are ONE of TWO independent auditors. You do NOT know what the other auditor found.
Your scores and findings are compared against theirs at the gate.

PROJECT ROOT: {project_root}
TECH STACK: {tech_stack}
CYCLE: {N}

KNOWN ISSUE REGISTRY (for deduplication):
{same registry as Audit A}

IN-PROGRESS ISSUES (may re-report if still present):
{same IN_PROGRESS list as Audit A}

═══════════════════════════════════════════════════════
YOUR LENS: Correctness, security, edge cases, exploitability
YOUR HOSTILITY LEVEL: HIGH
═══════════════════════════════════════════════════════

Use Read, Grep, and Glob tools to explore the codebase thoroughly.
Read actual source files. Trace data flows. Find bugs.

SCORE THESE 3 DIMENSIONS (0-7 each, with specific evidence):

1. CORRECTNESS (0-7)
   - Business logic accurate? (does the code do what it claims?)
   - Edge cases handled? (empty input, nil, duplicates, boundaries?)
   - State management sound? (no inconsistent state possible?)
   - Race conditions eliminated? (no TOCTOU, no non-atomic mutations?)
   - Error paths correct? (errors don't leave state corrupted?)

   0 = Fundamental logic errors, data corruption likely
   3 = Works for happy path, breaks on edge cases
   5 = Correct with minor edge case gaps
   7 = Provably correct for all documented inputs

2. SECURITY (0-7)
   - Input sanitization? (all external inputs validated before use?)
   - Secret protection? (API keys, tokens never logged or exported?)
   - Auth/authz correct? (all sensitive operations protected?)
   - Injection prevention? (SQL, FTS, command, path traversal?)
   - Data exposure? (exports don't leak internal data?)

   0 = Exploitable vulnerabilities present
   3 = Basic security but gaps in coverage
   5 = Solid security with minor hardening gaps
   7 = Defense in depth, no exploitable paths found

3. TEST QUALITY (0-7)
   - Same criteria as Audit A (scored independently!)
   - Focus especially on: do tests cover the SECURITY-CRITICAL paths?
   - Do tests verify ERROR PATHS, not just happy paths?
   - Would a broken security fix be caught by existing tests?

   Same scale as Audit A.

ISSUE BUDGET: Up to 8 issues maximum.
FINDING ID PREFIX: FB (Forge Audit B)
HIGH HOSTILITY: Only report genuinely impactful issues. No nitpicks.

For each issue:
  ID: FB-{NNN}
  Severity: CRITICAL / MAJOR / MINOR
  File(s): file:line references
  Problem: 1-2 sentences
  Evidence: quote the actual code
  Exploit scenario: how an attacker or bad input could trigger this
  Fix Approach: how to fix

DEDUPLICATION: Same rules as Audit A.

End with your SHARPEST SINGLE INSIGHT about the codebase's correctness and security.

Launch Audit A and Audit B in parallel via two simultaneous Agent tool calls.

Gate Evaluation

After both audits complete, evaluate the gate:

Step 1: Score Comparison

DUAL AUDIT RESULTS
━━━━━━━━━━━━━━━━━━

Dimension        | Audit A | Audit B | Agreement
─────────────────────────────────────────────────
Architecture     | {N}/7   | —       | (A only)
Maintainability  | {N}/7   | —       | (A only)
Correctness      | —       | {N}/7   | (B only)
Security         | —       | {N}/7   | (B only)
Test Quality     | {N}/7   | {N}/7   | delta: {|A-B|}

Step 2: Gate Logic

GATE EVALUATION
━━━━━━━━━━━━━━━

Check 1: Audit A dimensions >= {threshold}/7
  Architecture:    {N}/7  {PASS/FAIL}
  Maintainability: {N}/7  {PASS/FAIL}
  Test Quality:    {N}/7  {PASS/FAIL}

Check 2: Audit B dimensions >= {threshold}/7
  Correctness:     {N}/7  {PASS/FAIL}
  Security:        {N}/7  {PASS/FAIL}
  Test Quality:    {N}/7  {PASS/FAIL}

Check 3: Test Quality agreement
  |{A_test} - {B_test}| = {delta}  {PASS if <= 1 / FAIL if > 1}

Check 4: Zero CRITICAL findings
  Audit A criticals: {N}  {PASS if 0 / FAIL}
  Audit B criticals: {N}  {PASS if 0 / FAIL}

Check 5: Combined MAJOR findings <= 3
  Total MAJORs: {N}  {PASS if <= 3 / FAIL}

━━━━━━━━━━━━━━━
GATE RESULT: {PASS / FAIL}
━━━━━━━━━━━━━━━
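The five checks reduce to a single function. A minimal Python sketch; `AuditResult` and `evaluate_gate` are hypothetical names, scores use the 0-7 scale above, and the returned check names mirror the `gate` keys in the state file.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AuditResult:
    scores: Dict[str, float]  # dimension name -> score on the 0-7 scale
    criticals: int
    majors: int


def evaluate_gate(a: AuditResult, b: AuditResult, threshold: float) -> Tuple[bool, List[str]]:
    """Apply the five gate checks; return (passed, names of failing checks)."""
    checks = {
        "check_1_audit_a_dims": all(s >= threshold for s in a.scores.values()),
        "check_2_audit_b_dims": all(s >= threshold for s in b.scores.values()),
        "check_3_test_agreement": abs(a.scores["test_quality"] - b.scores["test_quality"]) <= 1,
        "check_4_zero_criticals": a.criticals == 0 and b.criticals == 0,
        "check_5_major_limit": a.majors + b.majors <= 3,
    }
    failing = [name for name, ok in checks.items() if not ok]
    return not failing, failing  # binary: any failing check fails the gate
```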

Step 3: Test Quality Disagreement Protocol

If Check 3 fails (|A_test - B_test| > 1):

TEST QUALITY DISAGREEMENT DETECTED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Audit A scored Test Quality: {N}/7
Audit B scored Test Quality: {N}/7
Delta: {N} (threshold: 1)

This means the auditors fundamentally disagree on whether
tests prove correctness. Common causes:

IF Audit A > Audit B:
  Tests LOOK good structurally but don't actually catch bugs.
  → B3 needs to focus on behavior verification and edge cases.

IF Audit B > Audit A:
  Tests catch bugs but are poorly organized/named.
  → B3 needs to improve test structure and naming.

Action: B3 Test Builder gets priority mandate next cycle
with specific guidance from the lower-scoring auditor.

PHASE 3: ISSUE REGISTRY & CYCLE DECISION

Issue Registry Update

After each gate evaluation, update the issue registry:

issue_registry:
  # Issues from builders (BF-B{N}-NNN)
  - id: "BF-B3-001"
    source: "B3 Test Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Function X has incorrect edge case behavior"
    status: NEW  # → becomes IN_PROGRESS when a builder addresses it
    score_impact: -3  # unfixed MAJOR: 1 × 3

  # Issues from auditors (FA-NNN, FB-NNN)
  - id: "FA-001"
    source: "Audit A"
    cycle_discovered: 1
    severity: MINOR
    summary: "DRY violation in ExportEngine"
    status: NEW
    score_impact: 0  # MINORs carry no penalty in the Forge Score formula

  # Carried forward from previous cycles
  - id: "FA-003"
    source: "Audit A, Cycle 1"
    cycle_discovered: 1
    severity: MAJOR
    summary: "Brute-force vector search"
    status: ACKNOWLEDGED  # user said "acceptable for MVP scale"
    acknowledged_at: "2026-03-30"
    reason: "Acceptable for MVP scale (<10K embeddings)"
    score_impact: 0  # ← NO PENALTY for acknowledged issues

  # Fixed issues
  - id: "BF-B2-001"
    source: "B2 Safety Builder"
    cycle_discovered: 1
    severity: MAJOR
    summary: "TOCTOU race in AutoLinker"
    status: FIXED
    fixed_by: "B2"
    fixed_cycle: 1
    score_impact: 0

Issue Status Transitions

NEW → IN_PROGRESS (builder is working on it)
NEW → ACKNOWLEDGED (user accepts, no score penalty)
NEW → FIXED (builder resolved it)
IN_PROGRESS → FIXED (builder resolved it)
IN_PROGRESS → ACKNOWLEDGED (user decides to accept)
FIXED → REGRESSION (auditor found it re-broken) → treated as CRITICAL
ACKNOWLEDGED → IN_PROGRESS (user changes mind, wants it fixed)
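The transitions above form a small state machine. A minimal Python sketch that rejects any move not listed; `ALLOWED` and `transition` are hypothetical names. The list gives no transition out of REGRESSION, so the sketch leaves it terminal rather than guessing one.

```python
ALLOWED = {
    "NEW": {"IN_PROGRESS", "ACKNOWLEDGED", "FIXED"},
    "IN_PROGRESS": {"FIXED", "ACKNOWLEDGED"},
    "FIXED": {"REGRESSION"},
    "ACKNOWLEDGED": {"IN_PROGRESS"},
    "REGRESSION": set(),  # no outgoing transition is listed above
}


def transition(issue: dict, new_status: str) -> dict:
    """Return a copy of `issue` with the new status, or raise if the move is not allowed."""
    current = issue["status"]
    if new_status not in ALLOWED.get(current, set()):
        raise ValueError(f"{issue['id']}: {current} -> {new_status} is not an allowed transition")
    updated = dict(issue, status=new_status)
    if new_status == "REGRESSION":
        updated["severity"] = "CRITICAL"  # a re-broken fix is treated as CRITICAL
    return updated
```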

Forge Score Calculation

FORGE SCORE
━━━━━━━━━━━

Dimension Score (0-100):
  Architecture:    {A_arch}/7   → {pct}%
  Maintainability: {A_maint}/7  → {pct}%
  Correctness:     {B_corr}/7   → {pct}%
  Security:        {B_sec}/7    → {pct}%
  Test Quality:    avg(A,B)/7   → {pct}%

  Dimension Average: {avg}/7 → {pct}%

Issue Penalty:
  Unfixed CRITICALs: {N} × 10 = -{10N}
  Unfixed MAJORs:    {N} × 3  = -{3N}
  (ACKNOWLEDGED issues: 0 penalty)
  (FIXED issues: 0 penalty)
  Total penalty: -{N}

  Forge Score: max(0, {dim_pct} - {penalty}) = {N}%

SCORE LABELS:
  90-100: Ship-Ready
  75-89:  Strong
  60-74:  Developing
  45-59:  Building
  < 45:   Foundation
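A minimal Python sketch of the calculation, with hypothetical names `forge_score` and `score_label`. It penalizes only unfixed CRITICAL and MAJOR issues, as the formula states, and counts a REGRESSION as CRITICAL per the status transitions.

```python
from typing import Dict, Iterable

PENALTY = {"CRITICAL": 10, "MAJOR": 3}  # points per unfixed issue; other severities cost 0
OPEN_STATUSES = {"NEW", "IN_PROGRESS", "REGRESSION"}  # ACKNOWLEDGED and FIXED cost nothing


def forge_score(a: Dict[str, float], b: Dict[str, float], issues: Iterable[dict]) -> float:
    dims = [
        a["architecture"],
        a["maintainability"],
        b["correctness"],
        b["security"],
        (a["test_quality"] + b["test_quality"]) / 2,  # Test Quality: average of both auditors
    ]
    dim_pct = sum(dims) / (7 * len(dims)) * 100  # dimension average on 0-7, as a percentage
    penalty = 0
    for issue in issues:
        if issue["status"] not in OPEN_STATUSES:
            continue
        # A regressed fix is treated as CRITICAL regardless of its original severity.
        severity = "CRITICAL" if issue["status"] == "REGRESSION" else issue["severity"]
        penalty += PENALTY.get(severity, 0)
    return max(0.0, dim_pct - penalty)


def score_label(score: float) -> str:
    for floor, label in ((90, "Ship-Ready"), (75, "Strong"), (60, "Developing"), (45, "Building")):
        if score >= floor:
            return label
    return "Foundation"
```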

Cycle Decision

if gate == PASS:
    classification = "CONVERGED"
    action = "Exit loop → ship"

elif cycle == 1:
    classification = "BASELINE"
    action = "Continue (expected: first cycle rarely passes)"

elif forge_score > previous_forge_score + 5:
    classification = "PROGRESSING"
    action = "Continue (gaining momentum)"

elif forge_score > previous_forge_score:
    classification = "SLOW_PROGRESS"
    action = "Continue with strategy adjustment"

elif forge_score < previous_forge_score - 5:
    classification = "REGRESSION"
    action = "Investigate and potentially revert"

else:  # unchanged, or at most 5 points below the previous score
    classification = "PLATEAU"
    action = "Strategy intervention required"

CONVERGED (Gate Passed)

╔══════════════════════════════════════════════════════════╗
║  GATE PASSED — PRODUCTION QUALITY ACHIEVED               ║
║                                                          ║
║  Forge Score:   {N}%                                     ║
║  Cycles:        {N}                                      ║
║  Gate checks:   5/5 PASS                                 ║
║  Journey:       {start}% → {final}%                      ║
║                                                          ║
║  All dimensions >= {threshold}/7                         ║
║  Zero CRITICAL findings                                  ║
║  Test Quality agreement: {A} vs {B} (delta: {d})         ║
╚══════════════════════════════════════════════════════════╝

→ Route to /code-audit THOROUGH for final independent verification, then /ship

PLATEAU (Score Not Improving)

Analyze which dimensions are the bottleneck:

PLATEAU ANALYSIS
━━━━━━━━━━━━━━━━

Bottleneck dimensions (below threshold):
  {dimension}: {score}/7 — held back by: {analysis}

Builder effectiveness:
  B1: {N} issues addressed, {M} new found  → net: {N-M}
  B2: {N} issues addressed, {M} new found  → net: {N-M}
  B3: {N} tests written, coverage: {before}% → {after}%
  B4: {N} wiring fixes
  B5: {N} hardening changes

Net issue velocity: {total_fixed - total_discovered}

If net velocity <= 0 for 2 cycles:

Display: "Builders are discovering issues as fast as they fix them."
Offer strategy interventions:
1. Focused sprint: Run ONLY the builder for the bottleneck dimension (e.g., only B3 for test quality)
2. Acknowledge and move on: Mark remaining issues as ACKNOWLEDGED (removes score penalty)
3. Escalate: Run /code-audit THOROUGH for deeper analysis before continuing
4. Accept current quality: Gate threshold may be too high for this codebase's maturity
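The trigger for these interventions is a running count. A minimal sketch, assuming each cycle's fixed and newly discovered issue counts are available from the state file; `needs_intervention` is a hypothetical name.

```python
from typing import List, Tuple


def needs_intervention(history: List[Tuple[int, int]], window: int = 2) -> bool:
    """history: one (issues_fixed, issues_discovered) pair per cycle, oldest first.

    True when net velocity (fixed - discovered) was <= 0 in each of the last `window` cycles.
    """
    if len(history) < window:
        return False
    return all(fixed - discovered <= 0 for fixed, discovered in history[-window:])
```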

SAFETY RAILS

Rail 1: Max Cycles Guard

if current_cycle > max_cycles:
    "Maximum cycles ({max}) reached. Forge Score: {N}%."
    → Offer: extend / stop / lower threshold

Rail 2: Build Regression Detection

After every builder, verify:

- Build still passes
- Existing tests still pass
- No new type errors

If any regress:

1. Identify which builder's commit caused it
2. Revert that commit
3. Re-invoke the builder with the error context
4. If the 2nd attempt also fails: skip that builder this cycle

Rail 3: Infinite Loop Detection

if last 2 cycles had same gate failures on same checks:
    "Same gate checks failing for 2 cycles. Forcing strategy change."
    → Builder mandates get adjusted based on failing checks
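A minimal sketch of the Rail 3 comparison, assuming each cycle's results are stored like the state file's `gate` block (check name mapped to `PASS`/`FAIL`, plus `result`); `failing_checks` and `same_failures_twice` are hypothetical names.

```python
from typing import Dict, List


def failing_checks(gate: Dict[str, str]) -> frozenset:
    """Names of the checks that failed, ignoring the overall `result` key."""
    return frozenset(k for k, v in gate.items() if k != "result" and v == "FAIL")


def same_failures_twice(cycles: List[Dict[str, str]]) -> bool:
    """True if the last two cycles failed the gate on exactly the same checks."""
    if len(cycles) < 2:
        return False
    previous, latest = failing_checks(cycles[-2]), failing_checks(cycles[-1])
    return bool(latest) and latest == previous
```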

Rail 4: Builder Conflict Detection

After Wave 2 (B2 + B3 + B4), check for conflicts:

- Files modified by multiple builders → review for logical conflicts
- If B4 reports a bug in code B2 just changed → flag for review

PHASE 4: FINAL REPORT

Report Structure

# Code Forge Cycle Report — {Project Name}
Generated: {YYYY-MM-DD HH:MM}

## Outcome: {CONVERGED / STOPPED / PLATEAU / MAX_CYCLES}

## Forge Score Journey

╔══════════════════════════════════════════════════╗
║  {start}% → {final}%                             ║
║  Cycles: {N} | Builders invoked: {N}              ║
║  Gate: {PASS/FAIL}                                ║
╚══════════════════════════════════════════════════╝

## Cycle History

| Cycle | B1 | B2 | B3 | B4 | B5 | Audit A | Audit B | Gate | Score |
|-------|----|----|----|----|----|---------|---------|------|-------|
| 1     | {files} | {files} | {tests} | {fixes} | {fixes} | {dims} | {dims} | {P/F} | {N}% |
| ...   |

## Dimension Journey

| Dimension | Cycle 1 | Cycle 2 | ... | Final | Threshold |
|-----------|---------|---------|-----|-------|-----------|
| Architecture | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Maintainability | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Correctness | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Security | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (A) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |
| Test Quality (B) | {N}/7 | {N}/7 | | {N}/7 | {T}/7 |

## Issue Registry

### Open Issues
| ID | Severity | Source | Summary | Status |
|----|----------|--------|---------|--------|
{open issues}

### Acknowledged Issues (no score penalty)
| ID | Severity | Reason | Acknowledged |
|----|----------|--------|-------------|
{acknowledged issues}

### Fixed Issues
| ID | Severity | Fixed By | Cycle |
|----|----------|----------|-------|
{fixed issues}

## Builder Performance

### B1 Foundation Builder
{summary of all structural changes across all cycles}

### B2 Safety Builder
{summary of all safety fixes across all cycles}

### B3 Test Builder
{test coverage progression, total tests written}

### B4 Integration Builder
{data flow maps, wiring fixes}

### B5 Hardening Builder
{security hardening, performance fixes}

## Dual Audit History

| Cycle | Auditor | Arch | Maint | Correct | Security | TestQ | Issues |
|-------|---------|------|-------|---------|----------|-------|--------|
| 1 | A | {N} | {N} | — | — | {N} | {N} |
| 1 | B | — | — | {N} | {N} | {N} | {N} |
| ... |

## Sharpest Insights (Final Cycle)
- **B1 Foundation**: {insight}
- **B2 Safety**: {insight}
- **B3 Test**: {insight}
- **B4 Integration**: {insight}
- **B5 Hardening**: {insight}
- **Audit A**: {insight}
- **Audit B**: {insight}

## Git Log
{git log --oneline showing all forge commits}

STATE FILE

Write .code-forge-cycle-state.yaml:

version: 1
skill: code-forge-cycle
created: "{ISO 8601}"
updated: "{ISO 8601}"

project:
  path: "{absolute path}"
  tech_stack: "{stack}"
  mode: "{POST-POC / EXISTING}"

config:
  gate_threshold: 5
  max_cycles: 5

baseline:
  build: "{PASS/FAIL}"
  test_total: {N}
  test_passed: {N}
  coverage: {N}
  packages_with_tests: {N}
  packages_total: {N}

cycles:
  - cycle: 1
    started_at: "{ISO 8601}"
    completed_at: "{ISO 8601}"

    builders:
      b1_foundation:
        status: "completed"
        files_modified: {N}
        files_created: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b2_safety:
        status: "completed"
        files_modified: {N}
        issues_addressed: ["{IDs}"]
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b3_test:
        status: "completed"
        test_files_created: {N}
        tests_written: {N}
        tests_passing: {N}
        coverage_before: {N}
        coverage_after: {N}
        bugs_found: ["{IDs}"]
        commit_hash: "{hash}"
      b4_integration:
        status: "completed"
        wiring_fixes: {N}
        disconnected_features: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"
      b5_hardening:
        status: "completed"
        security_fixes: {N}
        performance_fixes: {N}
        polish_changes: {N}
        issues_found: ["{IDs}"]
        commit_hash: "{hash}"

    audit_a:
      architecture: {N}
      maintainability: {N}
      test_quality: {N}
      issues: ["{IDs}"]
    audit_b:
      correctness: {N}
      security: {N}
      test_quality: {N}
      issues: ["{IDs}"]

    gate:
      check_1_audit_a_dims: "{PASS/FAIL}"
      check_2_audit_b_dims: "{PASS/FAIL}"
      check_3_test_agreement: "{PASS/FAIL}"
      check_4_zero_criticals: "{PASS/FAIL}"
      check_5_major_limit: "{PASS/FAIL}"
      result: "{PASS/FAIL}"

    forge_score: {N}
    classification: "{BASELINE/PROGRESSING/SLOW_PROGRESS/PLATEAU/REGRESSION/CONVERGED}"

issue_registry:
  - id: "{ID}"
    source: "{source}"
    cycle_discovered: {N}
    severity: "{CRITICAL/MAJOR/MINOR/NITPICK}"
    summary: "{summary}"
    status: "{NEW/IN_PROGRESS/FIXED/ACKNOWLEDGED/REGRESSION}"
    score_impact: {N}
    acknowledged_reason: "{reason or null}"
    fixed_by: "{builder or null}"
    fixed_cycle: {N or null}

progress:
  current_cycle: {N}
  last_completed_step: "{step}"
  next_step: "{description}"

REVIEW LOG

Append to ~/.gstack/reviews/review-log.jsonl as a single line (JSON Lines format; the record is shown pretty-printed here for readability):

{
  "skill": "code-forge-cycle",
  "timestamp": "{ISO 8601}",
  "project": "{project path}",
  "mode": "{POST-POC/EXISTING}",
  "outcome": "{CONVERGED/STOPPED/PLATEAU/MAX_CYCLES}",
  "cycles_completed": 2,
  "max_cycles": 5,
  "forge_score_start": 25,
  "forge_score_final": 80,
  "gate_result": "PASS",
  "gate_threshold": 5,
  "dimensions_final": {
    "architecture": 6.0,
    "maintainability": 5.5,
    "correctness": 6.0,
    "security": 5.5,
    "test_quality_a": 5.0,
    "test_quality_b": 5.0
  },
  "builders_invoked": 10,
  "total_files_modified": 47,
  "total_tests_written": 38,
  "issues": {
    "total_discovered": 23,
    "fixed": 18,
    "acknowledged": 3,
    "open": 2
  },
  "report_path": "{path}"
}
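JSON Lines expects exactly one object per line, so the record has to be serialized compactly when appended. A minimal Python sketch; `append_review_log` is a hypothetical helper, and only the default path comes from the text above.

```python
import json
from pathlib import Path


def append_review_log(entry: dict, log_path: str = "~/.gstack/reviews/review-log.jsonl") -> Path:
    """Append `entry` as a single JSON line, creating the parent directory if needed."""
    path = Path(log_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # Compact separators and no indent keep the whole record on one line.
        fh.write(json.dumps(entry, separators=(",", ":")) + "\n")
    return path
```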

POST-CYCLE ROUTING

Based on outcome:

CONVERGED (Gate Passed): "Gate passed. All dimensions >= {threshold}/7. Zero CRITICALs. Test quality agreement." → Recommend: /code-audit THOROUGH for independent verification → /ship
STOPPED BY USER: "Stopped at cycle {N}. Forge Score: {N}%." → Recommend: Resume with /code-forge-cycle CONTINUE when ready
PLATEAU: "Score plateaued at {N}%. Bottleneck: {dimension}." → Recommend: Focused sprint on bottleneck dimension, or acknowledge remaining issues
MAX CYCLES: "Reached {N} cycles. Forge Score: {N}%. Gate: {failing checks}." → Recommend: Lower threshold, acknowledge issues, or investigate bottleneck

POST-POC FIRST CYCLE ADJUSTMENTS

When MODE is POST-POC, the first cycle has special builder instructions:

B1 First Cycle (POST-POC)

Additional mandate:

- Convert ALL inline code to proper abstractions (POC inlined everything)
- Create missing repository/service types for every data access pattern
- Remove all // TODO and // placeholder comments left by POC scaffolding

B2 First Cycle (POST-POC)

Additional mandate:

- EVERY try! → proper do/catch (POC used force-try everywhere)
- EVERY error that is only print()-ed → proper error propagation
- POC skipped ALL error handling: this is a full pass, not incremental

B3 First Cycle (POST-POC)

Additional mandate:

- Start from ZERO tests (POC philosophy: "no tests")
- Create test infrastructure: test targets, test utilities, mock helpers
- Target: at least 1 test file per package with 3+ meaningful tests

B4 First Cycle (POST-POC)

Additional mandate:

Verify the DI container actually wires everything (POC may have stubs)
Check every view/screen is connected to real data (not placeholder data)
Check every button/action triggers real behavior (not no-ops)
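The first check is easiest to make reliable by resolving every registration at startup or in a test, so that stubs fail loudly instead of silently. The sketch uses a minimal Python stand-in; the real container depends on the project's stack. `Container`, `stub` and `PlaceholderError` are hypothetical names. The sketch covers DI wiring only, not the view and button checks.

```python
from typing import Any, Callable


class PlaceholderError(RuntimeError):
    """Raised by stub factories so verification can report them as stubs."""


class Container:
    """Hypothetical minimal DI container: each name maps to a zero-argument factory."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[[], Any]] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        self._factories[name] = factory

    def resolve(self, name: str) -> Any:
        if name not in self._factories:
            raise LookupError(f"nothing registered for {name!r}")
        return self._factories[name]()

    def verify(self) -> dict[str, str]:
        """Resolve every registration once; return name -> problem for each failure.

        A factory that depends on a stub or on a missing name fails too,
        so broken wiring surfaces at every level, not just at the leaf.
        """
        problems: dict[str, str] = {}
        for name in self._factories:
            try:
                self.resolve(name)
            except PlaceholderError as exc:
                problems[name] = f"stub: {exc}"
            except Exception as exc:  # missing dependency or broken factory
                problems[name] = f"{type(exc).__name__}: {exc}"
        return problems


def stub(reason: str) -> Callable[[], Any]:
    """The kind of placeholder registration a POC might leave behind."""
    def factory() -> Any:
        raise PlaceholderError(reason)
    return factory
```

A test that asserts `container.verify() == {}` turns the check into a regression guard for later cycles.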

B5 First Cycle (POST-POC)

Additional mandate:

POC hardcoded everything — find and extract configuration
POC had no security — add ALL input validation from scratch
POC ignored production concerns — add basic logging infrastructure
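A minimal sketch of the configuration and logging items, under these assumptions:

- Hardcoded values are moved into one typed config object read from environment variables.
- Each value is validated on load.
- A single logging setup is added.

The environment variable names, defaults and allowed ranges are hypothetical. Validation of user-facing input would follow the same fail-fast pattern at each entry point.

```python
import logging
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Values a POC typically hardcodes, gathered in one place (names are illustrative)."""
    api_base_url: str
    request_timeout_s: float
    max_upload_mb: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AppConfig":
        """Read settings from the environment, rejecting invalid values at startup."""
        url = env.get("APP_API_BASE_URL", "http://localhost:8080")
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"APP_API_BASE_URL must be an http(s) URL, got {url!r}")
        timeout = float(env.get("APP_REQUEST_TIMEOUT_S", "10"))
        if not 0 < timeout <= 120:
            raise ValueError(f"APP_REQUEST_TIMEOUT_S out of range: {timeout}")
        max_upload = int(env.get("APP_MAX_UPLOAD_MB", "25"))
        if max_upload <= 0:
            raise ValueError(f"APP_MAX_UPLOAD_MB must be positive: {max_upload}")
        return cls(url, timeout, max_upload)


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Basic logging infrastructure: one format, level taken from config.

    logging.basicConfig is a no-op if the root logger already has handlers.
    """
    logging.basicConfig(
        level=getattr(logging, level.upper(), logging.INFO),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger("app")
```

Failing at startup on a bad value is usually cheaper than discovering it mid-request. Passing an explicit mapping to `from_env` keeps the loader testable without touching the real environment.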

OPERATING PRINCIPLES

Builders CREATE, Auditors FIND. Builders improve entire systems, not individual lines. Auditors validate independently.
Two judges must agree on tests. If Audit A (the Architect) scores the tests 6/7 but Audit B (the Adversary) scores them 3/7, the tests look good but don't prove correctness (or vice versa). This disagreement triggers focused improvement.
Acknowledged issues don't penalize. The plateau-killer. If you accept a finding ("brute-force search is fine at MVP scale"), it stops tanking your score every cycle.
Pre-fix search is mandatory. The single biggest source of score regression is fixes that introduce new issues (duplicate logic, N+1 queries, cross-boundary imports). The search protocol prevents this.
Builders have constraints. B3 ONLY writes tests. B1 ONLY touches structure. This prevents builders from stepping on each other's work and ensures domain expertise.
Gate is binary, not scored. The gate either passes or fails. No partial credit. This forces genuine quality, not score gaming.
First cycle after POC is special. POC deliberately skipped quality (no tests, no error handling, no validation). The first forge cycle is a full reconstruction, not incremental improvement.
Net issue velocity matters. If builders discover as many issues as they fix, the system isn't converging. Strategy intervention triggers automatically.
Regression is CRITICAL. A fixed issue that reappears is treated as CRITICAL severity. This prevents the oscillation pattern.
Resume from any point. State file tracks every builder's status, every audit score, every issue. Pick up where you left off.
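Three of these principles govern how the issue registry is updated across cycles:

- acknowledged issues don't penalize;
- net issue velocity must stay positive;
- a regression is CRITICAL.

A minimal sketch follows. It assumes every issue carries a stable key that lets a later cycle recognize the same finding. The status names, the severity strings and the `Registry` API are hypothetical. The sketch does not model the Forge Score itself, nor the point at which a non-positive velocity triggers strategy intervention.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class Issue:
    key: str                  # stable identity across cycles (assumed, e.g. "file:rule")
    severity: str             # "CRITICAL", "HIGH", "MEDIUM", "LOW"
    status: Status = Status.OPEN
    regressed: bool = False


@dataclass
class Registry:
    issues: dict[str, Issue] = field(default_factory=dict)

    def record_cycle(self, fixed: set[str], found: dict[str, str]) -> int:
        """Apply one cycle and return its net issue velocity (closed minus opened).

        fixed: keys the builders closed this cycle (builders run first)
        found: key -> severity reported by this cycle's audits (audits run after)
        """
        closed = 0
        for key in fixed:
            issue = self.issues.get(key)
            if issue is not None and issue.status is Status.OPEN:
                issue.status = Status.FIXED
                closed += 1
        opened = 0
        for key, severity in found.items():
            issue = self.issues.get(key)
            if issue is None:
                self.issues[key] = Issue(key, severity)
                opened += 1
            elif issue.status is Status.FIXED:
                # A fixed issue that reappears is a regression: reopen it as CRITICAL.
                issue.status, issue.severity, issue.regressed = Status.OPEN, "CRITICAL", True
                opened += 1
            # OPEN issues stay open; ACKNOWLEDGED issues are left alone.
        return closed - opened

    def penalized(self) -> list[Issue]:
        """Issues that count against the score; acknowledged ones never do."""
        return [i for i in self.issues.values() if i.status is Status.OPEN]
```

Applying fixes before findings mirrors the build-then-verify order. A fix that does not survive the same cycle's audits is therefore caught as a regression immediately.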

ComputerConnection/code-forge-cycle
