Plan-Delivery Audit

Spec backlink: docs/plans/2026-05-28-archive-aware-review-oracle-and-audit-skill.md § Chunk C5.
Origin: Distilled from the 2026-05-27 holodeck audit (X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md) where a live-only review-trail read produced a false "most work unreviewed" alarm: 22 archived records had been moved by /workweek-complete Step 13 and were invisible to the live glob.

When to invoke

After any session crash or partition that puts shipping state in doubt.
Mid-quarter "what shipped this month?" reconciliation.
Post-merge audit when a chain of handoffs lost a clean stopping point.
PM question: "Did we actually finish what we think we finished?"
Any time a plan's status: implemented self-assertion needs independent verification.

Out-of-scope sidecars

Sidecar files generated by the coordinator review pipeline inherit their parent plan's status: frontmatter and must be excluded from the plan set being audited. Exclude any file matching:

*.prior-art-check.md
*.coverage-check.md
*.docs-check.md
*.review-patrik.md
*.review-sid.md
*.review-camelia.md
Any file whose basename contains .review- or .check.

Apply the exclusion at glob time. Default glob: docs/plans/*.md minus the patterns above. The holodeck audit (2026-05-27) confirmed this exclusion is load-bearing — sidecars were the source of misleading status: reviewed entries in the candidate set before filtering.
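
The glob-time exclusion could be sketched as below. This is a minimal illustration, not the skill's own implementation: `list_auditable_plans` is a hypothetical helper name, and reading the catch-all rule as the literal substrings `.review-` and `.check.` is an assumption.

```shell
# Sketch: list auditable plans, dropping coordinator sidecars at glob time.
# list_auditable_plans is a hypothetical helper name, not part of the skill.
list_auditable_plans() {
  local dir="${1:-docs/plans}" path base
  for path in "$dir"/*.md; do
    [ -e "$path" ] || continue            # the glob matched nothing
    base="${path##*/}"                    # basename without calling out to basename(1)
    case "$base" in
      # Named sidecar suffixes from the exclusion list.
      *.prior-art-check.md|*.coverage-check.md|*.docs-check.md) continue ;;
      # Catch-all (assumed literal substrings): any reviewer or check sidecar.
      *.review-*|*.check.*) continue ;;
    esac
    printf '%s\n' "$path"
  done
}
```

Calling `list_auditable_plans docs/plans` prints one surviving plan path per line, so the result can be piped straight into the later phases.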

The three oracles (read each independently)

Read each oracle independently before comparing. Oracle 1 is the thing under test; Oracles 2 and 3 are the falsifiers. Never let Oracle 1's self-assertion influence your Oracle 2 or 3 read.

Oracle 1 — Plan-claim

The plan's own frontmatter and AC status column. Read:

status: field (e.g. implemented, in-progress, draft, superseded, abandoned)
Any per-AC Status column entries in the plan body
Execution notes, reviewed: fields

Treat these as the plan's claim, not as ground truth. Oracle 1 is the hypothesis; Oracles 2 and 3 either confirm or falsify it. A plan confidently self-reporting status: implemented with all ACs green is a starting point, not a verdict.

Oracle 2 — Code-reality

For each AC whose typed-prefix test can be run mechanically (grep:, cited:, bats:, pytest:, node:), run it against current HEAD. Ignore the AC's own Status note — the test result is the evidence, not the assertion.

grep:<pattern>@<file> → run as grep -n '<pattern>' <file> on disk
cited:<file> → verify the file exists at that path
bats:<file> → run bats <file> (or note if bats unavailable; mark as unverifiable)
pytest:<file> → run poetry run pytest <file> -q (or uv run pytest)
node:<file> → run node <file>

Miss on any grep or cited test = PARTIAL, not delivered, regardless of what Oracle 1 claims. If a plan has no typed-prefix ACs (narrative-only), note that Oracle 2 is unverifiable and treat the plan conservatively.
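
The prefix-to-command mapping above can be sketched as a single dispatcher. `run_typed_ac` is a hypothetical helper name; splitting `grep:` specs on the last `@` and reporting UNVERIFIABLE when a runner is not installed are assumptions made for this illustration.

```shell
# Sketch: run one typed-prefix AC and print PASS, FAIL, or UNVERIFIABLE.
# run_typed_ac is a hypothetical helper name, not part of the skill.
run_typed_ac() {
  local ac="$1" kind spec pattern file
  kind="${ac%%:*}"                        # text before the first ':'
  spec="${ac#*:}"                         # text after the first ':'
  case "$kind" in
    grep)
      # Assumption: split on the LAST '@' so the pattern itself may contain '@'.
      pattern="${spec%@*}"
      file="${spec##*@}"
      if grep -n -e "$pattern" -- "$file" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi
      ;;
    cited)
      if [ -e "$spec" ]; then echo PASS; else echo FAIL; fi
      ;;
    bats|node)
      command -v "$kind" >/dev/null 2>&1 || { echo UNVERIFIABLE; return 0; }
      if "$kind" "$spec" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi
      ;;
    pytest)
      if command -v poetry >/dev/null 2>&1; then
        if poetry run pytest "$spec" -q >/dev/null 2>&1; then echo PASS; else echo FAIL; fi
      elif command -v uv >/dev/null 2>&1; then
        if uv run pytest "$spec" -q >/dev/null 2>&1; then echo PASS; else echo FAIL; fi
      else
        echo UNVERIFIABLE
      fi
      ;;
    *)
      echo UNVERIFIABLE                   # narrative-only AC: no typed prefix
      ;;
  esac
}
```

A missing file makes `grep` exit non-zero, so a `grep:` AC that cites a deleted file reports FAIL rather than being silently skipped.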

Dispatch shape: For audits covering ≥4 plans, dispatch parallel read-only Sonnet scouts (one per plan). Each scout runs every typed-prefix test for its assigned plan and returns its results inline; the EM then synthesises them. Do NOT modify files, commit, or push during Oracle 2 runs.

Oracle 3 — Review (archive-aware)

Does an independent review-trail record cover the plan's delivery commits?

A commit C is covered by trail record [A..B] if and only if:

  git merge-base --is-ancestor C B   # must succeed (exit 0): C is within the reviewed window (at or before B)
! git merge-base --is-ancestor C A   # must succeed (exit 0): C is NOT A or an ancestor of A (window start is exclusive)

Both clauses are required: is-ancestor C B AND ! is-ancestor C A. The negative clause prevents a review record from absorbing commits that predate the reviewed window — without it, any record with a sufficiently old start SHA would appear to "cover" any commit in the repo. The ! negation in front of the second git merge-base is load-bearing — omitting it inverts the test from "after the window start" to "before the window start" and silently breaks the audit.
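
A minimal sketch of the two-clause test as a reusable function follows. `commit_in_range` is a hypothetical name. One deliberate deviation from a bare `!`: `git merge-base --is-ancestor` exits 1 for "not an ancestor" and with some other non-zero status on errors such as an unknown SHA, so the sketch accepts only exit status 1 for the negative clause. For valid SHAs the result matches asking whether C appears in `git rev-list A..B`.

```shell
# Sketch: succeed (exit 0) when commit C lies inside the review window A..B.
# commit_in_range is a hypothetical helper name, not part of the skill.
commit_in_range() {
  local c="$1" a="$2" b="$3" rc
  # Clause 1: C must be B itself or an ancestor of B.
  git merge-base --is-ancestor "$c" "$b" 2>/dev/null || return 1
  # Clause 2: C must NOT be A or an ancestor of A.
  rc=0
  git merge-base --is-ancestor "$c" "$a" 2>/dev/null || rc=$?
  # Exit 1 means "not an ancestor"; 0 means C predates the window;
  # anything else is a git error and is treated as "not covered".
  [ "$rc" -eq 1 ]
}
```

Treating git errors as "not covered" keeps a mistyped start SHA from turning every commit into a covered one, which a plain `!` would allow.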

Trail records live at BOTH state/review-trail/**/*.json AND archive/review-trail/**/*.json. Read via:

list-review-trail-records.sh

DO NOT glob state/review-trail/ alone. The /workweek-complete Step 13 archival moves all current-week records into archive/review-trail/<week-starting>/ on every weekly reset. A live-only read systematically under-counts coverage for anything older than the current week. This is the exact failure mode that prompted this skill — the 2026-05-27 holodeck audit found 22 archived records invisible in the live dir, producing a false "most work unreviewed" alarm. See docs/wiki/workstream-complete-review.md § Archive-Aware Glob.

For each trail record returned by list-review-trail-records.sh:

Read the record's sha_range field (format: "<start>..<end>")
For each of the plan's delivery commits C: run the two-clause ancestry test above
If any record covers all delivery commits, Oracle 3 = COVERED; else UNCOVERED

If the plan's delivery commits cannot be identified from frontmatter or execution notes, search git log --oneline for commits that touch files the plan's ACs describe.
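
The per-record loop above might look like the following sketch. The helper names `in_window` and `oracle3_verdict` are hypothetical. Two assumptions are baked in: `list-review-trail-records.sh` is taken to print one record path per line, and `sha_range` is extracted with a naive `sed` pattern that expects the value on a single line (a real JSON parser would be more robust).

```shell
# Sketch: print COVERED if one trail record's sha_range contains every
# delivery commit, else UNCOVERED. Helper names are hypothetical.
in_window() {                               # in_window C START END
  local rc=0
  git merge-base --is-ancestor "$1" "$3" 2>/dev/null || return 1
  git merge-base --is-ancestor "$1" "$2" 2>/dev/null || rc=$?
  [ "$rc" -eq 1 ]                           # 1 = not an ancestor of START
}

oracle3_verdict() {                         # args: delivery commits; stdin: record paths
  local record range start end c ok
  [ "$#" -gt 0 ] || { echo UNCOVERED; return 0; }
  while IFS= read -r record; do
    # Naive extraction of "sha_range": "<start>..<end>" (assumed one-line value).
    range="$(sed -n 's/.*"sha_range"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
               "$record" 2>/dev/null | head -n 1)"
    [ -n "$range" ] || continue
    start="${range%%..*}"
    end="${range##*..}"
    ok=1
    for c in "$@"; do
      in_window "$c" "$start" "$end" || { ok=0; break; }
    done
    if [ "$ok" -eq 1 ]; then echo COVERED; return 0; fi
  done
  echo UNCOVERED
}
```

Under those assumptions, usage would be `list-review-trail-records.sh | oracle3_verdict <commit>...`. When delivery commits have to be recovered by hand, `git log --oneline -- <path>` limits the log to commits touching a given file.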

Bucket decision tree

Apply this decision tree to each plan after running all three oracles. Every plan resolves into exactly one bucket — no plan should fall into two.

| Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------------------|-----------------|-------------------|--------|
| status: implemented (or shipped) | All typed-prefix ACs pass at HEAD | ≥1 trail record covers delivery commits | DELIVERED+REVIEWED |
| status: implemented (or shipped) | All typed-prefix ACs pass at HEAD | No trail record covers delivery commits | DELIVERED-UNREVIEWED |
| status: implemented (or shipped) | ≥1 typed-prefix AC fails at HEAD, OR no typed-prefix ACs (unverifiable) | (any) | PARTIAL |
| status: in-progress / draft / reviewed (not yet implemented) | (any) | (any) | IN-FLIGHT |
| status: superseded / abandoned / cancelled | (any) | (any) | ABANDONED |

Tie-breaking rules:

A plan with status: implemented but unverifiable Oracle 2 (no typed-prefix ACs) goes into PARTIAL, not DELIVERED. Self-assertion without machine-checkable evidence is not delivery.
A plan with status: draft AND commit evidence of substantial shipped work still goes into IN-FLIGHT — the correct response is to flip the frontmatter, not to reclassify here.
DELIVERED-UNREVIEWED is a real state, not an error. The correct follow-up is to dispatch code-reviewer against the delivery diff and record the trail entry. Do not skip the review or backdoor a "reviewed" claim without running the actual reviewer.
ABANDONED plans found with Oracle 2 evidence of shipped code should surface as a concern in the audit output — the plan may need a frontmatter flip to superseded with a superseded_by: pointer.
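
The decision tree and its tie-breaking rules reduce to a small pure function. The sketch below assumes the three oracle results have already been summarised as short tokens; the function name `bucket_for`, the tokens (`pass`/`fail`/`unverifiable`, `covered`/`uncovered`), and the `UNKNOWN-STATUS` fallback are hypothetical conventions introduced here.

```shell
# Sketch: map the three oracle summaries to exactly one bucket.
# bucket_for PLAN_STATUS ORACLE2 ORACLE3   (name and tokens are hypothetical)
#   ORACLE2: pass | fail | unverifiable
#   ORACLE3: covered | uncovered
bucket_for() {
  local plan_status="$1" oracle2="${2:-}" oracle3="${3:-}"
  case "$plan_status" in
    superseded|abandoned|cancelled)
      echo ABANDONED ;;
    draft|in-progress|reviewed)
      # Stays IN-FLIGHT even with shipped commits; fix the frontmatter instead.
      echo IN-FLIGHT ;;
    implemented|shipped)
      if [ "$oracle2" != pass ]; then
        echo PARTIAL                      # failed OR unverifiable Oracle 2
      elif [ "$oracle3" = covered ]; then
        echo 'DELIVERED+REVIEWED'
      else
        echo DELIVERED-UNREVIEWED
      fi ;;
    *)
      echo UNKNOWN-STATUS ;;              # outside the tree; flag for a human
  esac
}
```

Because Oracle 1 is checked first and every branch prints exactly one value, no plan can land in two buckets.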

Output format

Emit a markdown table — one row per audited plan:

| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| `docs/plans/YYYY-MM-DD-name.md` | `status: X`; AC count | N/M ACs pass; misses if any | trail record `A..B` covers commits / UNCOVERED | **BUCKET** |

Follow the table with:

Summary — count per bucket.
Catch-up review queue — list every DELIVERED-UNREVIEWED plan with its delivery commits.
Frontmatter flip recommendations — list any ABANDONED plans with stale status: draft.
Method notes — flag any plan with unverifiable Oracle 2 (no typed-prefix ACs) or missing delivery commits.

Save to state/audits/YYYY-MM-DD-plan-delivery-audit.md. Create the state/audits/ directory if absent.
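
One way to produce the table file is sketched below. `write_audit_table` is a hypothetical helper; it assumes rows arrive on stdin as five tab-separated fields (plan, Oracle 1, Oracle 2, Oracle 3, bucket) with no empty fields, and that the audit date is today's date. The follow-up sections (summary, catch-up queue, and so on) are left out.

```shell
# Sketch: write the audit table to state/audits/YYYY-MM-DD-plan-delivery-audit.md.
# write_audit_table is a hypothetical helper name, not part of the skill.
# stdin: one row per plan, five TAB-separated fields (assumed input shape).
write_audit_table() {
  local out_dir="${1:-state/audits}" day out plan o1 o2 o3 bucket tab
  tab="$(printf '\t')"
  day="$(date +%Y-%m-%d)"                 # assumption: audit date is today
  out="$out_dir/$day-plan-delivery-audit.md"
  mkdir -p "$out_dir"                     # create state/audits/ if absent
  {
    printf '| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |\n'
    printf '|------|-----------------|-----------------|-------------------|--------|\n'
    while IFS="$tab" read -r plan o1 o2 o3 bucket; do
      printf '| `%s` | %s | %s | %s | **%s** |\n' "$plan" "$o1" "$o2" "$o3" "$bucket"
    done
  } > "$out"
  printf '%s\n' "$out"                    # report where the table was written
}
```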

Worked example — 2026-05-27 holodeck audit

This section is the Staff Engineer F3 falsifiability hook. The rows below are drawn directly from the holodeck audit at X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md. A reviewer can walk the decision tree against each row independently and verify the bucket assignment. Plans are in the holodeck repo (X:\claude-unreal-holodeck\docs\plans\).

Plans audited (4 real plans; sidecar exclusion applied first):

| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| 2026-05-24-patch-and-send-back-contribution-invite.md | status: implemented; cites commits d67f7371b, d368b6269, a92447c17, b5b824d47; 10 ACs | 9/10 ACs verified at disk; AC3 (URL liveness) is inherently manual; the remaining 9 pass grep/cited checks | COVERED (archive-aware). archive/review-trail/2026-05-21/2026-05-24-115430145738-08b5c444.json, range 977a40b29..cc4c936d6; all 4 delivery commits satisfy is-ancestor C cc4c936d6 && ! is-ancestor C 977a40b29. A live-only Oracle 3 (state/review-trail/ glob without archive/review-trail/**) would mis-classify this plan as DELIVERED-UNREVIEWED (see Key finding below). | DELIVERED+REVIEWED (under archive-aware Oracle 3); would have been DELIVERED-UNREVIEWED under live-only Oracle 3 |
| 2026-05-26-game-dev-ownership-and-bidirectional-install-drift.md | status: implemented; 9 delivery commits; 13 ACs; reviewed: patrik 2026-05-26; code-reviewer 2026-05-26 | 12/13 ACs fully in-repo; AC7 realized as external coordinator dependency (landed) | COVERED. Live record state/review-trail/2026-05-26-131032300171-1afa35ae.json, range 609399fcc..HEAD; all 9 delivery commits fall inside range | DELIVERED+REVIEWED |
| 2026-05-26-headless-extractor-seam-buildout.md | status: draft; active workstream with recovery handoff 2026-05-27_084305_e40956a9.md ("review-complete, ZERO C++ authored; resume by dispatching Phase 1 H-2 resolver") | Not run: no terminal close; Oracle 2 does not apply to in-flight work | Not applicable: delivery not claimed | IN-FLIGHT |
| 2026-05-19-headless-extraction-buildout.md | status: draft; kind: roadmap-lite; explicitly superseded by the 05-26 seam plan | Not run: no delivery expected | Not applicable: abandoned by supersession | ABANDONED |

On DELIVERED-UNREVIEWED and PARTIAL buckets: the worked example's 4 plans landed cleanly in DELIVERED+REVIEWED / IN-FLIGHT / ABANDONED in their resolved state — but the 05-24 row is annotated above with what it would have been under a live-only Oracle 3 (DELIVERED-UNREVIEWED). That's the falsifying demonstration of why archive-aware reading is load-bearing: the same plan, audited the same day, resolves to a different bucket depending on whether Oracle 3 globs the archive. PARTIAL is exercised when Oracle 1 claims implemented but Oracle 2 finds cited file/symbol absent or AC tests red on disk — none of the 4 audited plans hit that path; the bucket is reached via the decision tree's Oracle 2 branch when typed-prefix tests fail.

Decision tree walk for each row:

patch-and-send-back (05-24): Oracle 1 = implemented → proceed to Oracle 2. Oracle 2 = 9/10 typed-prefix tests pass; AC3 unverifiable (URL liveness) but 9 others confirm → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = archived trail record covers all delivery commits via range-membership → COVERED. Bucket: DELIVERED+REVIEWED. ✓
game-dev-ownership (05-26): Oracle 1 = implemented → proceed to Oracle 2. Oracle 2 = 12/13 ACs in-repo (AC7 is an external dep, landed); all in-repo ACs pass grep/cited → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = live trail record covers all 9 delivery commits → COVERED. Bucket: DELIVERED+REVIEWED. ✓
headless-extractor-seam (05-26): Oracle 1 = draft → bucket resolves immediately at Oracle 1. No delivery claimed; active recovery handoff confirms in-flight status. Bucket: IN-FLIGHT. ✓ (Oracle 2 and 3 not needed.)
headless-extraction-buildout (05-19): Oracle 1 = draft with explicit supersession pointer → bucket resolves at Oracle 1. No delivery expected or claimed. Bucket: ABANDONED. ✓ (Audit recommends a frontmatter flip to status: superseded + superseded_by: for hygiene — the current draft falsely signals "resumable" to pickup candidates.)

Key finding from this audit: The handoff's alarm, "only 4 review-trail records, most shipped work unreviewed", was an archival artifact, not a coverage gap. The /workweek-complete run on 2026-05-24 (commit db151655e) moved 22 review-trail records into archive/review-trail/2026-05-21/. A live-only read of state/review-trail/ saw only the current week's 4 records and missed the 22 archived ones. An Oracle 3 that read ONLY the live dir would have mis-classified the 05-24 plan as DELIVERED-UNREVIEWED, a false verdict. The archive-aware read via list-review-trail-records.sh is load-bearing, not optional.

Why a skill, not a one-shot

This shape recurs after every crash and every "did we actually ship what we think we shipped?" moment — it has fired at least twice in the coordinator workstream and once explicitly in holodeck (2026-05-27). Re-deriving the three-oracle protocol, the git range-membership formula, the archive-aware glob, and the sidecar exclusion rule by hand each time is the cost this skill exists to amortise.

Closest analogues in skills/: bug-sweep (sweep-and-classify), architecture-audit (multi-oracle assessment), validate (typed-prefix test running). All three are precedent for giving a recurring structured procedure a skill shape. A wiki-only home would force the dispatch shape to be re-derived on every invocation and would not be greppable from coordinator:plan or /workstream-complete as a routine option.

Dispatch sequencing

The skill orchestrates its own work in three phases:

Phase 1 — Gather and filter (~2 min, EM does this directly)

Glob the plan set: docs/plans/*.md (or the caller-supplied glob).
Filter out sidecars (any filename containing .prior-art-check, .coverage-check, .docs-check, .review-patrik, .review-, or .check.).
For each remaining plan, read frontmatter status: to rough-sort into buckets:
- status: superseded / abandoned / cancelled → ABANDONED (no further oracle work needed)
- status: draft / in-progress / reviewed → IN-FLIGHT (no further oracle work needed)
- status: implemented / shipped → needs Oracle 2 + 3 (add to work queue)
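
The Phase 1 rough-sort can be sketched as below. Both helper names (`plan_status`, `rough_sort`) are hypothetical. The sketch assumes each plan opens with YAML frontmatter delimited by `---` lines and carries an unquoted `status:` value; a plan without frontmatter comes back as UNKNOWN for a human to inspect.

```shell
# Sketch: read `status:` from a plan's frontmatter and rough-sort the plan set.
# plan_status and rough_sort are hypothetical helper names.
plan_status() {
  awk '
    { sub(/\r$/, "") }                        # tolerate CRLF files
    NR == 1 { if ($0 != "---") exit; next }   # no frontmatter: print nothing
    $0 == "---" { exit }                      # end of frontmatter reached
    /^status:/ {
      sub(/^status:[ \t]*/, ""); sub(/[ \t]+$/, "")
      print; exit
    }
  ' "$1"
}

rough_sort() {                                # args: plan paths (sidecars already removed)
  local plan s
  for plan in "$@"; do
    s="$(plan_status "$plan")"
    case "$s" in
      superseded|abandoned|cancelled) printf 'ABANDONED\t%s\n' "$plan" ;;
      draft|in-progress|reviewed)     printf 'IN-FLIGHT\t%s\n' "$plan" ;;
      implemented|shipped)            printf 'QUEUE\t%s\n' "$plan" ;;   # needs Oracle 2 + 3
      *)                              printf 'UNKNOWN\t%s\n' "$plan" ;;
    esac
  done
}
```

Only the rows tagged QUEUE go on to Phase 2; a `status:` line that appears in the plan body after the closing `---` is ignored.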

Phase 2 — Oracle 2 + 3 on implemented plans (~10 min, parallel scouts)

For each implemented plan, dispatch a read-only Sonnet scout with:

"Run all typed-prefix ACs for <plan-path> against current HEAD. Report each AC as PASS / FAIL / UNVERIFIABLE. Do NOT modify files, commit, or push. Return results inline."

Concurrently (EM-side), run Oracle 3 for each plan:

list-review-trail-records.sh   # get all live + archived records
# for each record, read sha_range and test delivery commits for range membership:
  git merge-base --is-ancestor <commit> <end_sha>    # must succeed (exit 0)
! git merge-base --is-ancestor <commit> <start_sha>  # must succeed (exit 0) — note the leading !

Phase 3 — Synthesise and write output (~3 min)

Apply the bucket decision tree to each plan. Write the output table + summary to state/audits/YYYY-MM-DD-plan-delivery-audit.md. Surface any DELIVERED-UNREVIEWED plans to the PM with a code-reviewer dispatch recommendation — do not dispatch autonomously, as the PM may choose to defer.

