plugins/coordinator/skills/plan-delivery-audit/SKILL.md
Triangulate plan-claim / code-reality / review oracles to classify each plan into DELIVERED+REVIEWED / DELIVERED-UNREVIEWED / PARTIAL / IN-FLIGHT / ABANDONED. Run after any crash or 'did we actually finish what we think we finished?' moment.
Spec backlink: `docs/plans/2026-05-28-archive-aware-review-oracle-and-audit-skill.md` § Chunk C5.
Origin: distilled from the 2026-05-27 holodeck audit (`X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md`), where a live-only review-trail read produced a false "most work unreviewed" alarm: 22 archived records had been moved by `/workweek-complete` Step 13 and were invisible to the live glob.
A plan's `status: implemented` self-assertion needs independent verification.

Sidecar files generated by the coordinator review pipeline inherit their parent plan's
`status:` frontmatter and must be excluded from the plan set being audited. Exclude
any file matching:

- `*.prior-art-check.md`
- `*.coverage-check.md`
- `*.docs-check.md`
- `*.review-patrik.md`
- `*.review-sid.md`
- `*.review-camelia.md`
- any other file whose name contains `.review-` or `.check.`

Apply the exclusion at glob time. Default glob: `docs/plans/*.md` minus the patterns above.
The holodeck audit (2026-05-27) confirmed this exclusion is load-bearing — sidecars were the
source of misleading status: reviewed entries in the candidate set before filtering.
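The glob-time exclusion above can be sketched as a small filter. This is a minimal sketch, not the skill's actual implementation: `filter_plans` is a hypothetical name, and it assumes candidate paths arrive one per line on stdin.

```shell
# Hypothetical helper: keep plan files, drop review-pipeline sidecars.
# Reads candidate paths on stdin (one per line) and prints the survivors.
filter_plans() {
  while IFS= read -r path; do
    case "${path##*/}" in
      # Sidecar patterns named in the exclusion list: skip silently.
      *.prior-art-check.md|*.coverage-check.md|*.docs-check.md) ;;
      *.review-*|*.check.*) ;;
      # Anything else ending in .md is treated as a real plan.
      *.md) printf '%s\n' "$path" ;;
    esac
  done
}

# Usage with the default glob:
#   printf '%s\n' docs/plans/*.md | filter_plans
```

Matching on the basename keeps a directory that happens to be named like a sidecar from excluding every plan beneath it.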
Read each oracle independently before comparing. Oracle 1 is the thing under test; Oracles 2 and 3 are the falsifiers. Never let Oracle 1's self-assertion influence your Oracle 2 or 3 read.
Oracle 1 (claim): the plan's own frontmatter and AC status column. Read:

- the `status:` field (e.g. `implemented`, `in-progress`, `draft`, `superseded`, `abandoned`)
- `Status` column entries in the plan body
- `reviewed:` fields

Treat these as the plan's claim, not as ground truth. Oracle 1 is the hypothesis;
Oracles 2 and 3 either confirm or falsify it. A plan confidently self-reporting
`status: implemented` with all ACs green is a starting point, not a verdict.
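Reading the claim mechanically might look like the sketch below. It assumes the plan opens with a YAML frontmatter block delimited by `---` lines, which the source implies but does not spell out. `plan_status` is a hypothetical name, and quoted or commented values are not handled.

```shell
# plan_status FILE
# Prints the value of the first `status:` key found inside the leading
# '---' frontmatter block. Prints nothing if there is no frontmatter or
# no status key. Assumes plain `status: value` lines (no quoting).
plan_status() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file does not start with frontmatter
    NR > 1  && $0 == "---" { exit }   # reached the closing delimiter
    /^status:[ \t]*/ { sub(/^status:[ \t]*/, ""); print; exit }
  ' "$1"
}
```

Stopping at the closing delimiter matters: a `status:` string appearing later in the plan body is not the plan's claim.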
Oracle 2 (code): for each AC whose typed-prefix test can be run mechanically (`grep:`, `cited:`, `bats:`,
`pytest:`, `node:`), run it against current HEAD. Ignore the AC's own Status note: the
test result is the evidence, not the assertion.

- `grep:<pattern>@<file>` → run as `grep -n '<pattern>' <file>` on disk
- `cited:<file>` → verify the file exists at that path
- `bats:<file>` → run `bats <file>` (or note if bats is unavailable; mark as unverifiable)
- `pytest:<file>` → run `poetry run pytest <file> -q` (or `uv run pytest`)
- `node:<file>` → run `node <file>`

A miss on any grep or cited test = PARTIAL, not delivered, regardless of what Oracle 1 claims. If a plan has no typed-prefix ACs (narrative-only), note that Oracle 2 is unverifiable and treat the plan conservatively.
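The prefix-to-command mapping above could be dispatched roughly as follows. This is a sketch under assumptions: `run_ac` and `have` are hypothetical names, and a missing `poetry` or `node` binary is reported as UNVERIFIABLE by analogy with the bats rule, which the source states only for bats.

```shell
# have CMD: succeeds when CMD is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# run_ac AC
# Maps one typed-prefix AC to PASS, FAIL, or UNVERIFIABLE on stdout.
run_ac() {
  ac=$1
  case "$ac" in
    grep:*)
      spec=${ac#grep:}
      pattern=${spec%@*}    # text before the last '@'
      file=${spec##*@}      # text after the last '@'
      if grep -n -- "$pattern" "$file" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi ;;
    cited:*)
      if [ -e "${ac#cited:}" ]; then echo PASS; else echo FAIL; fi ;;
    bats:*)
      have bats || { echo UNVERIFIABLE; return 0; }
      if bats "${ac#bats:}" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi ;;
    pytest:*)
      have poetry || { echo UNVERIFIABLE; return 0; }   # assumption: same rule as bats
      if poetry run pytest "${ac#pytest:}" -q >/dev/null 2>&1; then echo PASS; else echo FAIL; fi ;;
    node:*)
      have node || { echo UNVERIFIABLE; return 0; }     # assumption: same rule as bats
      if node "${ac#node:}" >/dev/null 2>&1; then echo PASS; else echo FAIL; fi ;;
    *)
      echo UNVERIFIABLE ;;   # narrative-only AC: nothing to run
  esac
}
```

Splitting on the last `@` lets a grep pattern contain `@` itself, at the cost of not supporting file paths that contain one.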
Dispatch shape: For audits covering ≥4 plans, dispatch parallel read-only Sonnet scouts (one per plan). Each scout runs every typed-prefix test for its assigned plan, returns inline results, EM synthesises. Do NOT modify files, commit, or push during Oracle 2 runs.
Oracle 3 (review): does an independent review-trail record cover the plan's delivery commits?
A commit C is covered by trail record [A..B] if and only if:
git merge-base --is-ancestor C B # must succeed (exit 0): C is within the reviewed window (at or before B)
! git merge-base --is-ancestor C A # must succeed (exit 0): C is NOT before the window start (after A, exclusive)
Both clauses are required: is-ancestor C B AND ! is-ancestor C A. The negative clause
prevents a review record from absorbing commits that predate the reviewed window — without it,
any record with a sufficiently old start SHA would appear to "cover" any commit in the repo.
The ! negation in front of the second git merge-base is load-bearing — omitting it inverts the
test from "after the window start" to "before the window start" and silently breaks the audit.
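The two clauses can be wrapped in one function so the negation cannot be dropped by accident. A minimal sketch: `covered_by` is a hypothetical name, and the argument order is an assumption.

```shell
# covered_by COMMIT START END
# Succeeds (exit 0) when COMMIT lies in the half-open window (START, END]:
# reachable from END, but not reachable from START.
covered_by() {
  git merge-base --is-ancestor "$1" "$3" &&   # COMMIT is at or before END
  ! git merge-base --is-ancestor "$1" "$2"    # COMMIT is NOT at or before START
}

# Usage:
#   covered_by <commit> <start_sha> <end_sha> && echo COVERED || echo UNCOVERED
```

One caveat: `git merge-base --is-ancestor` exits 1 for "not an ancestor" and with a different non-zero status on errors such as an unknown SHA. Under `!`, an error in the second clause reads the same as "not an ancestor", so validating the start SHA first (for example with `git rev-parse --verify`) is a reasonable precaution.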
Trail records live at BOTH tasks/review-trail/**/*.json AND archive/review-trail/**/*.json.
Read via:
list-review-trail-records.sh
DO NOT glob tasks/review-trail/ alone. The /workweek-complete Step 13 archival moves
all current-week records into archive/review-trail/<week-starting>/ on every weekly reset.
A live-only read systematically under-counts coverage for anything older than the current week.
This is the exact failure mode that prompted this skill — the 2026-05-27 holodeck audit found
22 archived records invisible in the live dir, producing a false "most work unreviewed" alarm.
See docs/wiki/workstream-complete-review.md § Archive-Aware Glob.
For each trail record returned by `list-review-trail-records.sh`, read its `sha_range` field (format: `"<start>..<end>"`) and test each delivery commit for range membership.

If the plan's delivery commits cannot be identified from frontmatter or execution notes,
search `git log --oneline` for commits that touch files the plan's ACs describe.
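Splitting a `sha_range` value and running the fallback search could look like this. The helper names are hypothetical, and how the `sha_range` string is pulled out of the JSON record is left out because the record schema beyond that field is not shown here.

```shell
# Split a "<start>..<end>" sha_range string into its two endpoints.
range_start() { printf '%s\n' "${1%%..*}"; }   # text before the first ".."
range_end()   { printf '%s\n' "${1##*..}"; }   # text after the last ".."

# Fallback when a plan names no delivery commits: list the commits that
# touch the files its ACs describe (paths supplied by the caller).
commits_touching() { git log --oneline -- "$@"; }

# Usage:
#   range_start '977a40b29..cc4c936d6'    # prints 977a40b29
#   range_end   '977a40b29..cc4c936d6'    # prints cc4c936d6
#   commits_touching path/to/file-from-an-AC
```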
Apply this decision tree to each plan after running all three oracles. Every plan resolves into exactly one bucket — no plan should fall into two.
| Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------------------|-----------------|-------------------|--------|
| status: implemented (or shipped) | All typed-prefix ACs pass at HEAD | ≥1 trail record covers delivery commits | DELIVERED+REVIEWED |
| status: implemented (or shipped) | All typed-prefix ACs pass at HEAD | No trail record covers delivery commits | DELIVERED-UNREVIEWED |
| status: implemented (or shipped) | ≥1 typed-prefix AC fails at HEAD, OR no typed-prefix ACs (unverifiable) | (any) | PARTIAL |
| status: in-progress / draft / reviewed (not yet implemented) | (any) | (any) | IN-FLIGHT |
| status: superseded / abandoned / cancelled | (any) | (any) | ABANDONED |
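The table reads as a pure function of the three oracle results. A minimal sketch follows. The `pass | fail | unverifiable` and `covered | uncovered` encodings are hypothetical summaries of Oracles 2 and 3, not values the skill defines.

```shell
# classify STATUS CODE REVIEW
#   STATUS: the plan's frontmatter status (Oracle 1)
#   CODE:   pass | fail | unverifiable   (Oracle 2 summary, hypothetical encoding)
#   REVIEW: covered | uncovered          (Oracle 3 summary, hypothetical encoding)
# Prints exactly one bucket name; returns 1 on an unrecognised status.
classify() {
  case "$1" in
    superseded|abandoned|cancelled) echo ABANDONED; return 0 ;;
    in-progress|draft|reviewed)     echo IN-FLIGHT; return 0 ;;
    implemented|shipped)            ;;   # fall through to Oracles 2 and 3
    *) echo "unknown status: $1" >&2; return 1 ;;
  esac
  if [ "$2" != pass ]; then
    echo PARTIAL                 # a failing or unverifiable Oracle 2 blocks delivery
  elif [ "$3" = covered ]; then
    echo DELIVERED+REVIEWED
  else
    echo DELIVERED-UNREVIEWED
  fi
}
```

Checking status first means Oracle 2 and 3 values are ignored for in-flight and abandoned plans, matching the "(any)" cells in the table.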
Tie-breaking rules:

- A plan with `status: implemented` but unverifiable Oracle 2 (no typed-prefix ACs) goes into PARTIAL, not DELIVERED. Self-assertion without machine-checkable evidence is not delivery.
- A plan with `status: draft` AND commit evidence of substantial shipped work still goes into IN-FLIGHT. The correct response is to flip the frontmatter, not to reclassify here.
- For a DELIVERED-UNREVIEWED plan, the remedy is to run `code-reviewer` against the delivery diff and record the trail entry. Do not skip the review or backdoor a "reviewed" claim without running the actual reviewer.
- A plan abandoned by supersession should have its frontmatter flipped to `superseded` with a `superseded_by:` pointer.

Emit a markdown table, one row per audited plan:
| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| `docs/plans/YYYY-MM-DD-name.md` | `status: X`; AC count | N/M ACs pass; misses if any | trail record `A..B` covers commits / UNCOVERED | **BUCKET** |
Follow the table with:

- … `status: draft`.

Save to `tasks/audits/YYYY-MM-DD-plan-delivery-audit.md`. Create the `tasks/audits/` directory
if absent.
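Creating the output location and starting the file might be sketched as below. `write_audit_skeleton` is a hypothetical name, and the explicit `ROOT` argument is an assumption made so the sketch does not depend on the current directory.

```shell
# write_audit_skeleton ROOT [DATE]
# Creates ROOT/tasks/audits/ if absent, starts DATE-plan-delivery-audit.md
# with the table header, and prints the file path. DATE defaults to today.
# Note: an existing file for the same date is overwritten.
write_audit_skeleton() {
  root=$1
  day=${2:-$(date +%Y-%m-%d)}
  out="$root/tasks/audits/$day-plan-delivery-audit.md"
  mkdir -p "$root/tasks/audits" || return 1
  {
    printf '%s\n' '| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |'
    printf '%s\n' '|------|-----------------|-----------------|-------------------|--------|'
  } > "$out"
  printf '%s\n' "$out"
}

# Usage:
#   write_audit_skeleton . 2026-05-27
```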
The worked example below serves as the Staff Engineer F3 falsifiability hook. The rows are drawn directly from the holodeck audit at
`X:\claude-unreal-holodeck\tasks\recovery\2026-05-27-plan-delivery-audit.md`. A reviewer can walk the decision tree against each row independently and verify the bucket assignment. Plans are in the holodeck repo (`X:\claude-unreal-holodeck\docs\plans\`).
Plans audited (4 real plans; sidecar exclusion applied first):
| Plan | Oracle 1 (claim) | Oracle 2 (code) | Oracle 3 (review) | Bucket |
|------|-----------------|-----------------|-------------------|--------|
| 2026-05-24-patch-and-send-back-contribution-invite.md | status: implemented; cites commits d67f7371b, d368b6269, a92447c17, b5b824d47; 10 ACs | 9/10 ACs verified at disk; AC3 (URL liveness) inherently manual — remaining 9 pass grep/cited checks | COVERED (archive-aware). archive/review-trail/2026-05-21/2026-05-24-115430145738-08b5c444.json — range 977a40b29..cc4c936d6; all 4 delivery commits satisfy is-ancestor C cc4c936d6 && ! is-ancestor C 977a40b29. A live-only Oracle 3 (tasks/review-trail/ glob without archive/review-trail/**) would mis-classify this plan as DELIVERED-UNREVIEWED — see Key finding below. | DELIVERED+REVIEWED (under archive-aware Oracle 3) — would have been DELIVERED-UNREVIEWED under live-only Oracle 3 |
| 2026-05-26-game-dev-ownership-and-bidirectional-install-drift.md | status: implemented; 9 delivery commits; 13 ACs; reviewed: patrik 2026-05-26; code-reviewer 2026-05-26 | 12/13 ACs fully in-repo; AC7 realized as external coordinator dependency (landed) | COVERED. Live record tasks/review-trail/2026-05-26-131032300171-1afa35ae.json — range 609399fcc..HEAD; all 9 delivery commits fall inside range | DELIVERED+REVIEWED |
| 2026-05-26-headless-extractor-seam-buildout.md | status: draft; active workstream with recovery handoff 2026-05-27_084305_e40956a9.md ("review-complete, ZERO C++ authored; resume by dispatching Phase 1 H-2 resolver") | Not run — no terminal close; Oracle 2 does not apply to in-flight work | Not applicable — delivery not claimed | IN-FLIGHT |
| 2026-05-19-headless-extraction-buildout.md | status: draft; kind: roadmap-lite; explicitly superseded by the 05-26 seam plan | Not run — no delivery expected | Not applicable — abandoned by supersession | ABANDONED |
On DELIVERED-UNREVIEWED and PARTIAL buckets: the worked example's 4 plans landed cleanly in DELIVERED+REVIEWED / IN-FLIGHT / ABANDONED in their resolved state — but the 05-24 row is annotated above with what it would have been under a live-only Oracle 3 (DELIVERED-UNREVIEWED). That's the falsifying demonstration of why archive-aware reading is load-bearing: the same plan, audited the same day, resolves to a different bucket depending on whether Oracle 3 globs the archive. PARTIAL is exercised when Oracle 1 claims implemented but Oracle 2 finds cited file/symbol absent or AC tests red on disk — none of the 4 audited plans hit that path; the bucket is reached via the decision tree's Oracle 2 branch when typed-prefix tests fail.
Decision tree walk for each row:
patch-and-send-back (05-24): Oracle 1 = implemented → proceed to Oracle 2. Oracle 2 = 9/10 typed-prefix tests pass; AC3 unverifiable (URL liveness) but 9 others confirm → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = archived trail record covers all delivery commits via range-membership → COVERED. Bucket: DELIVERED+REVIEWED. ✓
game-dev-ownership (05-26): Oracle 1 = implemented → proceed to Oracle 2. Oracle 2 = 12/13 ACs in-repo (AC7 is an external dep, landed); all in-repo ACs pass grep/cited → "all verifiable ACs pass". Proceed to Oracle 3. Oracle 3 = live trail record covers all 9 delivery commits → COVERED. Bucket: DELIVERED+REVIEWED. ✓
headless-extractor-seam (05-26): Oracle 1 = draft → bucket resolves immediately at Oracle 1. No delivery claimed; active recovery handoff confirms in-flight status. Bucket: IN-FLIGHT. ✓ (Oracle 2 and 3 not needed.)
headless-extraction-buildout (05-19): Oracle 1 = draft with explicit supersession pointer → bucket resolves at Oracle 1. No delivery expected or claimed. Bucket: ABANDONED. ✓ (Audit recommends a frontmatter flip to status: superseded + superseded_by: for hygiene — the current draft falsely signals "resumable" to pickup candidates.)
Key finding from this audit: The handoff's alarm — "only 4 review-trail records, most shipped work unreviewed" — was an archival artifact, not a coverage gap. The /workweek-complete run on 2026-05-24 (commit db151655e) moved 22 review-trail records into archive/review-trail/2026-05-21/. A live-only read of tasks/review-trail/ saw only the current week's 4 records and missed them. Oracle 3 reading ONLY the live dir would have mis-classified the 05-24 plan as DELIVERED-UNREVIEWED — a false verdict. The archive-aware read via list-review-trail-records.sh is load-bearing, not optional.
This shape recurs after every crash and every "did we actually ship what we think we shipped?" moment — it has fired at least twice in the coordinator workstream and once explicitly in holodeck (2026-05-27). Re-deriving the three-oracle protocol, the git range-membership formula, the archive-aware glob, and the sidecar exclusion rule by hand each time is the cost this skill exists to amortise.
Closest analogues in skills/: bug-sweep (sweep-and-classify), architecture-audit
(multi-oracle assessment), validate (typed-prefix test running). All three support skill-shape
for recurring structured procedures. A wiki-only home would re-derive the dispatch shape every
invocation and would not be greppable from coordinator:plan or /workstream-complete as a routine option.
The skill orchestrates its own work in three phases:
Phase 1: Gather and filter (~2 min, EM does this directly)

1. Glob `docs/plans/*.md` (or the caller-supplied glob).
2. Exclude sidecars (`.prior-art-check`, `.coverage-check`, `.docs-check`, `.review-patrik`, `.review-`, or `.check.`).
3. Read each plan's `status:` to rough-sort into buckets:
   - `status: superseded / abandoned / cancelled` → ABANDONED (no further oracle work needed)
   - `status: draft / in-progress / reviewed` → IN-FLIGHT (no further oracle work needed)
   - `status: implemented / shipped` → needs Oracle 2 + 3 (add to work queue)

Phase 2: Oracle 2 + 3 on implemented plans (~10 min, parallel scouts)
For each implemented plan, dispatch a read-only Sonnet scout with:

> "Run all typed-prefix ACs for `<plan-path>` against current HEAD. Report each AC as PASS / FAIL / UNVERIFIABLE. Do NOT modify files, commit, or push. Return results inline."
Concurrently (EM-side), run Oracle 3 for each plan:
list-review-trail-records.sh # get all live + archived records
# for each record, read sha_range and test delivery commits for range membership:
git merge-base --is-ancestor <commit> <end_sha> # must succeed (exit 0)
! git merge-base --is-ancestor <commit> <start_sha> # must succeed (exit 0) — note the leading !
Phase 3: Synthesise and write output (~3 min)
Apply the bucket decision tree to each plan. Write the output table + summary to
tasks/audits/YYYY-MM-DD-plan-delivery-audit.md. Surface any DELIVERED-UNREVIEWED plans
to the PM with a code-reviewer dispatch recommendation — do not dispatch autonomously,
as the PM may choose to defer.