codex/skills/reality-check-for-project/SKILL.md
Assess project status against README/plan vision. Use when "where are we", "reality check", "what's missing", "are we on track", "gap analysis", or "does this actually work".
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add tkersey/dotfiles reality-check-for-project
Core Insight: After days or weeks of methodically cranking on beads with agent swarms, you need periodic "come to Jesus" moments where you step back and honestly assess: Is the actual implemented code delivering on the project's vision? Whether or not you already have a sense of the answer, the value is in getting the agent — who has been knee-deep in the code — to articulate the current state from its deep-in-the-trenches perspective. That reveals things the bead count alone can't show, and it lets you steer so the swarm stays pointed in the right direction.
Long-running multi-agent projects develop a dangerous blind spot: agents complete beads, tests pass, code compiles — but nobody steps back to ask whether the aggregate of all that work actually delivers on the original vision. Beads track tactical progress; this skill tracks strategic alignment. Without periodic reality checks, projects drift: 72% of beads might be complete while 0% of the core value proposition is actually working end-to-end.
The reality check is primarily a steering mechanism — not just an audit. The goal is course correction: ensuring that all the concurrent agents aren't collectively drifting away from the vision while individually making progress on their assigned beads.
Key principle: Code = ground truth for current state; docs = measuring stick for vision. When README/plan docs diverge from what the code actually does, code tells you where you ARE. Docs tell you where you promised to BE. The gap between them is the entire point of this skill.
Phase 1: Reality Check Question → "Where are we REALLY?"
↓
Phase 2: Bridge Plan → "Close every single conceivable gap"
↓
Phase 3a: Bead Creation (FROZEN) → Always go through beads, never skip to code
↓
Phase 4: Ambition Rounds × 2-3 → "decent start but... MUCH MUCH MUCH better"
↓ → Always "revise in-place", never new doc
Phase 3a again: Re-generate beads
↓
Phase 5: Refinement × 4-5 (FROZEN) → Stop only when round finds nothing
↓
bv --robot-triage validation
↓
Implementation (agents pick up beads via br ready)
Critical invariants:
Variant A (comprehensive, for first use on a project):
First read ALL of the AGENTS.md, README.md, and every markdown plan/spec document in this project
SUPER carefully and understand them completely. Then use your code investigation agent mode to
fully understand the actual code, architecture, and what's REALLY implemented vs aspirational.
Then I need you to answer honestly:
Where are we REALLY on this project? Does the implemented code actually deliver on the vision
described in the README and plan documents? If not:
1. What specifically IS working right now?
2. What is NOT working or not yet implemented?
3. What is blocking us from getting there?
4. If we were to implement all open and in-progress beads, would we close the gap completely?
Why or why not?
5. What goals from the vision are NOT covered by ANY existing bead?
Be brutally honest. I need the real picture, not optimistic spin.
Variant B (quicker, for periodic check-ins):
What's the REAL status of this project? Is it essentially finished and living up to the goals and
purpose outlined in the README and plan docs? What is missing still? What isn't yet functioning
properly or completed properly?
Variant C (focused on a specific capability):
Where are we on this project? Do we have a working [X] that can [Y] with [Z quality/performance]?
If not, what is blocking us? If we were to intelligently implement all open and in-progress beads,
would we close that gap completely? Why or why not?
Variant D (integration audit — naming specific libraries/subsystems):
Real example from CASS project:
Where are we on this project? Did we completely replace all the internal search logic with instead
pulling in and integrating completely with /dp/frankensearch? Did we completely replace all the
internal logic for detecting installed agent harnesses by pulling in and integrating
/dp/franken_agent_detection? Have we replaced all database stuff so that it ultimately uses only
/dp/frankensqlite for everything and take full advantage of concurrent writers?
Variant E (stub/wiring audit — checking what's real vs placeholder):
Real example from frankensearch:
I need to understand which CLI commands in [project] are actually wired to real implementations vs
stubs. For each of the N commands, determine if the dispatch handler actually does something real
or is a stub/todo. Report which commands are fully implemented, which are partial/stubs, and which
are missing entirely.
Variant F (marketing claims audit):
Real example from jeffreys-skills.md:
What remains to be done to make this system fully deliver on every single promise and marketing
claim made?
Variant G (mega-prompt — chains Phase 1 → Phase 2 → Phase 3a in one shot):
This is the speed variant — skips interactive steering and chains the entire flow. Copy-paste ready:
I need you to help me fix this. That is, making all the things that are unimplemented but which
SHOULD have been implemented according to the beads and markdown plan. Figure out exactly what
needs to be done to get us over the goal line with a finished, polished, reliable, performant
project in line with the vision described earlier. OK so please take ALL of that and elaborate on
it and use it to create a comprehensive and granular set of beads for all this with tasks,
subtasks, and dependency structure overlaid, with detailed comments so that the whole thing is
totally self-contained and self-documenting (including relevant background, reasoning/justification,
considerations, etc.-- anything we'd want our "future self" to know about the goals and intentions
and thought process and how it serves the over-arching goals of the project.). The beads should be
so detailed that we never need to consult back to the original markdown plan document. Remember to
ONLY use the `br` tool to create and modify the beads and add the dependencies.
Use when the agent already has deep context in the project and you want to go straight from assessment to action. For first-time reality checks, use Variant A interactively instead.
Variant H (performance reality check — with benchmark data):
Real example from frankensqlite:
[paste raw benchmark output here showing specific regressions]
I need you to figure out the underlying root causes of the biggest performance problems, namely:
[list worst regressions with exact numbers], and then come up with a comprehensive, systematic,
radically innovative plan to solve each and every one.
Variant A (analytical, separate from execution):
OK not great. I need you to come up with a super comprehensive and complete and detailed and
granular plan to close every single conceivable gap so that [project goal] IS fully [done/integrated]
PROPERLY and in an optimal, harmonized, coherent, cohesive way with the absolutely highest quality,
performance, reliability, and robustness.
Variant B (action-oriented, bridge directly into fix mode):
I need you to help me fix this. That is, making all the things that are unimplemented but which
SHOULD have been implemented according to the beads and markdown plan. Figure out exactly what
needs to be done to get us over the goal line with a finished, polished, reliable, performant
project in line with the vision described earlier.
Variant C (for each gap → bead):
Real example from jeffreys-skills.md:
OK for each and every gap, create an ultra comprehensive bead.
This is the second frozen template (along with Phase 5). Do NOT modify it:
OK so please take ALL of that and elaborate on it and use it to create a comprehensive and granular
set of beads for all this with tasks, subtasks, and dependency structure overlaid, with detailed
comments so that the whole thing is totally self-contained and self-documenting (including relevant
background, reasoning/justification, considerations, etc.-- anything we'd want our "future self" to
know about the goals and intentions and thought process and how it serves the over-arching goals of
the project.) The beads should be so detailed that we never need to consult back to the original
markdown plan document. Remember to ONLY use the `br` tool to create and modify the beads and add
the dependencies.
Critical addendum (from flywheel_gateway sessions): It's critical that EVERYTHING from the markdown plan be embedded into the beads so that we never need to refer back to the markdown plan and we don't lose any important context or ideas or insights.
OK good, now I need you to come up with an absolutely comprehensive, detailed, and granular plan
for addressing each and every single gap you identified in the most optimal, clever, and
sophisticated way possible. THEN: please resolve ALL of those actionable items now. Keep a super
detailed, granular, and complete TODO list of all items so you don't lose track of anything!
The opener is ALWAYS a qualified acknowledgment followed by escalation. Real phrases from 12+ projects:
Round 1 (the "decent start but" opener):
That's a decent start but it barely scratches the surface and is light years away from being
OPTIMAL. Please try again and revise your existing plan document in-place to make it MUCH, MUCH,
MUCH better in EVERY WAY.
Round 2 (sustained escalation):
That's a lot better than before but STILL is a far cry from being OPTIMAL. Please try yet again
and revise your existing plan document in-place to make it MUCH, MUCH, MUCH better in EVERY WAY.
I believe in you, you can do this!!! Show me how brilliant you really are.
Round 3 (domain-specific depth — inject project-relevant skills/math):
Now, TRULY think even harder. Surely there is some math invented in the last 60 years that would
be relevant and helpful here? Super hard, esoteric math that would be ultra accretive and give a
ton of alpha for the specific problems we're trying to solve here, as efficiently as possible?
REALLY RUMINATE ON THIS!!! DIG DEEP!!
Or with skill references: Use $alien-artifact-coding and $extreme-software-optimization. BE AMBITIOUS.
Or the full hype-man version — sounds odd on paper but genuinely works as a counter-strategy against trained incrementalism:
Your new beads should make heavy use of concepts from [relevant skills], because that's the only
chance we have to BLOW past [competitors] in sophistication, bringing techniques to bear that
those teams would never dream of using because they're too abstruse and esoteric, but by wielding
them with clever and canny brilliance, we will be able to leapfrog past all competing projects!!!
I BELIEVE IN YOU MY FRIEND. LET US CHANGE THE WORLD TOGETHER.
Key: The transition from ambition to implementation is ALWAYS through bead creation (Phase 3a) — never directly to code. Full phrase corpus: AMBITION-ROUNDS.md.
This prompt is copy-pasted verbatim across 20+ projects. Do NOT modify it.
Note: The idea-wizard variant prepends "Reread AGENTS.md so it's still fresh in your mind." Use that prefix after context compactions.
Check over each bead super carefully-- are you sure it makes sense? Is it optimal? Could we change
anything to make the system work better for users? If so, revise the beads. It's a lot easier and
faster to operate in "plan space" before we start implementing these things! DO NOT OVERSIMPLIFY
THINGS! DO NOT LOSE ANY FEATURES OR FUNCTIONALITY! Also make sure that as part of the beads we
include comprehensive unit tests and e2e test scripts with great, detailed logging so we can be
sure that everything is working perfectly after implementation. Make sure to ONLY use the `br` cli
tool for all changes, and you can and should also use the `bv` tool to help diagnose potential
problems with the beads.
Why 4-5 times: Each refinement pass finds 11-15 new gaps, test holes, or dependency issues. Evidence from wezterm_automata session: 5 refinement rounds, each creating new beads to fill discovered gaps. Stop only when a round finds nothing to change.
# Check for beads
if [ -d ".beads" ] && command -v br &>/dev/null; then
  echo "BEADS_AVAILABLE=true" # Use Phase 3a workflow
  echo "OPEN=$(br list --status=open --json 2>/dev/null | jq length)"
  echo "CLOSED=$(br list --status=closed --json 2>/dev/null | jq length)"
  echo "IN_PROGRESS=$(br list --status=in_progress --json 2>/dev/null | jq length)"
else
  echo "BEADS_AVAILABLE=false" # Use Phase 3b workflow
fi

# Check for bv
command -v bv &>/dev/null && echo "BV_AVAILABLE=true"
Read these files completely and carefully — they define the "promise":
cat README.md
cat AGENTS.md
# Find all plan/spec documents
find . -maxdepth 3 -name "*.md" | grep -iE "plan|spec|design|architecture|vision|roadmap" | head -20
ls docs/ 2>/dev/null
Distill the vision into a Vision Checklist: a numbered list of concrete, testable goals that the project promises to deliver. This is your measuring stick.
br list --status=open --json 2>/dev/null | jq length # How much is left
br list --status=closed --json 2>/dev/null | jq length # How much is done
br list --status=in_progress --json 2>/dev/null | jq length # What's active
bv --robot-triage 2>/dev/null | jq '.quick_ref' # Health snapshot
bv --robot-forecast all 2>/dev/null | jq '.forecast.summary' # ETA
Key question: Do the open beads, if ALL completed, cover ALL vision goals? Or are there goals with ZERO bead coverage?
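The coverage question above can be sketched as a small shell check. This is only an illustration under stated assumptions: it expects a hypothetical `goals.txt` with one keyword or phrase per vision goal and a hypothetical `bead_titles.txt` with one bead title per line. How you export titles from `br` is left open, since this document does not specify the JSON field names. The function name `uncovered_goals` is also made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: list vision-goal keywords that match no bead title.
# Assumptions (hypothetical inputs): the goals file holds one keyword or
# phrase per line; the titles file holds one bead title per line.
uncovered_goals() {
  local goals_file="$1" titles_file="$2" goal
  while IFS= read -r goal || [ -n "$goal" ]; do
    [ -z "$goal" ] && continue   # skip blank lines
    # -q quiet, -i case-insensitive, -F literal string; -- guards leading dashes
    if ! grep -qiF -- "$goal" "$titles_file"; then
      printf 'NO_BEAD: %s\n' "$goal"
    fi
  done < "$goals_file"
}
```

Run it as `uncovered_goals goals.txt bead_titles.txt`. A keyword match is a weak signal, so any goal it reports as covered still deserves a manual look at the matching bead.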
Actually examine the implemented code. For each vision goal:
Use the mock-code-finder methodology here: keyword scan + AST scan + behavioral scan.
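As a rough illustration of the keyword-scan step only (the AST and behavioral scans are not covered), a recursive grep for common placeholder markers might look like the following. The marker list and the function name `scan_stub_markers` are assumptions for this sketch, not part of the mock-code-finder skill itself; adjust the pattern to the project's language.

```shell
#!/usr/bin/env bash
# Sketch of a keyword scan for placeholder code.
# The marker list below is an assumption; extend it per language.
scan_stub_markers() {
  local root="${1:-.}"
  # -r recurse, -n show line numbers, -I skip binary files, -E extended regex
  grep -rnIE --exclude-dir=.git \
    -- 'TODO|FIXME|todo!\(|unimplemented!\(|NotImplementedError' "$root" \
    || true   # grep exits 1 when nothing matches; treat that as a clean scan
}
```

Each hit is printed as `path:line:content`. A hit is a lead to investigate, not proof of a stub, and a clean scan does not prove the code is real.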
For each vision goal, categorize its status. See GAP-ANALYSIS.md.
For every gap, specify exactly what needs to change. See BRIDGE-PLAN.md.
Push past incrementalism, then polish in plan space. See AMBITION-ROUNDS.md.
The most critical artifact. Extract from README + plan docs:
## Vision Checklist for [Project Name]
| # | Goal | Source | Status | Evidence |
|---|------|--------|--------|----------|
| 1 | [Concrete, testable goal] | README.md L42 | NOT_STARTED | No code found |
| 2 | [Another goal] | PLAN.md sec3 | PARTIAL | src/foo.rs exists but only 2/7 features |
| 3 | [Performance goal] | README.md L88 | UNPROVEN | Code exists, no benchmarks |
| 4 | [User-facing feature] | docs/DESIGN.md | WORKING | Tests pass, e2e verified |
Status categories:
- WORKING: Code exists, tests pass, e2e verified
- PARTIAL: Some implementation exists, incomplete
- STUB: Placeholder/mock/todo code only
- UNPROVEN: Code exists but no tests or tests don't cover it
- NOT_STARTED: No code at all for this goal
- REGRESSED: Was working, now broken
- NO_BEAD: Not covered by any existing bead (critical gap!)
- WRONG_APPROACH: Implemented but architecturally flawed, can't reach the goal this way

| Category | Meaning | Action |
|----------|---------|--------|
| Vision gap | Goal exists in docs, no bead covers it | Create new beads |
| Implementation gap | Bead exists, code is stub/incomplete | Revise bead, implement |
| Proof gap | Code exists, no tests proving it works | Add comprehensive tests |
| Performance gap | Works but doesn't meet performance goals | Profile, optimize |
| Integration gap | Parts work in isolation, not end-to-end | Add integration/e2e tests |
| Design gap | Implemented but architecturally wrong | Redesign, may need new beads |
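Once a Vision Checklist exists in the five-column table layout shown earlier, a quick tally of rows per status gives a view of progress that is independent of bead counts. The sketch below assumes the checklist is saved in a file using exactly that layout (Status in the fourth column); the function name `tally_vision_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count Vision Checklist rows per status.
# Assumes the layout "| # | Goal | Source | Status | Evidence |".
tally_vision_status() {
  awk -F'|' '
    # Only table rows; skip the header row ("#") and the separator row ("---").
    /^[|]/ && $2 !~ /^[ ]*#[ ]*$/ && $2 !~ /^-+$/ {
      status = $5
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim surrounding whitespace
      if (status != "") count[status]++
    }
    END { for (s in count) printf "%s %d\n", s, count[s] }
  ' "$1" | sort
}
```

Calling `tally_vision_status checklist.md` prints one `STATUS count` line per status in alphabetical order, which makes it easy to see at a glance how many goals are actually WORKING versus merely PARTIAL or STUB.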
| Don't | Do | Evidence |
|-------|-----|----------|
| Accept "72% beads complete" as progress | Check if ANY vision goal is 100% delivered | jeffreys-skills: 85% complete, entire SSO at stub level |
| Trust passing tests = working software | Actually run the software end-to-end | mcp_agent_mail: binary audit found stubs behind "working" CLI |
| Assume open beads cover everything | Cross-check vision goals against bead coverage | frankenjax: 8 documented features had zero bead coverage |
| Create a new plan doc each ambition round | Revise in-place; never proliferate docs | Every CASS session uses "revise your existing plan document in-place" |
| Paraphrase the frozen templates | Copy-paste Phase 3a and Phase 5 verbatim | These prompts were refined over 20+ projects |
| Go directly from ambition to implementation | Always go through beads first (Phase 3a) | Zero exceptions found in 120+ sessions |
| Do 2 refinement rounds and call it done | Do 4-5; each round finds 11-15 new gaps | wezterm_automata: 5 rounds before convergence |
| Paste frustration without structured analysis | Channel reaction into Phase 1 → Phase 2 flow | frankensqlite: same benchmarks pasted into 6+ sessions without progress |
| Close beads without verifying the fix works | Require proof (test output, demo run) before closing | GitHub audits: issues "closed WITHOUT actually fixing the bug" |
Phase 1: Reality Check Question
↓
Phase 2: Bridge Plan ("close every gap")
↓
Phase 3a: Bead Creation (frozen template) ← ALWAYS go through beads
↓
Phase 4: Ambition Rounds × 2-3 ("decent start but...")
↓ ← "revise in-place", never new doc
Phase 3a again: Re-generate beads from improved plan
↓
Phase 5: Refinement × 4-5 (frozen template) ← stop when round finds nothing
↓
bv --robot-triage validation
↓
Implementation (agents pick up beads via br ready)
Critical invariant: Ambition → Beads → Refine → Implement. Never skip beads. Never go directly from ambition to code. This is the single most consistent pattern across 120+ sessions.
Rule 1: Always revise in-place, never create new documents.
Rule 2: The bead creation and refinement prompts are frozen templates.
Rule 3: Check bead coverage against every vision goal, not just bead completion %.
For each vision goal, check br list for matching beads. Any goal with zero matches = NO_BEAD status (critical).
Rule 4: Distinguish SHIPPED reality from code reality.
Rule 5: Inject project-specific context BEFORE standard templates, not instead of them.
| Situation | Use This | Not This |
|-----------|----------|----------|
| "Is the project delivering on its README promises?" | reality-check | codebase-audit |
| "Find all stubs and mocks" | mock-code-finder | reality-check |
| "Audit for security/perf/UX issues" | codebase-audit | reality-check |
| "Generate ideas for improvements" | idea-wizard | reality-check |
| "What bead should I work on next?" | bv | reality-check |
| "Create a comprehensive architecture doc" | comprehensive-codebase-report | reality-check |
| "Find and fix all bugs" | multi-pass-bug-hunting | reality-check |
Overlap with idea-wizard: Phases 4-5 of this skill (ambition rounds + refinement) are the idea-wizard workflow. If you've already done the reality check and just need the ambition→bead→refine cycle, use idea-wizard directly.
| Need | File |
|------|------|
| Gap analysis methodology and output format | GAP-ANALYSIS.md |
| Bridge plan structure and examples | BRIDGE-PLAN.md |
| Ambition rounds: full phrase corpus and theory | AMBITION-ROUNDS.md |
| Vision extraction from diverse doc formats | VISION-EXTRACTION.md |
| Lessons from real sessions across 12+ projects | LESSONS-FROM-SESSIONS.md |