codex/skills/parse/SKILL.md
Classify a local codebase's current architecture from collector-backed, code-first evidence. Use for prompts asking what architecture a repo or slice actually uses, whether it is layered/hexagonal/MVC/plugin/pipeline/etc., what the strongest runner-up is, which coexisting patterns are directly evidenced, whether docs match implementation, or when an implementation agent needs a narrow repo-dialect preflight. Do not use for broad repo onboarding, layer removal, structural redesign, domain algebra, invariant design, implementation specs, or execution planning.
npx skillsauth add tkersey/dotfiles parseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
parse is a narrow architecture classifier.
It identifies the architecture a repository is using right now from code-first evidence. It produces a compact architecture fingerprint: repo kind, one dominant architecture label, confidence, runner-up/near-miss, material subsystem variants, up to two directly evidenced coexisting patterns, docs-vs-code drift, narrow repo-fit hints, and an agent handoff packet.
parse is not an architecture designer, codebase onboarding guide,
simplification reviewer, invariant gate, spec generator, or execution planner.
It should leave the user or downstream agent knowing what architecture is
present, not what architecture should replace it.
Use parse when the main question is classification:
Do not use parse when the main question is exploration, redesign, removal,
invariant enforcement, spec writing, or planning. Use the routing boundary below
to decide before starting.
These boundaries are for deciding whether parse is the right skill. Do not
turn them into follow-up recommendations in the final parse memo unless the user
explicitly asks what to do next.
| User intent | Use instead |
|---|---|
| "Explain how this repo works", onboarding, entrypoints, domain model, data flow, integrations, tests | codebase-archaeology |
| "This is over-engineered", "what can we delete", remove framework/DI/codegen/ORM/GraphQL/infra layers | reduce |
| Boundary redesign, repeated validation/policy branching, protocol/state-machine drift, generated provenance loss, callback/effect boundary, explicit IR/law tests | universalist |
| Domain algebra, operations/laws, denotational/combinator model, event/workflow algebra, interpreters/law tests | algebra-driven-design |
| Source-of-truth, owned invariant, impossible state, validation sprawl, idempotency/retry/order drift, witness parity, enforcement boundary | invariant-ace |
| Decision-complete implementation spec, readiness gate, spec lint, invariant challenge before planning | spec-pipeline |
| Dependency-ordered execution campaign, rollout/rollback sequencing, task waves | plan / st / mesh |
When parse is paired with another skill, parse still owns the architecture
fingerprint. Run the collector-first classification pass, emit the parse memo,
and let the companion skill consume the handoff packet. Do not let companion
source exploration swallow the explicit classification step.
Resolve these from the prompt and repository context:
| Input | Default | Notes |
|---|---:|---|
| repo_path | current repository | Required root path for inspection. |
| focus_paths | none | Repo-relative files/directories when the user names a slice or downstream change target. |
| mode | standard | quick, standard, or deep. |
| drift_check | false | True when the user asks whether docs match implementation. |
quick: repo kind, dominant architecture, confidence, runner-up line,
decisive evidence, repo-fit hints, and caveats. Use for repo-dialect preflight
or fast orientation.standard: full output contract. Default.deep: full output contract plus mandatory focused rerun on thin/mixed
repo-wide signals, and 3-5 direct file:line citations when source inspection
is available.parse is a CLI-backed collector-first workflow. A normal run starts with
parse-arch collect.
parse-arch collect /path/to/repo --json
parse-arch collect /path/to/repo --focus-path src --focus-path tests --json
parse-arch collect --repo-path /path/to/repo --json
parse-arch collect --repo /path/to/repo --format json
The collector always emits JSON. It should expose these read-depth fields on repo-wide runs:
parse-arch collect "$PWD" --json |
jq -e 'has("read_depth_verdict") and has("thin_signal_classes") and has("suggested_focus_paths") and has("followup_hint")'
Treat read_depth_verdict: thin_repo_wide as a required second-pass trigger, not
as a warning you can ignore.
If the collector contract check fails, stop the normal parse path and install or build a compatible collector before finalizing if possible:
brew install tkersey/tap/parse-arch
brew upgrade tkersey/tap/parse-arch
# or, for local iteration:
(cd "$HOME/workspace/tk/skills-zig" && zig build build-parse-arch -Doptimize=ReleaseFast)
If the collector is unavailable and no install/build path is allowed, continue only after naming the exact failed command and the missing signal classes. The manual fallback must be conservative and must not pretend to have collector coverage.
Follow this order. Do not reorder it unless the collector is unavailable and you explicitly record that fallback.
Classify the repository shape first:
application-serviceweb-frontendfull-stack-applibrary-sdkcli-toolingmonorepo-platforminfra-opsdata-pipelineplugin-extensionmixed/otherRepo kind prevents app-centric architecture labels from being forced onto thin libraries, tooling repos, generated packages, infrastructure repos, and data pipelines.
Run the collector before manual source inspection.
parse-arch collect <repo_path> --json for repo-wide classification.--focus-path per target path when focus_paths are available.Evidence section.read_depth_verdict,
thin_signal_classes, suggested_focus_paths, and followup_hint.Do not narrate a manual source-reading pass before the collector result exists. Manual inspection is the escalation step after collector evidence, not a replacement for it.
Immediately run a focused collector pass when any of these is true:
read_depth_verdict == "thin_repo_wide";focus_paths and they may differ from the repo-wide story.Use the collector's suggested_focus_paths when available.
If suggestions are missing, choose 2-4 architecture-defining paths yourself:
If focused collection still leaves the architecture under-determined, perform one named trace. Keep it narrow.
entrypoint -> flow -> boundaries: for application/service, full-stack,
monorepo/platform, and plugin/extension repos. Start at main, route tables,
command registries, job/workflow definitions, or host/plugin registries. Follow
one representative request/job/command into orchestration/core modules. End at
storage, network, queue, file, generated, or adapter boundaries.public contract -> examples/tests -> core: for library/SDK and CLI/tooling
repos. Start at exported API roots, command registries, manifests, or
provider/plugin interfaces. Follow examples, golden tests, or integration tests
that define the contract. End at implementation modules that enforce the seam.job spec -> stage graph -> sinks: for data/pipeline and infra/ops repos.
Start at DAGs, workflow manifests, build/deploy entrypoints, or task
registries. Trace stage ordering and handoff seams. End at sinks, state stores,
providers, deploy targets, or execution adapters.Use safe, non-mutating probes only when they add meaningful evidence: local help, build/test discovery, dependency inspection, or read-only scripts. Stop if the only useful probe mutates tracked files, requires secrets, or depends on network-only truth.
Read references/taxonomy.md before final classification when available.
Pick one dominant architecture label from the curated taxonomy. Prefer common labels plus explicit hybrid wording over invented niche names.
Do not lock the label from one cue. Confirm the dominant label across at least three distinct evidence surfaces when possible:
Name the strongest runner-up label or coexisting pattern and the decisive evidence that kept it secondary.
Capture at most two coexisting patterns, and only when they materially shape seams, contracts, or control flow.
For each pattern, use one scope:
repo-wide modifier: recurs across at least three slices or across two
independent seam types, such as package layout plus runtime registry or
examples/tests plus generated boundary.slice-local variant: real in the named slice but not architecture-defining
for the whole repo.near-miss: plausible but missing decisive evidence or subordinate to the
dominant label.Do not claim specialized patterns such as CQRS, event sourcing, DDD, anti-corruption layers, workflow engines, or hexagonal architecture without direct repo evidence at the appropriate seams.
Translate the dominant architecture, coexisting patterns, subsystem variants, and focus-path deltas into narrow implementation-fit hints:
Do not recommend migrations, modernization, layer removal, new abstractions, invariant enforcement, or follow-up skills unless the user explicitly asks for next steps.
Use high, medium, or low.
high requires:
medium applies when:
low applies when:
Low confidence still requires choosing one best-fit label, but all repo-fit hints
must be advisory and include do_not_assume warnings.
Return these sections in order for standard and deep modes.
Repo KindDominant ArchitectureConfidenceWhy This Best FitsMajor Subsystems / Coexisting PatternsRepo-Fit AdviceAgent HandoffEvidenceArchitecture DriftCaveatsName the repo shape and why it matters for interpretation. Do not force service labels onto libraries, tooling, infra, data, plugin, or generated-artifact repos.
Give one best-fit label from the curated taxonomy. Use hybrid wording only when both halves are evidenced and one remains dominant.
Use high, medium, or low. Explain what evidence would raise or lower the
score.
Start with exactly this line:
Runner-Up: <label or pattern> — <decisive reason it lost>
Then explain the dominant label using the strongest code-first evidence. Support
load-bearing claims with 3-5 decisive file:line citations when direct source
inspection was possible. When a claim is collector-derived, cite concrete paths,
module names, signal summaries, or collector fields and say that it is
collector-derived.
When evidence is mixed, say why the nearest competing label or pattern stayed secondary.
List major subsystem variants only when they materially differ from the dominant architecture.
List 0-2 directly evidenced coexisting patterns. For each:
pattern;scope: repo-wide modifier, slice-local variant, or near-miss;evidence;why_secondary.If there are no material coexisting patterns, write none material.
Give 3-5 bullets that help a downstream agent fit work to the repo as it exists now. Keep them descriptive and current-state-only.
Include do_not_assume warnings when confidence is weak or a focus slice differs
from the repo-wide story.
Emit one fenced YAML block with exactly these stable top-level keys:
repo_kind:
dominant_architecture:
confidence:
focus_scope:
runner_up:
major_subsystems:
coexisting_patterns:
architecture_drift:
repo_fit_hints:
do_not_assume:
evidence_paths:
collector_commands:
missing_signal_classes:
classification_only: true
Keep values concise and parseable. Do not include implementation tasks or migration recommendations.
Separate decisive evidence from supporting evidence.
Required fields:
collector_commands: exact command shapes used;read_depth: repo-wide/focused verdicts and any thin signal classes;decisive_evidence: at least three distinct evidence surfaces when available;supporting_evidence: docs, tests, config, package layout, generated artifacts,
or framework clues that support but do not decide the label;focus_delta: how focused rerun changed or did not change the repo-wide read.Prefer concrete paths, file:line citations, module names, entrypoints, signal summaries, and exact collector fields over general impressions.
Compare docs and implementation when both exist.
Use one of:
none observed;docs overstate architecture;docs understate architecture;docs stale or contradictory;not assessed with reason.Keep drift critique lightweight. Do not prescribe a new target architecture.
State uncertainty, missing evidence, and overclaim boundaries. Tie each caveat to specific missing signals and the compensating paths inspected. Name plausible but unproven competing labels or patterns explicitly.
Avoid generic version-centric caveats unless the collector binary behavior itself is the issue.
For quick, return only:
## Repo Kind
## Dominant Architecture
## Confidence
## Runner-Up
## Decisive Evidence
## Repo-Fit Hints
## Caveats
Still run the collector first unless unavailable.
parse-arch collect is the normal first move, not optional decoration.parse as freeform source-reading.high without at least three distinct evidence surfaces.repo-wide modifier from one interesting file or one slice.codebase-archaeology, reduce, universalist,
algebra-driven-design, invariant-ace, spec-pipeline, or plan absorb the
parse memo when parse is active.Use these as hints, not proof.
layered / n-tier: controllers, handlers, services, repositories, models,
or adapters arranged in dependency order.mvc / mvvm / component-driven UI: explicit presentation-model/controller
boundaries in UI-heavy repos.clean / hexagonal / onion / ports-and-adapters: application/domain core
protected from delivery/infrastructure adapters by explicit ports, interfaces,
dependency inversion, or adapter registries.modular monolith: one deployable codebase with meaningful internal module
boundaries and local calls rather than network/service decomposition.microservice / service-oriented: multiple independently shaped services with
network, message, deploy, or ownership boundaries.event-driven: explicit publishers, consumers, brokers, async event flows, or
event contracts dominate control flow.pipeline / job-oriented: DAGs, staged jobs, workflows, ETL phases, scheduled
tasks, or artifact handoffs dominate the system.plugin / extension-based: hosts, hooks, plugins, extensions, providers,
capability registries, or adapter registries are first-class architecture
surfaces.library-sdk: exported API roots, public contracts, examples/tests, generated
clients, and provider seams matter more than app entrypoints.cli-tooling: command registries, staged passes, file/artifact boundaries,
stdout/stderr contracts, and golden tests matter more than web/service labels.infra-ops: desired-state config, deploy graphs, provider modules, CI/CD,
policy gates, and environment overlays dominate the architecture.data-pipeline: job specs, stage graphs, schemas, sinks, lineage, and scheduled
execution dominate the architecture.Treat thin_repo_wide as an under-read warning even when a top score is non-zero.
The second pass is about deepening evidence, not rewording the same thin read.
Use $parse to classify this repo's current architecture.
Return the full parse memo. Run parse-arch collect first, rerun focused collection
if repo-wide signals are thin or mixed, choose one dominant label, name the
runner-up, call out coexisting patterns only when directly evidenced, and keep
repo-fit advice current-state-only.
Use $parse for this repo, focused on:
- <path>
- <path>
Keep one repo-wide dominant label if warranted, but say explicitly whether these
paths follow the repo-wide architecture or a material slice-local variant.
Use $parse to classify the implementation architecture and compare it with the
docs' architecture claims. Keep the result current-state-only; do not prescribe a
migration.
Use $parse quick mode as a repo-dialect preflight for a small downstream change.
Return the dominant architecture, confidence, runner-up, decisive evidence, and
3-5 repo-fit hints. Do not expand into onboarding or redesign advice.
Run these when editing this skill or the collector contract:
uv run --with pyyaml -- python3 codex/skills/.system/skill-creator/scripts/quick_validate.py codex/skills/parse
parse-arch collect "$PWD" --focus-path codex/skills/parse/SKILL.md --json | jq -e '.read_depth_verdict == "focused"'
parse-arch collect "$HOME/workspace/tk/shift" --json | jq -e 'has("read_depth_verdict") and has("suggested_focus_paths") and has("thin_signal_classes") and has("followup_hint")'
parse-arch collect "$HOME/.dotfiles" --json | jq -e '.read_depth_verdict == "thin_repo_wide" and (.suggested_focus_paths | length > 0)'
parse-arch eval --suite "$HOME/workspace/tk/skills-zig/apps/parse-arch/references/eval/suite.yaml"
parse-arch doctor --suite "$HOME/workspace/tk/skills-zig/apps/parse-arch/references/eval/suite.yaml" --repo-path "$PWD"
references/taxonomy.mdreferences/evidence-playbook.md$HOME/workspace/tk/skills-zig/apps/parse-archparse-arch collecttesting
Use before local patching when bugs, regressions, malformed state, crashes, parser failures, migrations, cache drift, protocol problems, compatibility requests, tolerant readers, fallbacks, coercions, retries, catch-and-continue logic, or local workarounds may broaden accepted invalid state.
testing
Use for bug reports, PR/issue prose, reviewer comments, user diagnoses, generated summaries, memories, retrieved context, public tracker context, claimed root causes, proposed fixes, fake-minimal repro risk, or any investigation where natural-language context could anchor the implementation scope.
development
Use when non-trivial work needs Challenge Escalation, latent-intelligence activation, frame-market selection, doctrine operators, dominant-move selection, ablation/surface-tax judgment, reification, review comment law, negative capability, route receipts, or proof-bearing refusal to mutate.
development
Apply Algebra-Driven Design. Use for ADD, denotational design, combinator models, law-driven architecture, domain algebra, property tests, codebase modeling, event sourcing, workflow design, or agentic skill design. If the canonical bundle is unavailable, use this wrapper as the minimal ADD kernel and report the missing bundle path.