plugins/promode/skills/discovery-to-determinism/SKILL.md
Put the bulk of acceptance coverage below the UI through a fast, deterministic headless client driving an operator seam, and reserve a surgical UI state-graph tier for defects that only manifest through the real GUI. Use when designing test/QA or acceptance-testing strategy, automating acceptance, end-to-end (E2E), or QA testing of a running app, deciding what to cover with fast headless tests vs slow UI/E2E, building agent-driven exploration or automation of a running app, building a below-UI operator seam (interaction layer) or headless client, or crystallising agent-discovered knowledge into reusable deterministic artifacts (maps, graphs, scripts, tests). Covers the Discovery⇄Determinism flywheel, the operator-seam architecture (one seam serving both a headless test client and AI-agent tools), and layered headless-first acceptance testing with a surgical UI state-graph tier for GUI-only defects.
This skill teaches how to run that asymmetry as a flywheel (discovery hardens into determinism; determinism makes the next discovery cheaper and sharper) and how to cash it out architecturally: carve a clean operator seam below the UI so that most behaviour is driven by fast, deterministic code instead of the slow, flaky GUI. Built well, that seam serves both a headless test client and AI-agent tools, because tests and agents are both non-human operators that need scriptable access to the real logic.
The principle leads; testing is its first worked embodiment. Keep it general; the worked example illustrates, it does not define. </objective>
<quick_start>
<when-this-applies>), and STOP if it fails (the UI is the logic, a programmatic seam already exists, the app is too small, or no failing test needs the seam yet).
references/ui-state-graph-edt.md) only for defects that surface solely through the real GUI; never let it re-test what headless covers.
Arrow 1: Discovery → Determinism (crystallise). An agent explores the unknown, and the finding is hardened immediately into deterministic, self-checking code: a map, a script, a graph, a test, a recognizer. The artifact is a cache of discovered structure. A discovery left as prose or a transcript is a finding you will pay full price to discover again.
Arrow 2: Determinism → Discovery (bootstrap & target). This is the half people miss. The crystallised artifacts make the next discovery cheaper, safer, and self-targeting:
The ratchet: discovery produces determinism; determinism lowers the cost and raises the precision of the next discovery; repeat. This is a model and a discipline, not a measured law — there is no promised "cost falls by N%," and a stale crystallised map inverts the benefit until re-crystallised.
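The following minimal sketch shows both arrows at small scale. All names are hypothetical, including `Observation`, `STATE_MAP`, `where_am_i` and `StaleMapError`. Each discovered screen is crystallised as a deterministic recognizer instead of prose, and the where-am-I check uses that map to orient before acting. When an observation matches no state, or more than one, the sketch treats the map as stale or ambiguous. It signals that the map needs re-crystallising rather than guessing around it.

```python
from typing import Callable, Dict, Mapping

# Hypothetical observation shape: whatever the client can see right now,
# e.g. visible element ids mapped to their text.
Observation = Mapping[str, str]

# A crystallised discovery: each state the agent found is stored as a
# deterministic recognizer instead of as prose in a transcript.
STATE_MAP: Dict[str, Callable[[Observation], bool]] = {
    "login": lambda o: "username" in o and "password" in o,
    "dashboard": lambda o: o.get("title") == "Dashboard",
    "settings": lambda o: o.get("title") == "Settings",
}


class StaleMapError(Exception):
    """Raised when the crystallised map no longer explains what we see."""


def where_am_i(obs: Observation) -> str:
    """Orient before acting: return the single state whose recognizer matches."""
    matches = [name for name, recognise in STATE_MAP.items() if recognise(obs)]
    if len(matches) != 1:
        # Zero or several matches means the map is stale or ambiguous:
        # a cue to run discovery again, not something to paper over.
        raise StaleMapError(f"observation matched {matches or 'no state'}")
    return matches[0]
```

`where_am_i({"title": "Dashboard"})` returns `"dashboard"`. An unmapped screen such as `{"title": "Reports"}` raises `StaleMapError`, which is the point at which discovery resumes.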
Promode already runs this flywheel in one place. The CLAUDE.md-rooted agent-knowledge graph is its first instance: a knowledge node is a crystallised discovery about the repo, and "orient before you act" is the where-am-I check. Everything below aims the same loop at a running app instead of a codebase.
</the-flywheel>
When a crystallised artifact fails, it has asked a question. Triage it — only inference can, because code cannot know intent:
This triage is the feedback channel, and it is why coverage compounds instead of rotting: every failure either hardens the suite (flake → more determinism), advances it (change → re-crystallise), or protects the system (regression → alarm). The fail-fast, localised-failure requirement exists to make the triage cheap — a precise break tells inference where to look and often which of the three it is; a vague "it went red" forces a fresh investigation every time, and a suite that fails imprecisely cannot drive its own repair. </closing-the-loop>
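The fail-fast, localised-failure requirement can be sketched minimally as follows, assuming each check is a hypothetical `(name, observe, expected)` triple run in order. The runner stops at the first mismatch and reports which step broke, what was expected, and what was observed. The second half runs the perturbation test: one check is broken on purpose, and the run must halt exactly there without evaluating later checks.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], Any], Any]  # (name, observe, expected)


@dataclass
class Failure:
    step: int       # index of the check that broke
    name: str       # human-readable check name
    expected: Any
    observed: Any


def run_fail_fast(checks: List[Check]) -> Optional[Failure]:
    """Run checks in order; stop at the first mismatch and say precisely where."""
    for step, (name, observe, expected) in enumerate(checks):
        observed = observe()
        if observed != expected:
            return Failure(step, name, expected, observed)
    return None  # every check passed


# Perturbation test: break the middle check on purpose.
evaluated: List[str] = []


def probe(name: str, value: Any) -> Callable[[], Any]:
    def observe() -> Any:
        evaluated.append(name)  # record that this check actually ran
        return value
    return observe


perturbed: List[Check] = [
    ("logged in", probe("logged in", True), True),
    ("cart total", probe("cart total", 30), 31),  # deliberately wrong
    ("checkout enabled", probe("checkout enabled", True), True),
]
failure = run_fail_fast(perturbed)
# failure.step == 1, and "checkout enabled" was never evaluated.
```

A report like `Failure(step=1, name='cart total', expected=31, observed=30)` gives inference a precise starting point for triage. A bare "it went red" does not.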
<disciplines> The methodology is not "use a graph." It is the set of disciplines that keep the loop turning:
<closing-the-loop>). Verify the property by perturbation (deliberately break one check; confirm it halts exactly there and reports precisely).
One seam, two non-human operators. The same seam that lets a fast headless client drive end-to-end acceptance tests is structurally the seam that exposes the system to AI-agent tools, because a test runner and an agent both need a clean, observable, scriptable grip on the real logic with the GUI stripped away. Designing for headless testability and designing for agent-operability converge on one architectural investment that pays out twice. This convergence is the load-bearing reason the seam is worth building even before any agent tool exists.
But it is convergence of the SEAM, not identity of the INTERFACE. Fence it, or it becomes wrong and dangerous:
axi skill's domain; defer to it, don't re-derive it), and failure semantics (deterministic fail-fast vs tolerant/recoverable/idempotent).
Build the seam by test-driven extraction, never speculative architecture. Under RED→GREEN→REFACTOR and KISS, the seam is the thinnest interface a failing test needs to reach below-UI logic that is otherwise reachable only through the slow GUI, and it is no wider than the test demands. Prefer exposing or cleaning an existing programmatic seam (public API, application/service layer, CLI, SDK, MCP) over inventing a parallel one that drifts. Design its `observe()`/`act()` shape so it could serve an agent later, but ship only what the test needs now. See references/operator-seam-and-agent-tools.md.
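The following minimal sketch shows one seam serving two non-human operators. Every name in it is hypothetical: `TodoSeam`, `Snapshot`, `agent_tool` and `AGENT_ALLOWED`. The seam exposes `observe()`/`act()` over the real logic with no GUI. The headless test client drives it fail-fast, so any error propagates. The agent adapter wraps the same seam but exposes a different interface: a privilege fence (an allowlist) and tolerant failure (a structured result instead of an exception). This is convergence of the seam, not identity of the interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Snapshot:
    """What any operator can observe: plain data, no widgets."""
    todos: List[str]


@dataclass
class TodoSeam:
    """Hypothetical below-UI seam over the real application logic."""
    _todos: List[str] = field(default_factory=list)

    def observe(self) -> Snapshot:
        return Snapshot(todos=list(self._todos))

    def act(self, action: str, **args: str) -> None:
        if action == "add":
            self._todos.append(args["text"])
        elif action == "clear":
            self._todos.clear()
        else:
            raise ValueError(f"unknown action: {action}")


# Operator 1: headless test client. Deterministic and fail-fast:
# any error propagates and the test stops at the exact step.
def test_adding_a_todo_shows_it_in_the_list() -> None:
    seam = TodoSeam()
    seam.act("add", text="buy milk")
    assert seam.observe().todos == ["buy milk"]


# Operator 2: agent tool. Same seam, different interface: a privilege
# fence (allowlist) and tolerant failure (structured result, no raise).
AGENT_ALLOWED = {"add"}  # e.g. agents may add items but not bulk-clear


def agent_tool(seam: TodoSeam, action: str, **args: str) -> Dict[str, object]:
    if action not in AGENT_ALLOWED:
        return {"ok": False, "error": f"action '{action}' not permitted"}
    try:
        seam.act(action, **args)
    except (KeyError, ValueError) as exc:
        return {"ok": False, "error": str(exc)}
    return {"ok": True, "todos": seam.observe().todos}
```

Under test-driven extraction, each `act` branch should exist only because some failing test needed it. `clear` is included here purely to give the fence something to refuse.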
</the-operator-seam>
Keep the two axes separate:
<the-operator-seam>).
They compose: the same scenario could in principle run against the headless seam, and a surgical few scenarios could also run against the UI tier. The scenario doesn't change; only the runner beneath it does. Don't conflate "writing the scenario" with "building the seam": a scenario with no seam is just prose, and a seam with no scenario has nothing to assert.
Tool-agnostic, KISS — do NOT mandate a BDD framework. Gherkin/BDD (Cucumber, behave, SpecFlow, etc.) is one option for expressing scenarios, not a requirement. A plain high-level test function whose name and steps read as a user story serves the same purpose. Reach for a Gherkin/step-definition framework only when a non-technical stakeholder actually reads or authors the .feature files — otherwise the step-definition indirection layer (glue code mapping prose to calls, kept in sync by hand) is pure maintenance cost for no readability gain. The principle is "the acceptance test reads as the evidence-based user story and traces up to the need," achievable with or without Gherkin. Pick the lightest expression that keeps the trace legible.
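The following minimal sketch keeps scenario and seam separate without a BDD framework. It assumes a hypothetical `Driver` protocol and `HeadlessDriver` runner. The scenario is a plain function whose name and Given/When/Then comments read as the user story. It depends only on the protocol, so a headless driver can run it unchanged. For a surgical few scenarios, a UI driver implementing the same two methods could run it too.

```python
from typing import List, Protocol


class Driver(Protocol):
    """The runner beneath a scenario; headless or UI, the scenario doesn't care."""
    def add_todo(self, text: str) -> None: ...
    def visible_todos(self) -> List[str]: ...


class HeadlessDriver:
    """Hypothetical headless runner: talks to in-process logic, no GUI."""
    def __init__(self) -> None:
        self._todos: List[str] = []

    def add_todo(self, text: str) -> None:
        self._todos.append(text)

    def visible_todos(self) -> List[str]:
        return list(self._todos)


# The scenario: reads as the user story and names no particular runner.
def scenario_user_can_capture_a_task_and_see_it_listed(driver: Driver) -> None:
    # Given an empty list
    assert driver.visible_todos() == []
    # When the user adds "buy milk"
    driver.add_todo("buy milk")
    # Then it appears in the list
    assert driver.visible_todos() == ["buy milk"]


def test_capture_task_headless() -> None:
    scenario_user_can_capture_a_task_and_see_it_listed(HeadlessDriver())
```

There are no `.feature` files and no step-definition glue. The test's name and steps carry the trace to the need.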
</scenario-vs-seam>
The load-bearing rule: the UI tier must not re-test what the headless tier already covers — or could. A slow UI test doing work a fast headless test could do is the central anti-pattern this methodology exists to prevent; in review it is a defect (slow, flaky, redundant), not a style nit.
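One way to make this rule mechanical in review is sketched below. It assumes each acceptance criterion carries a hypothetical `gui_only` flag, set by a human or agent while triaging it. Any UI-tier test that covers a criterion not marked GUI-only is reported as a finding instead of being accepted. `Criterion`, `UiTest` and `redundant_ui_coverage` are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Criterion:
    id: str
    # True only when the defect can surface solely through the real GUI
    # (e.g. rendering, focus, drag-and-drop); set during triage.
    gui_only: bool


@dataclass(frozen=True)
class UiTest:
    name: str
    covers: Tuple[str, ...]  # ids of the criteria this UI-tier test asserts


def redundant_ui_coverage(
    criteria: Dict[str, Criterion], ui_tests: List[UiTest]
) -> List[str]:
    """Report every UI-tier coverage of a criterion the headless tier could verify."""
    findings: List[str] = []
    for test in ui_tests:
        for cid in test.covers:
            if not criteria[cid].gui_only:
                findings.append(f"{test.name}: {cid} is headless-reachable; move it below the UI")
    return findings


criteria = {
    "AC1": Criterion("AC1", gui_only=False),  # a session is created on login
    "AC2": Criterion("AC2", gui_only=True),   # dragging a row reorders it on screen
}
ui_tests = [
    UiTest("drag_reorders_rows", ("AC2",)),
    UiTest("ui_login_creates_session", ("AC1",)),  # redundant: headless can do this
]
findings = redundant_ui_coverage(criteria, ui_tests)
```

The `gui_only` flag is still a judgement call. The check only makes that judgement visible and hard to skip.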
Same {arrange, act, assert} shape on both sides. Most criteria are verified headless; only the inherently-visual/interactive few are promoted to the UI graph, where the graph makes arrange (reach the precondition) nearly free.
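The following minimal sketch shows why the graph makes arrange nearly free. It assumes the crystallised UI state graph is stored as a hypothetical adjacency dict of `state -> {action: next_state}`. Arrange then becomes a breadth-first search for the shortest action path from the current state to the precondition state. The test itself performs only its own act and assert.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# Hypothetical crystallised UI state graph: state -> {action: next_state}.
GRAPH: Dict[str, Dict[str, str]] = {
    "login": {"submit_valid_credentials": "dashboard"},
    "dashboard": {"open_settings": "settings", "open_reports": "reports"},
    "settings": {"back": "dashboard", "open_profile": "profile"},
    "reports": {"back": "dashboard"},
    "profile": {"back": "settings"},
}


def arrange_path(
    graph: Dict[str, Dict[str, str]], start: str, goal: str
) -> Optional[List[str]]:
    """Shortest list of actions from start to goal (BFS), or None if unreachable."""
    queue: Deque[Tuple[str, List[str]]] = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [action]))
    return None
```

For example, `arrange_path(GRAPH, "login", "profile")` yields `["submit_valid_credentials", "open_settings", "open_profile"]`. A real traverser would also confirm each arrival with the next state's recognizer under the fail-fast contract; that step is omitted here.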
The Tier-2 mechanics — modelling the app as a state graph and Explore→Distill→Traverse with the fail-fast contract — live in references/ui-state-graph-edt.md.
</layered-acceptance-testing>
So: do NOT extract a reusable graph/recognizer/traverser library or a standard adapter interface from this single case; that is the premature-generalisation trap. Write and reuse the methodology freely, but defer any shared skeleton until a second app or surface has exercised it. Ideally that is a second client adapter such as web, preferably one that actually ships agent tools. Validate at n≥2 before relying on a shared abstraction. </the-honest-caveat>
<references>
- `references/ui-state-graph-edt.md`: Tier-2 mechanics (the state-graph model, Explore→Distill→Traverse, recognizers, the fail-fast contract and its lifecycle, the client-adapter seam, the hard-won implementation rules, and the worked example).
- `references/operator-seam-and-agent-tools.md`: the operator-seam ↔ agent-tool convergence in depth (where it holds, the four divergence axes, the privilege/security fence, and how to shape `observe()`/`act()` so the seam *could* serve an agent without ever becoming one by accident).
</references>
<success_criteria>
You have applied this skill well when: