look-before-you-leap/skills/engineering-discipline/SKILL.md
Use for every task that writes, edits, fixes, refactors, ports, migrates, or debugs code — any language, any framework, any project size. This skill enforces the habits that prevent broken builds, missed consumers, and silent scope cuts: read imports and consumers before editing, track blast radius on shared types and utilities, never use `any`/`as any` type shortcuts, run type checkers/linters/tests after every change, and explicitly flag anything you skip. Applies to bug fixes, feature additions, refactors, dependency bumps, config changes, CI fixes, webhook handlers, form validation, migration scripts, and environment setup. Even one-file fixes get the verification step. Do NOT use for pure questions, explanations, research, documentation, code reading, PR reviews, or conversations that don't modify source files.
npx skillsauth add miospotdevteam/claude-control engineering-disciplineInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill shapes HOW you approach engineering work. It doesn't teach you a language or framework — it prevents the class of mistakes that come from moving too fast: silent scope cuts, broken imports from unchecked blast radius, type safety holes, unverified changes, and abandoned plans.
The core principle: every shortcut you take now becomes a bug someone else finds later. The few extra minutes spent exploring, checking, and verifying are worth it every single time.
This skill overrides your instinct to move fast. When you feel the urge to skip a check, drop a type, or trim scope to unblock yourself — that is exactly the moment this skill matters most.
Read this skill as an operating contract, not a style guide.
You are not allowed to reinterpret a hard rule as optional because:
Confidence is not an override. Speed is not an override. Good intentions are not an override. If you need an exception, surface the conflict explicitly instead of quietly acting as though the rule did not exist.
Before editing any file, build a mental map of the change. This is the single highest-leverage habit — most mistakes happen because you understood the file but not its context.
When you open a file to change it, also read:
# Primary method (TypeScript projects with dep maps configured):
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/deps-query.py <project_root> "<file_path>"
# Fallback (no dep maps, or non-TypeScript):
# Grep for import statements referencing this file
A hook enforces deps-query.py when dep maps are configured — if you
grep for imports and get blocked, use deps-query.py instead.Before implementing something, search the codebase for prior art:
If you find an existing utility or pattern, use it. Reimplementing something that already exists creates divergence and maintenance burden.
Memory (the auto-memory system) should only store procedural rules — how to work. Never store declarative facts about code — what the code looks like, what fields an API returns, how many locales a project has, what a type contains.
Facts go stale. Code changes between sessions. If you memorize "the API
returns uploadedKey" and the field gets renamed next week, your memory
becomes a source of bugs instead of help.
DO store (procedural):
DO NOT store (declarative):
{ settings, uploadedKey }"slugStatus field"slugStatus === 'available'"CLAUDE.md per project is for current project state, maintained by reading the codebase. Memory is for behavioral rules that transcend any single codebase state.
If the user asked for 5 things, all 5 must be addressed. If one is blocked or too complex, you MUST say so explicitly:
"I completed items 1-4. Item 5 (webhook retry logic) is blocked because the queue system doesn't expose a retry API. Here's what I'd suggest instead: ..."
What you must NEVER do:
If you catch yourself thinking "I'll skip this for now," stop. Either do it or explicitly flag it. Silently trimming scope is the single worst thing you can do because the user has no way to know what's missing until it breaks.
These patterns exist to make the compiler stop complaining. They trade compile-time safety for runtime crashes. Never use them:
any or as any in TypeScriptv.any() in Valibot/Convex/Zod/validation schemasuserId on an authenticated route — if the route requires auth, the user
ID is ALWAYS present)any or missing return types on public APIs// @ts-ignore or // @ts-expect-error without a detailed explanation
of why it's necessary and what the actual type should bestring when the actual type is a specific set
of valuesIf proper typing is hard, that's a signal the design needs thought — not
that you should skip types. Take the time to figure out the correct type.
If a third-party library has bad types, write a thin typed wrapper rather
than spreading any through the codebase.
Exception for inferred types: In frameworks that infer types (Convex, tRPC, Drizzle), don't add redundant return-type annotations — let the framework's inference do its job. The rule is about safety, not ceremony.
When handling external input that maps to an enum or closed set, reject unknown values with an explicit error. Never silently coerce an invalid value to a default.
// CORRECT — reject unknown values
if (!validPlatforms.includes(input)) return error(400, "Invalid platform");
// WRONG — silent coercion hides contract drift
const platform = validPlatforms.includes(input) ? input : "both";
The coercion pattern looks helpful but makes debugging impossible: callers learn that any string is accepted, the real contract drifts from the documented one, and when the valid set changes, no error surfaces. This applies equally to mandatory protocol requirements — if a rule says "no exceptions," do not add a skip clause. A "mandatory" step with an opt-out path is the protocol equivalent of coercing an invalid enum to a default.
When you modify any of these, you MUST check all consumers:
For cross-repo contracts (webhooks, shared types consumed by other services),
you cannot rely on local grep or dep maps. Read the receiver's handler in the
other repo before changing the producer's payload. If you rename a field from
presenzaId to id in a webhook payload, the receiver still reads
presenzaId — silent exit, broken sync.
The check process:
<!-- deps-consumer-blast-start -->python3 ${CLAUDE_PLUGIN_ROOT}/scripts/deps-query.py <project_root> "<file_path>"
This shows every file that imports the one you're changing, across
all modules. A hook enforces this when dep maps are configured.If you change a shared dependency version, this is especially critical. A single version bump can cascade through the entire project. Check lock files, peer dependencies, and framework compatibility before committing to the bump.
If the task involves renaming across files, moving files/modules, extracting
code into new modules, splitting files, restructuring directories, or
changing naming conventions across the codebase — invoke
look-before-you-leap:refactoring. Its contract-based approach
systematically catalogs every target, consumer, and test before changes
begin, catching the missed consumers and dead code that make incomplete
refactoring Claude's #1 failure mode.
The refactoring skill applies when changes cross file boundaries. Single-file cleanup (renaming a variable within one function, simplifying conditionals) is handled by engineering-discipline directly — no skill invocation needed.
If dep maps are configured, the refactoring skill uses deps-query.py to
find all consumers instantly. After the refactoring, it regenerates stale
dep maps so future queries reflect the new structure.
If the current step has skill: "look-before-you-leap:test-driven-development"
in plan.json, or its progress items follow the TDD rhythm (Cycle N RED,
Cycle N GREEN, Refactor), invoke the TDD skill and follow its
red-green-refactor cycle mechanically:
Do NOT write implementation before tests. Do NOT write all tests at once then implement. Each cycle is one behavior slice — test it, implement it, move to the next.
If you find yourself writing implementation code before the corresponding RED progress item is done, STOP — you're violating TDD. Go back and write the test first.
Error handling that hides failures is worse than no error handling at all. Never write these patterns:
.catch(() => {}) or .catch(() => null) — swallows the error and
continues as if nothing happened. If the operation matters enough to
call, its failure matters enough to handle.try/catch around multi-step operations that resolves
successfully on failure — a single failed step aborts the rest, but
the caller never knows. Catch at the narrowest scope and either
rethrow, log meaningfully, or degrade explicitly.The pattern to watch for: code that looks like it handles errors but
actually just makes them invisible. A .catch that returns a fallback
the consumer can't distinguish from success is not error handling — it's
error hiding.
What to do instead:
When you add new behavior — an API endpoint, webhook handler, aggregation function, contract change, or any logic that produces observable output — it MUST land with:
project-structure/api.md, OpenAPI spec, route
registry), update it in the same step. This is not a follow-up task —
it's part of adding the behavior.Do NOT defer either of these. "I'll add tests later" means "these tests will never exist." The plan step that adds the behavior must include both the test and the doc update as progress items.
When Codex flags MISSING_TEST, treat it as equal priority to code bugs. Do not fix a code finding and ignore the test finding in the same reverify cycle. When Codex flags both code issues and MISSING_TEST in the same reverify round, you MUST address BOTH before re-verifying. Fixing code findings while ignoring test findings is the #1 test-debt pattern. A test gap flagged twice across verification rounds is a pattern failure — it means you are systematically deprioritizing test coverage.
When adding user-visible strings, locale files must be updated in the same step — not deferred. This applies to every string a user can see: labels, placeholders, error messages, tooltips, section headers, accessibility text.
The check:
t() / formatMessage / translation calls in your changest("key", "English text")) count as missing
— they bypass the translation pipeline entirelyaccessibilityLabel= "Close") also count as missing — they silently ship untranslated stringsThe plan step that adds user-visible strings must include locale files in
its files array. A step that adds UI copy without touching locale files
is incomplete by definition.
If you add a new import, verify the package exists in the project:
package.json (or Cargo.toml, pyproject.toml, go.mod, etc.).env or the framework's configDo NOT assume packages are installed. Do NOT assume env vars are loaded. Do NOT use a tool without checking it exists. These are the most common sources of "it works in my head but not on the machine" failures.
When writing documentation, examples, or instructions that reference CLI
flags, function signatures, or behavior: read the actual --help output,
function declaration, or source code. Never document from assumption.
The check:
<tool> --help or read the sourceupdate_step_status() vs update_step())Wrong documentation is worse than no documentation — it teaches incorrect patterns that propagate. Documenting a flag that doesn't exist or was removed is not a typo — it's a behavioral contract violation that misleads every reader.
Before writing any typed API call — a fetch wrapper, a hook that reads a response, a client method that destructures a result — grep for the endpoint path and read the return statement in the handler. Never guess what an API returns based on what you think it should return.
The check:
Grep for the endpoint path (e.g., /api/settings)This prevents the class of bug where you type a response as { heroImage }
but the API actually returns { settings, uploadedKey }. These bugs are
invisible in the UI until the user saves and loses data.
When a function requires a real system value (email, user ID, API key, resource identifier), trace to where that value actually lives in the system and use it. Never construct a plausible-looking value from other fields.
// CORRECT — reads the actual value
const email = user.email;
// WRONG — fabricated, breaks with spaces, unicode, long names
const email = `${user.fullName}@company.com`;
Fabricated values pass type checking and look correct in code review. They
break at runtime — an email constructed from a full name with spaces
(Mario [email protected]) is invalid, blocks onboarding flows, and routes
communications to nonexistent addresses.
The check: when you write a value for a field that will be sent to an external system (Stripe, email provider, auth service), ask: "Did I READ this value from a data source, or did I CONSTRUCT it?" If constructed, find the real source.
When a new component, hook, or handler reimplements behavior that already exists elsewhere in the codebase (e.g., a new page that replaces an old one, a unified flow that wraps existing operations), read the source you're replacing and verify parity:
The pattern to watch for: you read the old code during exploration, then
write the new code from memory hours later. Memory drifts. The old code
uses slugStatus === "available" as a gate; you write slug.length > 0.
The old code strips leading/trailing hyphens from slugs; you only strip
invalid characters. Always diff — never rely on recall.
When a step says "mirror", "parity", "reuse", or references another screen, component, or flow as a model, build a parity table before coding — not a prose comparison, a structured diff:
| Dimension | Reference file(s) | Must match | Allowed deviation |
|---|---|---|---|
| Data source | <file:line> | query, params, filters | — |
| Behaviors | <file:line> | CRUD ops, validation | — |
| Disabled/error/loading states | <file:line> | guards, fallbacks | — |
| Labels / i18n keys | <file:line> | key names, fallbacks | — |
| Preconditions | <file:line> | auth, status gates | — |
Fill in every row before writing code. After implementation, walk the table row by row and verify each "must match" cell is satisfied. If you intentionally deviate, fill the "Allowed deviation" column with the reason before verification — not after.
This replaces the prose-based parity check. A table is auditable; prose is not. If you cannot fill the table, you haven't read the reference deeply enough.
After writing any UI with mutable state (forms, inline editing, settings panels), verify that every editable field has a complete path from UI change → state update → API mutation → persistence:
This is the highest-damage class of UI bug because it's invisible during development. The UI looks correct. The state updates. But the mutation never fires, so the next page load shows the old data. Check every field, every time.
When building a unified flow that calls existing operations (e.g., a "publish all" button that triggers individual publish operations), read each operation's handler and replicate its precondition checks:
slugStatus === "available", your
wrapper must check the same — not a weaker versionThe failure mode: you build a unified publish flow that only checks
isPro && !isDraft && slug.length > 0, but the existing publish handler
gates on slugStatus === "available" (which requires async validation).
Your flow lets users publish with invalid slugs because you invented
simpler preconditions instead of reading the real ones.
Before writing code for any step, extract every deliverable from the step's description and acceptance criteria into a numbered checklist. Write it down — in the plan notes, in a comment, anywhere persistent. Then verify each item before marking the step done.
The process:
This prevents the failure mode where you focus on the primary feature and forget secondary deliverables. Example: step description says "Tab label adapts to vertical (Menu vs Lookbook)" — you implement the tab content but forget the label adaptation because you focused on the harder part.
After implementing any UI component, API handler, or conditional logic, systematically ask: "What if this data is null? Empty array? Error response? Single item instead of many?"
For every conditional render path ({data && ...}, data?.length > 1,
if (result)), verify the output when the guard fails — not just when
it succeeds. If the answer is "nothing renders" and the acceptance criteria
expect something visible, you have a bug.
The check:
This prevents the class of bug where a chart silently disappears when data
is null, a detail page shows a permanent loading skeleton on API error, or
a form sends undefined instead of null for cleared fields.
After implementing any UI with async data loading or multi-entity selection (tabs, lists, pickers where switching items triggers new fetches), fill in this matrix before marking the step done:
| Transition | What happens | Handled? | |---|---|---| | Switch item while request in flight | Cancel/ignore stale? | ✓/✗ (how) | | New request fails | Error state shown? | ✓/✗ (how) | | Close and reopen the view | State reset or stale? | ✓/✗ (how) | | Selection changes during load | Request scoped to selection? | ✓/✗ (how) | | Stale response arrives late | Ignored or overwrites current? | ✓/✗ (how) | | Cosmetic default vs persisted | Default in form state or visual only? | ✓/✗ (how) |
For each row: write "handled (mechanism)" or "not applicable (reason)". Empty cells are bugs. A cosmetic default that is not in form state causes data loss on save — the user sees a value, saves, and it disappears.
This matrix catches the class of bug where switching between entities shows stale data from the previous selection, or where a late-arriving response overwrites the current view.
Not every blocker requires stopping. Use these rules to decide:
When in doubt, stop and ask. A 30-second confirmation is cheaper than an unwanted destructive change.
If you modified shared code (types, utilities, API signatures), re-check consumers AFTER your changes are complete — not just before. The pre-change check in Phase 1 tells you who to update; this post-change check confirms you didn't break them.
# Re-run deps-query on every file you modified that has downstream consumers:
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/deps-query.py <project_root> "<modified_file>"
For each consumer found, verify it still compiles and behaves correctly with your changes. If you added a new export (e.g., a new error class), confirm it's exported from the package's index file so consumers can actually import it.
This step catches the most common class of post-change breakage: you updated the source file but missed a consumer, or you added something consumers need but forgot to export it.
After making changes, run the project's verification tools. Check
${CLAUDE_PLUGIN_ROOT}/skills/look-before-you-leap/references/verification-commands.md
for framework-specific commands, but the general approach is:
tsc --noEmit, mypy, cargo check, etc.eslint, ruff, clippy, etc.bunx knip --reporter compact
to verify no orphaned code remainsHow to find the right commands: Check package.json scripts,
Makefile, Cargo.toml, pyproject.toml, or CLAUDE.md / README.md
for the project's standard commands. Use whatever the project already uses
rather than guessing generic commands.
If any verification step fails, invoke
look-before-you-leap:systematic-debugging to investigate the root cause.
Do not guess at fixes or stack speculative changes. The debugging skill's
four-phase process (investigate → analyze → hypothesize → implement)
prevents the thrashing that comes from random fix attempts.
For heuristic, layout, timing, and threshold bugs, changing a constant is NOT evidence of understanding. Do not bump margins, delays, widths, retry counts, or safety factors until you have recorded:
This step is not optional. It is not something you do when asked. It is something you do EVERY TIME, automatically, as the final step of every task.
If the acceptance criteria say "tsc passes" and tsc does not pass, fix the issue — regardless of whether this step introduced the failure or it existed before. "Pre-existing" is not a valid dismissal. Either fix the failure or get the acceptance criteria changed before the plan was approved.
This applies to all verification: type checker errors, lint failures, test failures, and Codex findings. If the criteria require it to pass and it doesn't, that's a finding — full stop.
Before declaring a step done, if your work produces values that must conform to a closed set — enum values, schema fields, function signatures, API contracts, mode/status strings — re-read the source definition and verify your values match character-for-character.
Do NOT verify from memory. Open the file that defines the enum, schema, or
contract. Compare your output against it. If you wrote "dynamic" and the
enum only allows 5 specific values, you have a bug — even if "dynamic"
seems like a reasonable value.
Common closed sets to verify against:
ToolSearch to read the schema before calling. For
internal functions, read the declaration.references/foo.md is wrong if the consumer lives in a different
subdirectory than the references folder.When the user points out a mistake, do not just fix that one instance. Immediately search for the same class of mistake elsewhere in your changes:
any somewhere — grep for other any you addedThis self-audit is automatic after any correction. Fix the pattern, not individual instances.
Before saying a task is done:
If ANY requirement is unaddressed or ANY plan step is incomplete, you are NOT done. Go finish it, or explicitly flag what's remaining and why.
Before declaring a task done, every item must be checked:
### Criterion: template — each acceptance criterion mapped to file:line evidenceWhen summarizing your work, include:
A summary that only lists successes is not a summary — it's a press release. Your user needs to know what to check, not just what to celebrate.
Call out explicitly:
When the user points out an error:
Do NOT respond with just "You're absolutely right!" and fix only the one thing. The acknowledgment means nothing without the self-audit.
Do NOT upgrade how right the user is. "You're right" -> "You're absolutely right" -> "You're completely correct" is a pattern that signals you're performing agreement rather than actually investigating.
If you catch yourself doing any of these, stop and reconsider:
| What you're doing | What to do instead |
|---|---|
| Adding as any to fix a type error | Figure out the correct type |
| Editing a file without reading its imports | Read imports and consumers first |
| Reading MUST/NEVER language as "guidance" instead of a hard rule | Treat the skill text literally — do not downgrade requirements in your head |
| Skipping a step because it's hard | Flag it explicitly to the user |
| Declaring "done" without running checks | Run tsc/lint/tests first |
| Using a package without checking package.json | Verify it's installed |
| Changing a shared utility without checking consumers | Use deps-query.py (enforced by hook) or grep for consumer analysis |
| Checking consumers before changes but not after | Re-run deps-query.py on modified shared files AFTER changes to verify nothing broke |
| Grepping for import/from/require when dep maps are configured | A hook blocks this — use deps-query.py instead |
| Summarizing without mentioning what you skipped | List gaps explicitly |
| Fixing one bug instance without checking for more | Self-audit for the pattern |
| Implementing from scratch | Search for existing utilities first |
| Starting a multi-step task without a plan | Write the plan first |
| Stopping after completing step 3 of 7 | Continue to step 4 immediately |
| Making a field nullable for convenience | Ask: can this ACTUALLY be null? |
| Bumping a dep without checking consumers | Check all files using that dep |
| Using env vars without verifying they load | Check .env and loading mechanism |
| Saying "You're absolutely right!" | Fix the bug, audit for similar ones, report |
| Thinking "I'll skip this for now" | Do it or flag it — no silent cuts |
| Editing 3+ code files without updating the plan | Stop coding, update progress via plan_utils.py NOW |
| Thinking "I'll update the plan later" | Later never comes — compaction will erase your memory |
| Using Bash to write files because Edit/Write was denied | The mutation guard catches redirects, sed -i, tee, nested scripts, tar/unzip — create the plan first |
| Thinking "this rule probably doesn't matter for a small change" | Small changes are where Claude rationalizes the most damage — follow the rule anyway |
| Running destructive commands (rm -rf, find -delete, git clean) without approval | Destructive ops require user approval via /bypass even with a plan |
| Mutating files outside the project root | Cross-project mutations require explicit user approval via /bypass |
| Calling a hook block a "false positive" | Hooks enforce discipline. Follow the process, don't bypass it |
| Inventing creative workarounds for hook blocks (python3 -c, node -e) | The mutation guard detects interpreter file writes too — follow the process |
| Marking a plan step done without verifying the work | Verify first, then mark complete — done means verified, not "I wrote some code" |
| Moving a plan to completed/ before all steps are done | Finish the work or flag what's remaining to the user |
| Renaming/moving/extracting across 3+ files without a contract | Invoke look-before-you-leap:refactoring first — build the contract |
| Refactoring without running deps-query.py first (when dep maps exist) | Run deps-query.py on every target to get complete consumer lists |
| Writing implementation before tests on a TDD step | Follow RED-GREEN-REFACTOR — tests first, always. Invoke the TDD skill |
| Guessing at fixes when tests fail during verification | Invoke look-before-you-leap:systematic-debugging — root cause first |
| Starting a new feature without brainstorming the design | Invoke look-before-you-leap:brainstorming for creative tasks |
| Writing plan.json/masterPlan.md directly without invoking writing-plans skill | Call Skill(skill: "look-before-you-leap:writing-plans") — it sets codexVerify, evaluates sub-plans, applies TDD rhythm |
| Doing work that a skill covers without invoking that skill first | Check the skill routing table in the conductor — if a skill exists for this work, invoke it via the Skill tool |
| Ignoring a warning from plan_utils.py or a hook script | Stop and fix the issue — warnings mean something is wrong, not "proceed with caution" |
| Reacting to IDE/LSP diagnostics mid-edit without running the real type checker | LSP diagnostics go stale during edits — run tsc --noEmit (or equivalent) to confirm before "fixing" phantom errors |
| Writing plan.json directly after brainstorming (skipping writing-plans skill) | Brainstorming produces design.md, then you MUST call Skill(skill: "look-before-you-leap:writing-plans") — do not shortcut |
| Fixing Codex findings then moving on without re-verifying | Re-run run-codex-verify.sh after fixes — tsc passing is not the same as Codex confirming your fixes are correct |
| Dismissing a failure as "pre-existing" when acceptance criteria require it to pass | Fix the failure or change the acceptance criteria — "pre-existing" is not an exemption |
| Marking a step done before Codex verification passes (for codexVerify steps) | Codex is a gate — complete the fix → re-verify loop until PASS, then mark done |
| Writing .catch(() => {}) or .catch(() => null) | Handle the error, rethrow, or comment why ignoring is safe |
| Broad try/catch that resolves successfully on failure | Catch at the narrowest scope — let failures propagate or degrade visibly |
| Adding a new API endpoint without an integration test | Every new endpoint lands with at least one happy-path test — no exceptions |
| Adding a new API endpoint without updating project docs | Update the API inventory (api.md, OpenAPI spec) in the same step — not later |
| Writing any closed-set value from memory (enum, schema field, API shape, signature) | Re-read the source definition and copy the exact value — memory drifts, source files don't |
| Assuming a tool/function accepts a parameter without reading its schema | Use ToolSearch for MCP tools; read the function declaration for internal code |
| Writing a file path without resolving from the target directory | Resolve relative to where the consumer reads/writes, not where you're editing |
| Removing exports/files without running knip (when available) | Run knip to verify nothing is orphaned — dead exports and unused deps hide silently |
| Reimplementing existing behavior without reading the source | Open the original, list what it does, verify parity in your new code |
| Rendering editable fields without tracing the save path | For every editable field: trace onChange → state → mutation → API → DB |
| Writing simpler preconditions than the operation you're wrapping | Read the handler, list its gates, replicate them exactly |
| Starting to code a step without listing its deliverables | Extract every deliverable from description + acceptance criteria first |
| Writing an event listener without reading the code that sends the event | Read the producer: what triggers it, what payload, under what conditions |
| Depending on a state value without listing all its producers | List EVERY code path that can set it: handlers, effects, URL params, wizard nav |
| Bumping a margin/threshold/constant to fix a Codex finding | Not evidence of understanding — record what's wrong, what assumption is false, and what proves the new value |
| Reinterpreting acceptance criteria after a failed Codex round | This is a plan deviation — ask the user to approve the narrower scope first |
| Fixing one part of a multi-part Codex finding and re-verifying | Number every distinct issue, address ALL before re-verifying |
| User said "explore with Codex" but you explored alone first | Dispatch to Codex FIRST — do not explore solo then ask Codex to rubber-stamp your conclusion. The user chose a tool; respect the choice |
| Calling mcp__codex__codex or any Codex MCP tool | ALL Codex interactions go through codex exec via Bash — the MCP tool bypasses direction-locked scripts, sandbox enforcement, and error logging |
| Running codex exec without -o <file> to capture output | Always use -o — asking Codex to "write to file X" in the prompt is unreliable. The -o flag captures output deterministically via the CLI |
| Using codex exec directly for plan step execution instead of invoking codex-dispatch skill | Invoke Skill(skill: "look-before-you-leap:codex-dispatch") — it handles direction-locked scripts, JSONL monitoring, result parsing, and error logging |
| Setting codexVerify: false | codexVerify is always true — no exceptions, no mode-based exemptions. The field is structural, not opt-in |
| Running run-codex-verify.sh on a codex-impl step | Codex must not verify its own work — for owner: "codex" steps, Claude verifies independently (read files, run tsc/lint/tests, check consumers) |
| Writing "Codex: skipped — codex CLI not installed" without running command -v codex | This is LYING. You do not know whether Codex is installed until you check. Run command -v codex FIRST. The default assumption is Codex IS installed — you must PROVE it is absent before claiming so. Every time you fabricate "not installed" to avoid verification, you are deceiving the user and shipping unreviewed work. No exceptions, no guessing, no "I think it's not installed" — run the command or do not claim anything about its availability |
| Only testing the happy path — never checking null/empty/error/single-item states | For every conditional guard, verify what happens when the condition is false |
| Changing a webhook payload field without checking the receiver in the other repo | Read the receiver's handler before changing the producer — cross-repo consumers are invisible to local grep |
| Silently coercing invalid enum values to defaults instead of rejecting | Reject unknown values with explicit errors — coercion hides contract drift |
| Fixing code bugs from Codex but ignoring MISSING_TEST in the same round | Test gaps are equal priority to code bugs — fix both before re-verifying |
| Documenting CLI flags, function names, or behavior from assumption | Read --help, the function declaration, or the source — never document from memory |
| Constructing a plausible-looking value instead of reading the real one | Trace to the actual data source — fabricated values pass type checks but break at runtime |
| Verifying acceptance criteria by recall ("I added idempotency keys") instead of mechanically | Run the grep, read the file, execute the command — recall drifts, mechanical checks don't |
| Implementing a codex-impl step yourself because it seems "trivially small" | Dispatch Codex via run-codex-implement.sh — ownership exists for independent verification, not complexity |
| Treating a large or mixed-owner unit as one step | Split it into small plan-level steps with explicit dependsOn edges — follow writing-plans Step 5/6 DAG guidance |
| Writing result field as "Done" or "Created X" without mapping each criterion | Use the ### Criterion: template — map every acceptance criterion to file:line evidence |
| Fixing a type error with the same approach that failed last reverify round | After the same category appears in 2 consecutive reverify logs, invoke look-before-you-leap:systematic-debugging |
| Writing step descriptions, Codex consensus, file lists, or transcript refs into the plan mode scratch pad | Scratch pad is a POINTER: plan title, path, step count, one-liner context, "Read plan.json to begin execution." Nothing else — everything lives on disk |
| Outputting explanatory text in the same response as EnterPlanMode or ExitPlanMode | Call the tool and NOTHING ELSE in that response — extra text interferes with the plan mode transition and causes the scratch pad to appear as a stashed message |
| Adding user-visible strings without updating ALL locale files | Run a mechanical locale audit: grep new t() calls, check every locale file for matching keys — English fallbacks count as missing |
| Implementing async UI without filling the state-transition matrix | Fill the matrix (switch item, request fails, close/reopen, stale response, cosmetic defaults) — empty cells are bugs |
| Step says "mirror" or "parity" but you coded from memory instead of a diff table | Build a parity table (data source, behaviors, states, labels, preconditions) against the reference files before coding |
development
Use after discovery to write implementation plans with TDD-granularity steps. Produces plan.json (immutable definition, frozen after approval), progress.json (mutable execution state), and masterPlan.md (user-facing proposal for Orbit review). Every step is one component/feature; TDD rhythm (test, verify fail, implement, verify pass, commit) lives in its progress items. Consumes discovery.md from exploration phase. Make sure to use this skill whenever the user says discovery is done, exploration is finished, discovery.md is ready, or asks to write/create/draft the implementation plan — even if they don't mention plan.json or masterPlan.md by name. Also use when the user references completed exploration findings, blast radius analysis, or consumer mappings and wants them converted into actionable steps. Do NOT use when: the user says 'just do it' or 'no plan', resuming or executing an existing plan, during exploration or brainstorming (discovery not yet complete), debugging, or code review.
tools
End-to-end webapp testing with Playwright MCP integration. Use when: writing Playwright tests, E2E testing, browser testing, webapp testing, visual regression testing, accessibility testing with axe-core, testing user flows through a web UI, verifying frontend behavior in a real browser. Integrates with test-driven-development skill for test-first browser tests and engineering-discipline for verification. Do NOT use when: unit tests only (no browser UI involved), API tests without UI, mobile native testing (use react-native-mobile), testing CLI tools, or writing backend-only integration tests.
development
Test-Driven Development workflow enforcing red-green-refactor cycles. Use when writing new features, adding behavior, or implementing functions where tests should drive design. Requires explicit test-first prompting because Claude naturally writes implementation first. Integrates with writing-plans (TDD rhythm in Progress items) and engineering-discipline (verification). Do NOT use when: fixing a bug in existing tested code (use systematic-debugging), writing tests for existing untested code (characterization tests are a different workflow), refactoring without behavior change (use refactoring), or the project has no test infrastructure.
development
Use when encountering any bug, test failure, or unexpected behavior. Enforces root cause investigation before fixes. Four phases: investigate, analyze patterns, form hypotheses, implement. Prevents guess-and-check thrashing. Use ESPECIALLY when under pressure or when 'just one quick fix' seems obvious. Do NOT use for: learning unfamiliar APIs (use exploration), performance optimization without a specific regression, or code review without a reported bug.