skills/dev-implement/SKILL.md
This skill should be used when the user asks to 'implement the plan', 'start building', or 'execute the tasks'.
Announce: "I'm using dev-implement (Phase 5) to orchestrate implementation."
Iteration topology: serial /goal loop (agent-team parallel for 4+ independent tasks)
Load shared enforcement:
Auto-load all constraints matching applies-to: dev-implement:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py dev-implement
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Dynamic plan re-read: Before starting work, re-read .planning/PLAN.md to catch any phases or tasks that were dynamically inserted by earlier phases. Do not rely on cached plan state from a prior phase.
Main Chat (you) dev-implement workflow (per level)
──────────────────────────────────────────────────────────────────────────
/goal <condition> ← user sets once at phase entry
dev-implement (this skill)
└─ per level: Workflow(name="dev-implement") ─→ parse PLAN table → DAG
sequential TDD implementer
per task → Verify Cmd → JS gate
← run FULL suite, mark [x], re-invoke next level
Main chat orchestrates the level loop + the full-suite ground-truth + the /goal. The workflow's implementers write the code (TDD) and the JS gate keys on real Verify Command exit codes — completion is not honor-system, and the dev-delegation-guard still forbids you from writing project code yourself.
Do NOT start implementation without these:
- .planning/SPEC.md exists with final requirements
- .planning/PLAN.md exists with chosen approach
- .planning/PLAN.md Testing Strategy section is COMPLETE (all boxes checked)
- .planning/PLAN_REVIEWED.md exists with status: APPROVED

If any prerequisite is missing, STOP and complete the earlier phases.
Before anything else, verify the plan was reviewed:
# Check for plan review approval marker
head -5 .planning/PLAN_REVIEWED.md 2>/dev/null
If .planning/PLAN_REVIEWED.md does not exist → STOP. Return to dev-design Phase Complete.
If status: is not APPROVED → STOP. Plan review is incomplete.
This file is written by dev-plan-reviewer when it approves the plan. Its absence means the plan reviewer was SKIPPED — which means spec requirements may have been silently dropped from the plan.
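The gate can also be checked mechanically. The following is a minimal sketch, assuming the marker file carries a `status: APPROVED` line somewhere in its first five lines (the exact file layout is not specified here); `plan_review_approved` is a hypothetical helper name, not part of the skill.

```python
from pathlib import Path


def plan_review_approved(planning_dir: str = ".planning") -> bool:
    """Return True only if PLAN_REVIEWED.md exists and reports status APPROVED.

    Assumption: the status appears as a `status: APPROVED` line within the
    first five lines, mirroring the `head -5` check.
    """
    marker = Path(planning_dir) / "PLAN_REVIEWED.md"
    if not marker.is_file():
        return False  # reviewer was skipped: STOP
    head = marker.read_text(encoding="utf-8").splitlines()[:5]
    for line in head:
        key, _, value = line.partition(":")
        if key.strip().lower() == "status":
            return value.strip() == "APPROVED"
    return False  # no status line found: treat as not approved
```

A missing file and a non-APPROVED status both return `False`, so a single check covers both STOP conditions.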
| Thought | Reality |
|---------|---------|
| "I can see the plan looks complete" | Self-assessment is not review. The reviewer catches what you miss. |
| "Plan reviewer would have approved anyway" | Then it takes 30 seconds. Run it. |
| "User approved the plan directly" | User approves the approach. Reviewer checks spec coverage. Different gates. |
| "I'll review it myself as I implement" | You won't. You'll be focused on code. That's why the gate exists. |
Check .planning/PLAN.md for: files to modify, implementation order, testing strategy.
Before starting ANY task, verify .planning/PLAN.md Testing Strategy:
[ ] Framework specified (not empty, not "TBD")
[ ] Test Command specified (runnable command)
[ ] First Failing Test described (specific test name)
[ ] Test File Location specified (actual path)
If ANY box is unchecked → STOP. Go back to design phase.
This is your LAST CHANCE to catch missing test strategy before writing code. </EXTREMELY-IMPORTANT>
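As an illustration of the checklist gate, here is a small sketch that lists unchecked boxes in the Testing Strategy text. It assumes items are written as `[ ] text` or `- [x] text`; the real PLAN.md layout may differ, and `unchecked_boxes` is a hypothetical helper.

```python
import re

# Assumption: checklist items look like "[ ] text" or "- [x] text".
_BOX = re.compile(r"^\s*(?:[-*]\s+)?\[( |x|X)\]\s+(.*)$")


def unchecked_boxes(testing_strategy: str) -> list[str]:
    """Return the text of every unchecked box in the Testing Strategy section."""
    missing = []
    for line in testing_strategy.splitlines():
        match = _BOX.match(line)
        if match and match.group(1) == " ":
            missing.append(match.group(2).strip())
    return missing
```

A non-empty result means STOP and return to the design phase.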
You do NOT choose sequential-vs-parallel and you do NOT hand-dispatch tasks. Implementation is the dev-implement ultracode workflow, which reads the hardened PLAN.md table, builds the Deps DAG, and auto-parallelizes within each dependency level (one worktree-isolated implementer per task, all TDD). You drive the level loop: invoke the workflow per level, integrate the level's returned file contents, run the full suite, mark the rows [x], advance — all under one /goal. See The Process.
YOU CANNOT WRITE IMPLEMENTATION CODE WITHOUT A FAILING TEST FIRST.
This is not a suggestion. This is the workflow. Every task follows:
1. READ the test description from PLAN.md
2. WRITE the test file
3. RUN the test → SEE RED (failure)
4. ONLY THEN write implementation
5. RUN the test → SEE GREEN (pass)
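The RED and GREEN steps above can be enforced on exit codes rather than on a reading of the output. This is a rough sketch under the assumption that the test command exits 0 on success and non-zero on failure; the function names are illustrative only.

```python
import subprocess


def run_test(command: list[str]) -> int:
    """Run a test command and return its exit code (0 means the test passed)."""
    return subprocess.run(command, capture_output=True).returncode


def confirm_red(command: list[str]) -> None:
    """Step 3: the new test must fail before any implementation exists."""
    if run_test(command) == 0:
        raise RuntimeError("Test passed before implementation: it does not test new behavior")


def confirm_green(command: list[str]) -> None:
    """Step 5: the same test must pass once the implementation is written."""
    if run_test(command) != 0:
        raise RuntimeError("Test still failing: the task is not complete")
```

Calling `confirm_red` before writing any implementation catches a test that passes for the wrong reason.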
If you catch yourself thinking these, STOP IMMEDIATELY:
| Thought | Reality | Action |
|---------|---------|--------|
| "No test infra, I'll just implement" | You should have caught this in explore/clarify | STOP. Go back. Add Task 0. |
| "SPEC.md says manual testing" | SPEC.md is wrong | STOP. Fix SPEC.md. Ask user. |
| "This task is too simple for tests" | Simple tasks benefit MOST from tests | Write the test anyway. |
| "I'll add tests after this works" | That's not TDD. That's anti-helpful: untested code ships bugs. | DELETE your code. Write test first. |
| "User is waiting, I'll be quick" | User wants WORKING code, not fast code | Take time. Write test first. |
| "The subagent skipped tests" | Your job is to catch that | REJECT the work. Redo with tests. |
| "Just this one exception" | No exceptions. Ever. | Write the test. |
If you wrote code without a failing test first, DELETE IT and start over. </EXTREMELY-IMPORTANT>
<EXTREMELY-IMPORTANT>
## The Iron Law of Delegation

MAIN CHAT MUST NOT WRITE CODE. This is not negotiable.
Main chat orchestrates. Subagents implement. If you catch yourself about to use Write or Edit on a code file, STOP.
| Allowed in Main Chat | NOT Allowed in Main Chat |
|---------------------|--------------------------|
| Spawn Task agents | Write/Edit code files |
| Review Task agent output | Direct implementation |
| Write to .planning/*.md files | "Quick fixes" |
| Run git commands | Any code editing |
| Set/clear /goal for the phase | Bypassing delegation |
If you're about to edit code directly, STOP and spawn a Task agent instead.
These thoughts mean STOP—you're rationalizing:
| Thought | Reality |
|---------|---------|
| "It's just a small fix" | Small fixes become big mistakes. Delegate. |
| "I'll be quick" | Quick means sloppy. Delegate. |
| "The subagent will take too long" | Subagent time is cheap. Your context is expensive. |
| "I already know what to do" | Knowing ≠ doing it well. Delegate. |
| "Let me just do this one thing" | One thing leads to another. Delegate. |
| "This is too simple for a subagent" | Simple is exactly when delegation works best. |
| "I'm already here in the code" | Being there ≠ writing there. Delegate. |
| "The user is waiting" | User wants DONE, not fast. They won't debug your shortcuts. |
| "This is just porting/adapting code" | Porting = writing = code. Delegate. |
| "I already have context loaded" | Fresh context per task is the point. Delegate. |
| "It's config, not real code" | JSON/YAML/TOML = code. Delegate. |
| "I need to set things up first" | Setup IS implementation. Delegate. |
| "This is boilerplate" | Boilerplate = code = delegate. |
| "PLAN.md is detailed, just executing" | Execution IS implementation. Delegate. |
If you're treating these rules as "guidelines for complex work" rather than "invariants for ALL work", you've already failed.
Simple work is EXACTLY when discipline matters most—because that's when you're most tempted to skip it. </EXTREMELY-IMPORTANT>
Before starting each task, check context availability:
Thresholds:

| Level | Remaining Context | Action |
|-------|------------------|--------|
| Normal | >35% | Proceed with task |
| Warning | 25-35% | Complete current task, then invoke dev-handoff |
| Critical | ≤25% | Invoke dev-handoff immediately; no new tasks |
At Warning level: After the current task completes (don't abandon mid-task), invoke:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-handoff/SKILL.md and follow its instructions.
At Critical level: Stop immediately. Invoke dev-handoff before context is exhausted. A degraded handoff is better than no handoff.
Why: A 10-task implementation phase with 20% context remaining produces garbage for the last 5 tasks. Better to handoff cleanly and resume fresh than to push through with degraded output.
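The thresholds reduce to a small decision function. This sketch assumes the remaining context is available as a percentage, which the skill does not specify how to obtain; `context_action` is a hypothetical name.

```python
def context_action(remaining_pct: float) -> str:
    """Map remaining context (percent) to the action in the thresholds table."""
    if remaining_pct > 35:
        return "proceed"  # Normal
    if remaining_pct > 25:
        return "finish current task, then dev-handoff"  # Warning
    return "dev-handoff immediately"  # Critical: 25% or less
```

Exactly 35% falls in the Warning band and exactly 25% in the Critical band, matching the table's boundaries.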
Use the Monitor tool for builds, test suites, or scripts that take >30 seconds. Monitor streams stdout events without blocking — you keep working and get notified on completion.
# Watch a test suite run
Monitor(
description="test suite progress",
timeout_ms=300000, persistent=false,
command="npm test 2>&1 | grep --line-buffered -E '(PASS|FAIL|✓|✗|error|complete)'"
)
# Watch a build
Monitor(
description="build progress",
timeout_ms=300000, persistent=false,
command="npm run build 2>&1 | grep --line-buffered -E '(error|warning|built|done|fail)'"
)
When NOT to use Monitor: For quick commands (<30s), use Bash directly. For one-shot "run and wait," use Bash(run_in_background=true). Monitor is for streaming progress from longer operations.
The workflow implements ONE dependency level per invocation; you loop across levels under the active /goal.
0. Set the goal (once, at phase entry):
/goal All tasks in .planning/PLAN.md are marked [x] AND [full-suite test command] exits 0
AND .planning/VALIDATION.md status is `validated`. Stop after [N] turns.
LOOP (one turn per level, under the active /goal):
1. Invoke the workflow:
Workflow(name="dev-implement", args={
"projectDir": "<absolute path to the dev project (cwd)>",
"pluginRoot": "<absolute path to this plugin's workflows/ dir — resolve ${CLAUDE_SKILL_DIR}/../../workflows>"
})
It picks the lowest level with pending tasks, runs its tasks SEQUENTIALLY (one
TDD implementer each, writing directly into the project tree), verifies each
(Verify Command exit 0 + read-only corroboration), and returns { overallPass,
level, tasksRemaining, tasks, findings, tasksThatFailed, reviews }. The code is
ALREADY in the tree when it returns — there is no merge step for you to do.
2. GROUND-TRUTH (self-reports are not truth): run the FULL suite + lint on the
tree (the PLAN.md Testing Strategy command — e.g. `pixi run pytest`,
`npm test && npm run lint`). The per-task Verify Commands ran in isolation;
this confirms the level integrates without regressions.
3. If result.overallPass AND the full suite is green:
mark this level's PLAN.md rows [x], log to LEARNINGS.md, END THE TURN —
the /goal evaluator re-fires for the next level (or closes if tasksRemaining=0).
No pause, no "should I continue?".
4. If result.overallPass is false (a task failed TDD/verify) OR the full suite
regressed: read result.findings, fix the cause, then re-invoke with
onlyChecks=result.tasksThatFailed + priorReviews=result.reviews. An R4 block
(architectural — new schema, lib swap, breaking API) is a critical finding —
STOP and escalate it to the user; do not invent the architectural fix.
A blocked task (R4 — new schema, lib swap, breaking API) is in findings as critical and overallPass=false. STOP and present the R4 to the user; do not invent an architectural fix.
The JS gate (result.overallPass) + your full-suite run are authoritative. Do not hand-wave a level done; the Verify Command exit codes decide, not your read of the output.
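The per-level decision can be summarized as a pure function of the workflow result and the full-suite exit code. This is a sketch only: the field names `overallPass`, `tasksRemaining`, and `findings` come from the description above, but the shape of each finding (a dict with a `severity` key) and the returned action labels are assumptions made for illustration. It also treats every critical finding as an escalation, which is stricter than the R4 rule alone.

```python
def next_action(result: dict, full_suite_exit_code: int) -> str:
    """Decide what main chat does after one workflow invocation."""
    findings = result.get("findings", [])
    # Assumed shape: each finding is a dict with a "severity" key.
    if any(f.get("severity") == "critical" for f in findings):
        return "escalate"  # e.g. an R4 block: stop and ask the user
    if not result.get("overallPass") or full_suite_exit_code != 0:
        return "retry-failed"  # fix the cause, re-invoke with onlyChecks/priorReviews
    if result.get("tasksRemaining", 0) == 0:
        return "done"  # the /goal evaluator can close the phase
    return "advance"  # mark rows [x], log to LEARNINGS.md, end the turn
```

Both signals must agree before a level advances: a passing gate with a red full suite still yields `retry-failed`.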
Cache lookup pattern for skill paths: Read ${CLAUDE_SKILL_DIR}/../../TARGET/PATH and follow its instructions.
If a PLAN.md task involves rendered visual output, use visual-verify for the render → look-at → fix steps inside the task. Visual-verify is part of what happens inside a turn — /goal still drives the outer loop.
Signals a task is visual: task mentions "render", "slide", "chart", "figure", "layout", "UI", "screenshot", "visual", "diagram", or produces any file meant to be seen by humans (PNG, PDF, SVG).
Read ${CLAUDE_SKILL_DIR}/../../skills/visual-verify/SKILL.md and follow its instructions.
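The visual-task signals listed above lend themselves to a simple heuristic. A minimal sketch, assuming a task is described by free text plus a list of output file names; `is_visual_task` is a hypothetical helper and will produce false positives (for example on "figure out").

```python
import re

VISUAL_EXTENSIONS = (".png", ".pdf", ".svg")

# "ui" must match as a whole word so it does not fire on "build" or "require";
# the other keywords also match inflected forms such as "rendering" or "slides".
_KEYWORD_RE = re.compile(
    r"\b(?:ui|(?:render|slide|chart|figure|layout|screenshot|visual|diagram)\w*)\b",
    re.IGNORECASE,
)


def is_visual_task(description: str, output_files: list[str]) -> bool:
    """Return True if a task mentions a visual keyword or produces PNG/PDF/SVG."""
    if _KEYWORD_RE.search(description):
        return True
    return any(name.lower().endswith(VISUAL_EXTENSIONS) for name in output_files)
```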
Before working through PLAN.md, set a /goal whose condition encodes the full phase exit criteria. The user runs this once; subsequent turns fire automatically until the evaluator says the condition holds.
Condition template (copy, fill in brackets, hand to the user):
/goal All tasks in .planning/PLAN.md are marked [x] complete, [TEST COMMAND] exits 0
on the full suite, .planning/VALIDATION.md exists with status `validated`,
and no Task agent reports unresolved blockers. Stop after [N] turns or if the
same test fails 3 turns in a row (trigger Failure Recovery Protocol).
Key constraints baked into the condition:
- [TEST COMMAND] is filled in with the literal full-suite command (e.g. pixi run pytest, npm test && npm run lint) so the evaluator can read the exit code from the transcript.

If the user prefers to drive /goal themselves, hand them the literal condition string instead of setting it for them.
Implementation is the dev-implement workflow, looped per dependency level — see The Process. You do NOT hand-dispatch tasks; the workflow's implementers follow dev-tdd and run each task's Verify Command, and the JS gate keys on the real exit codes. (The legacy per-task dev-delegate template is now embedded in the workflow's implementer prompt; dev-delegate remains only for ad-hoc single-task dispatch outside this phase.)
After Task agent returns, you must personally verify (not trust the agent's report):
Orchestrator-role boundary (deliberate, scoped exception to C1b). dev-debug's C1b classifies reading source after a subagent as investigation. dev-implement is the orchestrator of a known PLAN.md task, so a narrow read is verification — but only within these lines:
| Orchestrator CAN (spec-compliance verification) | Orchestrator CANNOT (investigation — delegate it) |
|--------------------------------------------------|----------------------------------------------------|
| Read the file(s) the agent claims it wrote, to compare against the SPEC.md requirement for THIS task | Form hypotheses about why something is broken |
| Read the test file(s) and run the test command | grep/rg source to hunt for unrelated patterns |
| Check exit codes, diff *.test.* | Debug the logic yourself or fix it in main chat |
| | Read files beyond the task's claimed deliverables |
If verification surfaces a defect, you do NOT debug it in main chat — you REJECT the task and re-dispatch the implementer (or dev-debug). Reading widens past the claimed deliverables = you've crossed into investigation; stop and delegate.
Read the implementation file(s) the agent claims to have written.
Compare to SPEC.md requirements line by line.
Read the test file(s). Look for .skip(), mock-only tests, or tests that don't call real code.
Actually run the test command. Read the output.
If the feature integrates with an external system (Electron app, API, database),
you MUST verify it works against the real system, not just mocks.
If ANY check fails → REJECT the work. Do NOT mark task complete.
| Thought | Reality | Action |
|---------|---------|--------|
| "The agent said tests pass" | Agents lie. Verify yourself. | Run the tests. |
| "66 tests passing is enough" | Count skipped tests. Read test code. | Check for fake tests. |
| "I'll verify at the end" | You'll forget. Bugs compound. | Verify NOW. |
| "The spec said X, code does Y, but Y is close enough" | Close enough = wrong. | Reject and redo. |
| "Integration test is skipped but unit tests pass" | Unit tests don't prove integration works. | Require real integration test. |
| "External system isn't running, but code is correct" | Untested code is broken code. | Start the system and test. |
If ALL pass → mark the task [x] in PLAN.md and move on. If ANY fail → iterate within the active /goal; the next turn will fire automatically.
After a task passes review, append a structured summary to LEARNINGS.md:
## Task N: [task description]
---
task: N
status: completed
implements: [REQ-01, REQ-03]
affects: [src/auth/, tests/test_auth.py]
key-files:
created: [list of new files]
modified: [list of changed files]
deviations: {r1: 0, r2: 1, r3: 0, r4: 0}
---
One-liner: [SUBSTANTIVE summary — not "Task complete" but "JWT refresh rotation with 7-day expiry using jose library"]
Changes: [what was added/modified and why]
Test: [test command and result]
One-liner rule: Must be SUBSTANTIVE. Good: "Added rate limiting middleware with sliding window at 100 req/min". Bad: "Implemented task 3" or "Done".
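The one-liner rule can be approximated with a crude filter. This sketch rejects the bad examples quoted above. The four-word minimum is an arbitrary assumption, and `is_substantive` is a hypothetical helper, not part of the skill.

```python
WEAK_ONE_LINERS = {"done", "task complete", "complete"}


def is_substantive(one_liner: str, task_number: int) -> bool:
    """Reject one-liners that only restate that the task finished."""
    text = one_liner.strip().rstrip(".").lower()
    if text in WEAK_ONE_LINERS or text == f"implemented task {task_number}":
        return False
    return len(text.split()) >= 4  # assumed minimum length; tune as needed
```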
You WILL discover unplanned work during implementation. Apply these rules automatically and track all deviations.
| Rule | Trigger | Action | Permission |
|------|---------|--------|------------|
| 1: Bug | Broken behavior, errors, wrong queries, type errors, security vulns, race conditions, leaks | Fix → test → verify → track [Rule 1 - Bug] | Auto |
| 2: Missing Critical | Missing essentials: error handling, validation, auth, CSRF/CORS, rate limiting, indexes, logging | Add → test → verify → track [Rule 2 - Missing Critical] | Auto |
| 3: Blocking | Prevents completion: missing deps, wrong types, broken imports, missing env/config/files, circular deps | Fix blocker → verify proceeds → track [Rule 3 - Blocking] | Auto |
| 4: Architectural | Structural change: new DB table, schema change, new service, switching libs, breaking API, new infra | STOP → present decision → track [Rule 4 - Architectural] | Ask user |
Priority: Rule 4 (STOP) > Rules 1-3 (auto) > unsure → Rule 4

Edge cases: missing validation → R2 | null crash → R1 | new table → R4 | new column → R1/2
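The permission column and the "unsure → Rule 4" default can be expressed compactly. A sketch, assuming the rule number has already been decided by the implementer (or is `None` when unclear); both function names are illustrative.

```python
from typing import Optional

AUTO_RULES = {1: "Bug", 2: "Missing Critical", 3: "Blocking"}


def deviation_permission(rule: Optional[int]) -> str:
    """Return 'auto' for Rules 1-3 and 'ask-user' for Rule 4 or an unclear case."""
    if rule in AUTO_RULES:
        return "auto"
    return "ask-user"  # Rule 4, or unsure: default to Rule 4


def track_label(rule: int) -> str:
    """Format the tracking tag, e.g. '[Rule 1 - Bug]'."""
    names = {**AUTO_RULES, 4: "Architectural"}
    return f"[Rule {rule} - {names[rule]}]"
```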
When you encounter an architectural deviation, STOP and present:
⚠️ Architectural Decision Needed
- Current task: [task name]
- Discovery: [what prompted this]
- Proposed change: [modification]
- Why needed: [rationale]
- Impact: [what this affects]
- Alternatives: [other approaches]
Proceed with proposed change? (yes / different approach / defer)
All deviations tracked per task:
[Rule N - Category] Title
End each task summary with: Total deviations: N auto-fixed (R1: X, R2: Y, R3: Z). Impact: [assessment].
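For illustration, the closing line can be rendered from the same per-rule counts recorded in the `deviations` field of the LEARNINGS.md entry. This sketch assumes the keys `r1` to `r4` shown in that template; `deviation_summary` is a hypothetical helper.

```python
def deviation_summary(counts: dict[str, int], impact: str) -> str:
    """Render the closing line of a task summary from per-rule counts."""
    r1, r2, r3 = (counts.get(key, 0) for key in ("r1", "r2", "r3"))
    total = r1 + r2 + r3  # Rule 4 is never auto-fixed, so it is excluded
    return f"Total deviations: {total} auto-fixed (R1: {r1}, R2: {r2}, R3: {r3}). Impact: {impact}."
```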
| Your Drive | Why You Skip | What Actually Happens | The Drive You Failed |
|------------|-------------|----------------------|---------------------|
| Helpfulness | "Skipping TDD gets code to user faster" | Untested code creates bugs the user discovers later | Anti-helpful |
| Competence | "I assumed it works, no need to run tests" | The user runs it and it fails. Your assumption destroyed trust. | Incompetent |
| Efficiency | "Skipping spec check saves time" | Spec drift means rework. Your speed was waste. | Inefficient |
| Approval | "I'll delegate without full context" | Subagent builds wrong thing, you redo everything, and the user loses trust | Trust destroyed |
| Honesty | "Task complete" without running tests | You claimed tests pass without running them. That's fabrication. | Dishonest |
The protocol is not overhead you pay. It is the service you provide.
| Skill | Purpose | Used By |
|-------|---------|---------|
| /goal (built-in) | Cross-turn iteration with separate-model evaluation | Set by user/main chat at phase entry |
| dev-delegate | Task agent templates | Main chat |
| dev-tdd | TDD protocol (RED-GREEN-REFACTOR) | Task agent |
| dev-test | Testing tools (pytest, Playwright, etc.) | Task agent |
When a task is blocked by something other than a failing test — a missing dependency, an unavailable service, an environment/config gap, an upstream task not yet done — do NOT silently spin. Classify and act:
| Option | When | Action |
|--------|------|--------|
| Retry | Transient (flaky network, race, first-run setup) | Re-run once. If it clears, continue. Log the retry in LEARNINGS.md. |
| Skip | The blocked task is independent of the remaining tasks | Mark the task [blocked] in PLAN.md with the reason, proceed to the next independent task, and surface the skipped task at phase end. Never skip a task others depend on. |
| Stop | The blocker prevents all forward progress, or is architectural (R4) | STOP, write .planning/RECOVERY.md with the blocker, and consult the user. |
Default when unsure → Stop and ask. A blocker is not a test failure — the 3-failure trigger below is for tasks that run but fail tests.
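The Retry / Skip / Stop table can be read as a decision function that falls back to Stop. A sketch under the assumption that the four conditions have already been judged by the orchestrator; the boolean parameters and `blocker_action` are illustrative names.

```python
def blocker_action(transient: bool, has_dependents: bool,
                   blocks_everything: bool, architectural: bool) -> str:
    """Pick Retry, Skip, or Stop for a blocked task; default to Stop."""
    if architectural or blocks_everything:
        return "stop"  # write .planning/RECOVERY.md and consult the user
    if transient:
        return "retry"  # re-run once, log the retry in LEARNINGS.md
    if not has_dependents:
        return "skip"  # mark [blocked] in PLAN.md, surface at phase end
    return "stop"  # never skip a task others depend on
```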
Pattern from oh-my-opencode: After 3 consecutive implementation failures, escalate.
If you attempt 3 implementations and ALL fail tests:
Iteration 1: Implement approach A → tests fail
Iteration 2: Implement approach B → tests fail
Iteration 3: Implement approach C → tests fail
→ TRIGGER RECOVERY PROTOCOL
1. STOP all further implementation attempts
2. REVERT to last known working state: git checkout <last-passing-commit>
3. DOCUMENT what was attempted in .planning/RECOVERY.md
4. CONSULT with user BEFORE continuing
5. ASK USER for direction
NO PASSING TESTS = NOT COMPLETE (hard rule)
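The three-strikes trigger amounts to a counter of consecutive failing attempts. A minimal sketch; `FailureTracker` is a hypothetical helper, and resetting the counter on a passing run is an assumption consistent with "consecutive".

```python
class FailureTracker:
    """Count consecutive failing implementation attempts for one task."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.consecutive_failures = 0

    def record(self, tests_passed: bool) -> bool:
        """Record one attempt; return True when the recovery protocol must fire."""
        if tests_passed:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.limit
```

When `record` returns `True`, stop implementing and follow the recovery steps instead of attempting a fourth approach.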
Before continuing after multiple failures:
DON'T:
DO:
Loop 1: Implement with synchronous approach → Tests timeout
Loop 2: Implement with async/await → Tests hang
Loop 3: Implement with promises → Tests fail assertion
→ RECOVERY PROTOCOL:
1. STOP (no loop 4)
2. REVERT: git checkout HEAD -- src/feature.ts tests/
3. DOCUMENT in .planning/RECOVERY.md:
- Pattern: All async implementations cause timing issues
- Tests expect synchronous behavior
- Hypothesis: Requirements may need async, tests don't handle it
4. ASK USER:
"I've tried 3 async implementations. All cause timing issues.
Tests expect synchronous behavior.
This suggests either:
A) Feature should actually be synchronous (simpler)
B) Tests need updating for async behavior
Which direction should I take?"
Trigger after 3 failures when:
Don't wait for max iterations - trigger early when pattern emerges.
The /goal condition's Stop after N turns clause causes the evaluator to return done with reason "turn budget exhausted." Still do NOT ask user to manually test.
Main chat should:
/goal with a different approach

Never default to "please test manually". Always exhaust automation first.
[x] complete

/goal keeps firing turns until the condition holds.

| Thought | Reality |
|---------|---------|
| "Task done, let me check in with user" | NO. User wants ALL tasks done. Keep going. |
| "User might want to review" | User will review at the END. Continue. |
| "Natural pause point" | Only pause when ALL tasks complete or blocked. |
| "Let me summarize progress" | Summarize AFTER all tasks. Keep moving. |
| "User has been waiting" | User is waiting for COMPLETION, not updates. |
| "Should I continue?" | YES. Never ask. Just continue. |
| "I'll update PLAN.md later" | NO. Update it NOW before next task. |
[x] complete

A [x] mark in PLAN.md plus a passing test command in the transcript signals task completion. After verifying, update PLAN.md, then IMMEDIATELY start the next task. The /goal evaluator reads the transcript and decides when the whole phase is done.
Pausing between tasks is procrastination disguised as courtesy.
If the user sends a message that is NOT about the current implementation, you MUST announce the loop pause before responding — then resume. (Stopping point #3, made explicit.)
This mirrors dev-debug's protocol: silent loop abandonment is how a structured /goal loop gets dropped and never resumed.
Protocol:
1. Announce: "Pausing the /goal loop at task N to address your request."
2. Address the user's request.
3. Resume: re-read .planning/PLAN.md (and LEARNINGS.md) and dispatch the next task's implementer.

If the message could be EITHER a new topic OR part of the current task: ask "Is this part of the current task, or a separate request?" Do NOT assume it is separate and silently abandon the loop.
Silently dropping the loop is NOT HELPFUL — the user set /goal because they want all tasks driven to completion. Abandoning it discards their explicit request.
After each task's verification completes:
[x]

Violations to catch:
Pausing > 30 seconds between tasks means you've stopped. You shouldn't have. </EXTREMELY-IMPORTANT>
This gate validates that every requirement in SPEC.md has corresponding test coverage. TDD ensures task-level coverage; test gap ensures requirement-level coverage. They are different checks.
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-test-gaps/SKILL.md and follow its instructions.
Must produce .planning/VALIDATION.md before proceeding to review.
| VALIDATION.md Status | Action |
|---------------------|--------|
| validated | Proceed to review phase |
| gaps_found (gaps filled, no escalations) | Re-run full test suite. If all pass, proceed. |
| gaps_found (with escalations) | Address escalated implementation bugs: dispatch targeted Task agents for failing requirements (the active /goal keeps firing turns), then re-run test gap validation |
| Missing | STOP. Run test gap validation. |
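The status table maps directly onto a lookup. A sketch, assuming the status has been read from .planning/VALIDATION.md as a string (or `None` when the file is missing) and that the number of escalations is known; `validation_action` and its return labels are illustrative.

```python
from typing import Optional


def validation_action(status: Optional[str], escalations: int = 0) -> str:
    """Map a VALIDATION.md status to the next step in the table."""
    if status is None:
        return "stop: run test gap validation"  # file is missing
    if status == "validated":
        return "proceed to review"
    if status == "gaps_found":
        if escalations > 0:
            return "dispatch fixes, then re-run test gap validation"
        return "re-run full test suite"
    raise ValueError(f"Unknown VALIDATION.md status: {status!r}")
```

Raising on an unknown status keeps an unexpected value from being treated as a pass.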
If test gap reports implementation bugs (escalations):
/goal keeps firing turns until VALIDATION.md is validated)

validated

| Thought | Reality |
|---------|---------|
| "All task tests pass, test gap is redundant" | Task tests != requirement coverage. Gaps hide between tasks. Run test gap. |
| "test gap will slow us down" | Shipping untested requirements slows the USER down. Run test gap. |
| "I'll validate coverage manually" | Manual validation is not validation. Run the skill. |
| "Requirements are simple, tests obviously cover them" | "Obviously" is not evidence. Run test gap and prove it. |
| "We already wrote thorough tests" | Then test gap will confirm that quickly. Run it. |

</EXTREMELY-IMPORTANT>
Phase summary (append to LEARNINGS.md):
## Phase: Implement
---
phase: implement
status: completed
requires: [PLAN.md, PLAN_REVIEWED.md]
provides: [VALIDATION.md, implementation-complete, all-tests-passing]
tasks-completed: N/N
total-deviations: {r1: X, r2: Y, r3: Z, r4: W}
---
REQUIRED SUB-SKILL: After ALL tasks complete with passing tests AND test gap validation passes:
Read ${CLAUDE_SKILL_DIR}/../../skills/dev-review/SKILL.md and follow its instructions.
Do NOT proceed until automated tests pass for every task AND .planning/VALIDATION.md status is validated.