Plan-Driven Development

Overview

Large features get a versioned plan file in plans/ that serves as the single source of truth — what's been done, what's in progress, and what's left. The plan persists across context windows so any agent (or agent team) can pick up where the last one left off.

PDD requires powerups:best-practices — invoke it, don't just reference it. Every practice in powerups:best-practices is mandatory here: TDD (tests before implementation), branching (never develop on main), investigation (subagent before building), clarifying questions (ask before assuming), DRY, frontend-design for UI, self-documenting APIs, update-docs when done. PDD adds planning infrastructure on top — it does NOT replace or relax any of those practices.

If you're unsure whether a best-practice applies: it does. PDD is powerups:best-practices + plans, not plans instead of powerups:best-practices.

When to Use

PDD can be invoked for features of any size. The difference is whether you write a plan file.

Write a plan file (plans/v{N}-*.md) when:

Feature spans multiple milestones or will take more than one session
Multiple files/modules need coordinated changes
You need to track progress across context resets
Multiple agents will work on different pieces in parallel

Use PDD in lightweight mode (plan inline, no file) when:

Feature is smaller but still touches multiple files
Work fits in one session with no risk of context loss
No milestones needed — it's a single logical chunk of work

In lightweight mode, still write a plan — but present it inline in the conversation for user review instead of writing it to a markdown file. The inline plan should cover: what you're changing, which files are affected, the impact scan results, and the implementation approach. Get user approval before coding. All other PDD rules still apply: invoke powerups:best-practices (including the impact scan), run the skill audit, create a branch, ask clarifying questions, TDD, and run all post-completion steps (simplify, changelog, update-docs, full test suite, lint).

On every session start for feature work:

Spawn an Explore subagent to check plans/ for an existing plan. Plan files are large and consume significant context — never read them directly in the main conversation. The subagent should read the plan and return a concise summary: current milestone, next unchecked task, any blockers, and key design decisions.
If the subagent finds an active plan, use its summary to orient yourself. Only read specific sections of the plan directly if you need exact task wording or file paths.
Update the progress summary table and check off completed tasks (these edits are small and fine in the main context).

Plan Location & Naming

plans/
├── v0-initial-build.md
├── v1-auth-and-production-readiness.md        ← initial plan
├── v1-auth-and-production-readiness-r2.md     ← revised (approach changed)
├── v2-multi-provider-support.md
└── ...

Naming: v{N}-{short-action-description}.md (initial), v{N}-{description}-r{R}.md (revisions)

v{N} — Sequential version number (check existing files for the next number)
{short-action-description} — Lowercase, hyphens, action-oriented (e.g., auth-and-production-readiness, multi-provider-support)
-r{R} — Revision suffix, appended when the plan undergoes a major change (r2, r3, etc.). The initial version has no suffix (implicitly r1).
If plans/ doesn't exist, create it

Title inside the file should match: # v1-r2: Auth & Production Readiness (WorkOS)
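
The naming rules above can be sketched as a small helper. This is only an illustration: `PLAN_NAME`, `parse_plan_name`, `next_plan_name`, and `next_revision_name` are hypothetical names, and starting at `v0` when plans/ is empty is an assumption based on the `v0-initial-build.md` example.

```python
import re
from typing import Optional

# Hypothetical pattern for the naming scheme above:
# v{N}-{description}.md with an optional -r{R} revision suffix.
PLAN_NAME = re.compile(
    r"^v(?P<version>\d+)-(?P<slug>[a-z0-9]+(?:-[a-z0-9]+)*?)(?:-r(?P<rev>\d+))?\.md$"
)


def parse_plan_name(filename: str) -> Optional[tuple]:
    """Split a plan filename into (version, slug, revision); None if it does not match."""
    match = PLAN_NAME.match(filename)
    if match is None:
        return None
    # A file without a suffix is implicitly revision 1.
    return int(match["version"]), match["slug"], int(match["rev"] or 1)


def next_plan_name(existing: list, description: str) -> str:
    """Name for a brand-new plan: highest existing version plus one."""
    versions = [p[0] for name in existing if (p := parse_plan_name(name)) is not None]
    next_version = max(versions, default=-1) + 1  # assumption: start at v0
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"v{next_version}-{slug}.md"


def next_revision_name(existing: list, version: int, slug: str) -> str:
    """Name for the next revision file of an existing plan."""
    revisions = [
        p[2]
        for name in existing
        if (p := parse_plan_name(name)) is not None and p[:2] == (version, slug)
    ]
    if not revisions:
        raise ValueError(f"no plan found for v{version}-{slug}")
    return f"v{version}-{slug}-r{max(revisions) + 1}.md"
```

In practice `existing` would come from listing plans/ (for example `[p.name for p in Path("plans").glob("*.md")]`); the helper only suggests a name and never deletes or renames earlier revisions.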

When to create a revision (new file with -r{R} suffix):

Core technical approach changes (e.g., DIY JWT → WorkOS)
Major scope added or removed (e.g., dropping a whole milestone)
Architecture fundamentally shifts

Do NOT create a revision for:

Checking off tasks (normal progress)
Adding/removing small tasks
Clarifying wording

Previous revision files stay in plans/ as historical record — don't delete the original when creating a revision.

Plan Structure

Every plan must have these sections:

1. Context

What problem this solves, why it's being built, key relationships and constraints. Written so an agent with zero prior context understands the full picture.

2. Design / Architecture

Data models, API endpoints, flow diagrams, key decisions and their rationale. Include enough detail that implementation doesn't require guessing.

3. Scenario Map (for complex changes only)

When to include this section: Complex refactors, upgrades to existing systems, anything that changes behavior users already rely on, or features where multiple actors (end user, admin, operator) interact with overlapping states. Skip it in lightweight mode and for simple additive features.

Why this exists: Plans written from a single happy-path perspective miss failure modes, edge cases, and state interactions — the things that actually cause production incidents. The scenario map forces you to enumerate every realistic user path before coding, so you spot gaps and ambiguities while they're cheap to resolve in the plan instead of discovering them mid-implementation or after ship.

How to build the map:

List the actors — who interacts with this system? (end user, admin, operator, background job, etc.)
List the state dimensions — flags, connection states, permissions, origin types, API outcomes (200/4xx/5xx), cached vs fresh, etc.
For each actor × state combination, write a row: what the user does, what the system does, and whether the current plan already covers it.
Explicitly flag gaps — rows where the plan is silent or ambiguous. Each gap must become one of: a new task in a milestone, a decision surfaced to the user via AskUserQuestion, or an explicit "out of scope" note.
Include adversarial and degraded states — disconnected mid-flow, stale tokens, network failure, duplicate submissions, two tabs racing, concurrent toggles, downgrade paths, rollback.
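
One way to avoid missing a combination is to generate the skeleton rows mechanically and then prune the unrealistic ones by hand. The sketch below assumes a hypothetical `scenario_rows` helper and marks every generated row as a gap until someone reviews it; the state names are illustrative only.

```python
from itertools import product


def scenario_rows(actor_label: str, states: dict) -> list:
    """Return one markdown table row per combination of state values.

    `states` maps a state dimension (e.g. "feature") to its possible values.
    Every row starts as a gap; a human fills in behaviour and coverage.
    """
    names = list(states)
    rows = []
    for index, combo in enumerate(product(*states.values()), start=1):
        state = ", ".join(f"{name}: {value}" for name, value in zip(names, combo))
        rows.append(f"| {actor_label}{index} | {state} | ??? | Gap |")
    return rows


rows = scenario_rows(
    "A",
    {"feature": ["off", "on"], "connection": ["connected", "disconnected"]},
)
print("\n".join(rows))
```

The row count grows multiplicatively with each dimension, so this is a starting checklist rather than the finished map: delete combinations that cannot occur and merge ones that behave identically.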

Table format (recommended):

### A — [Actor name, e.g., "Admin dashboard user"]

| # | State | What happens | Plan coverage |
|---|---|---|---|
| A1 | Feature off, no connection | Existing behavior unchanged | OK |
| A2 | Feature on, mid-flow disconnect | ??? | Gap — decide fallback |
| A3 | Feature on, API 5xx on write | Fall back vs hard fail | Gap — ask user |

Group rows by actor (A, B, C…) so it stays scannable. Keep each row to one line — detail lives in the corresponding milestone task.

Turn gaps into actions before plan approval:

For each Gap row, either (a) add a task that handles it, (b) mark it "out of scope" with a one-line rationale, or (c) use AskUserQuestion to collect a decision and then fold the answer back into the plan.
Do NOT leave gaps unresolved. An unresolved gap in the plan becomes a bug in production.
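
The "no unresolved gaps" rule can be checked mechanically before asking for plan approval. The following is a minimal sketch, assuming the four-column table format shown above and that unresolved rows have a "Plan coverage" cell starting with the word "Gap"; `unresolved_gaps` is a hypothetical helper name.

```python
def unresolved_gaps(plan_text: str) -> list:
    """Return the row ids (e.g. "A2") whose coverage cell still starts with "Gap"."""
    gaps = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Scenario-map rows have exactly four cells: id, state, behaviour, coverage.
        if len(cells) == 4 and cells[3].lower().startswith("gap"):
            gaps.append(cells[0])
    return gaps
```

An empty result does not prove the map is complete; it only shows that every row already listed has been resolved into a task, an out-of-scope note, or a recorded decision.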

Example from a real plan: for a ticket-routing upgrade, the scenario map surfaced (1) what happens when the external API fails mid-chat, (2) whether disconnecting the connector should auto-reset the routing toggle, (3) duplicate ticket creation when a form is double-submitted, and (4) whether the toggle-off direction needs the same confirmation as toggle-on. None of these were obvious from the happy-path design, and each one changed a milestone.

4. Milestones with Task Checkboxes

The core of the plan. Each milestone is a logical chunk of work with:

### Milestone N: Short Name
**Goal:** One sentence.

Tests first (these will fail until implementation):
- [ ] Write failing tests for X
- [ ] Write failing tests for Y

Then implement to make tests pass:
- [ ] Implement X
- [ ] Implement Y

**Verification:**
- [ ] How to confirm this milestone is done

TDD is required by default. Every milestone that adds or changes behavior MUST list test tasks before implementation tasks. Tests are written first, confirmed to fail, then implementation makes them pass. This is non-negotiable unless the user explicitly opts out (e.g., "skip tests", "no tests for this"). If the user hasn't said to skip tests, include them.

Milestones that are pure refactors of already-tested code (e.g., moving code without changing behavior) don't need new tests — but existing tests must still pass.

Rules:

Tasks are concrete and actionable ("Create sync_hq/auth/models.py" not "Set up auth")
Include file paths where relevant
Test tasks come before implementation tasks within each milestone — this is the TDD ordering
Check off tasks (- [x]) as they are completed
Never remove completed tasks — they're the history

5. Progress Summary Table

Quick at-a-glance status at the bottom of the file:

| Milestone | Status | Notes |
|-----------|--------|-------|
| 1. Name   | Done   |       |
| 2. Name   | In progress | Blocked on X |
| 3. Name   | Not started | |

Update this table as milestones progress.

Multi-Agent Development

Plans are designed for parallel work. When a feature has independent milestones or tasks, spawn agent teams or subagents to work on different pieces concurrently.

How it works:

The plan file is the shared coordination point — all agents read from and write to it
Identify independent milestones/tasks that have no dependencies between them
Spawn subagents for independent work (e.g., one agent builds auth models while another builds rate limiting)
Each agent checks off its tasks in the plan as it completes them
Sequential tasks (where one depends on another) should NOT be parallelized

When to parallelize:

Multiple milestones with no shared dependencies
Independent modules within a milestone (e.g., OTP service + JWT service)
Tests and implementation across separate files
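
Deciding what can run in parallel amounts to grouping tasks by dependency level. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) follows; the task names and the `depends_on` mapping are hypothetical, since the plan format does not prescribe how dependencies are recorded.

```python
from graphlib import TopologicalSorter


def parallel_batches(depends_on: dict) -> list:
    """Group tasks into batches; tasks within a batch have no unmet dependencies.

    `depends_on` maps each task to the set of tasks that must finish first.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical milestone tasks and their prerequisites.
batches = parallel_batches(
    {
        "auth-models": set(),
        "rate-limiting": set(),
        "otp-service": {"auth-models"},
        "jwt-service": {"auth-models"},
        "login-endpoint": {"otp-service", "jwt-service", "rate-limiting"},
    }
)
```

Each inner list is a candidate set of subagent assignments; batches themselves still run one after another.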

Agent task prompts should include:

Reference to the plan file: "Read plans/v{N}-{description}.md for full context"
Which specific milestone/tasks the agent owns
TDD requirement: write failing tests first, then implement

Workflow

Starting a new feature

1. Set /effort max. Planning requires deep reasoning, so always use max effort during the planning phase.
2. Check plans/ to find the next version number.
3. Create a feature branch FIRST: git checkout -b feat/{description}. NEVER write plans or implement code on main. This applies to the plan file itself; even the plan commit goes on a branch.
4. Invoke powerups:best-practices. This is not optional. Run the skill to ensure branching, codebase investigation (subagent), impact scan, and clarifying questions all happen before any code or plan is written. Do not just reference it; actually invoke it.
5. Skill audit: MANDATORY before writing the plan. List every available powerups skill by name, then for each one, explicitly state whether it applies to this feature and why. Output this analysis to the user before proceeding. This ensures no skill is forgotten during planning or implementation.

Example output:
```
Skill audit for v10-sync-change-details:
- best-practices: YES - always applies (already invoked)
- test-driven-development: YES - new backend logic needs tests
- simple-design-principles: YES - frontend UI with user-facing copy
- self-documenting-apis: YES - new API endpoint
- database-branching: NO - not using Ghost DB
- update-docs: YES - run after all milestones complete
- bug-fix: NO - this is a new feature, not a bug fix
```
Every skill marked YES must appear as an explicit task or note in the relevant milestone. If update-docs applies, it MUST appear as a task in the final milestone or as a post-completion step. Do not rely on remembering; write it into the plan.
6. Create plans/v{N}-{description}.md
7. Write Context and Design sections.
8. Scenario map (for complex refactors/upgrades): Enumerate every realistic user path, grouped by actor, across all meaningful state combinations, including failures, disconnects, retries, stale state, and concurrent flips. Flag gaps. For every gap, either add a milestone task, mark it out of scope with a rationale, or ask the user via AskUserQuestion and fold the answer in. Do not skip this for changes that modify existing behavior, because single-perspective plans miss the edges that break production. Skip it only for purely additive, lightweight, or internal-only features.
9. Write the Milestones sections. Include the skill-specific tasks identified in step 5 and tasks that close every gap from step 8.
10. Get user approval on the plan before coding
11. Identify which milestones/tasks can be parallelized
12. Begin work, spawning subagents for independent pieces

After planning (before coding)

Run /update-docs to check if the plan itself revealed stale documentation (e.g., the investigation found outdated CLAUDE.md entries, incorrect API references in sibling repos, or drift in integration guides). Fix any staleness before starting implementation.

Resuming work (new context, no memory)

Spawn an Explore subagent to read plans/ and return a summary: which plan is active, what milestone you're on, what the next unchecked task is, and any key design context. Do not read plan files directly — they are large and will bloat your context.
Use the subagent's summary to orient yourself
Find the first unchecked task — that's where to resume
Continue working, checking off tasks as you go

During implementation

Invoke powerups:best-practices at the task level — TDD (write failing test, then implement), DRY (search before building), investigation (understand before changing). The plan organizes the work; powerups:best-practices governs how each task is executed. Actually invoke the skill, don't just follow it from memory.
Check off each task immediately when done: - [ ] → - [x]
Update the progress summary table when a milestone completes
If you discover new work, add tasks to the appropriate milestone
Commit the plan file alongside code changes
If the approach fundamentally changes (e.g., switching auth strategy), create a new revision file (-r2) rather than editing in place — keep the original as history

Plan drift — keep the plan in sync with reality

The plan is the single source of truth. The moment your implementation diverges from what's written, the plan stops being trustworthy and any agent resuming from it will be confused or repeat a failed approach. Every time a decision changes mid-flight, update the plan in the same commit as the code.

This rule is for small/medium drift — a decision that changed, an approach that didn't work, a tweak to data shape, a different library choice. For fundamental approach changes (auth strategy, architecture pivot, dropping a milestone), use the -r{R} revision file mechanism above instead.

Update additively — never rewrite history. The original task wording and decisions stay visible so future readers can see what was tried and why it changed.

How to record drift in-place:

Leave the original task / decision text as-is. Do not delete or rewrite it.
Directly under it, add a > **Revised ({YYYY-MM-DD}):** callout that states:
- What was originally planned
- What you actually did
- Why it changed (didn't work, edge case, better approach surfaced, user redirected)
If new tasks are needed, add them as fresh checkboxes under the revised note. If old tasks no longer apply, leave them visible and annotate ~~superseded — see revised note~~ (or check them with a (superseded) suffix) — never delete.
Update the Design / Architecture section if the change affects the documented design. Same pattern: add a "Revised" note rather than overwriting.
Commit the plan update with the code change — one commit, both files. A code change without the matching plan update is incomplete work.

Example:

- [x] Use Redis for OTP storage with 5-minute TTL
  > **Revised (2026-05-10):** Switched to Postgres with an `expires_at` column instead.
  > Redis would have required a new dependency for one feature; the existing Postgres
  > connection handles this with no infra change. TTL enforced via a cleanup job.
- [x] Add `expires_at` column and cleanup job for OTP rows
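
To keep callouts consistently formatted, the blockquote can be generated rather than typed. This is a small sketch only: `revised_callout` is a hypothetical helper, and the two-space indent and 88-character width are assumptions chosen to match the example above.

```python
import textwrap
from datetime import date


def revised_callout(changed_on: date, note: str, indent: str = "  ", width: int = 88) -> str:
    """Format a "Revised" note as an indented markdown blockquote.

    The result is meant to be pasted directly under the original task line,
    which itself stays untouched.
    """
    header = f"**Revised ({changed_on.isoformat()}):** "
    # Reserve room for the indent and the "> " quote marker on every line.
    wrapped = textwrap.wrap(header + note, width=width - len(indent) - 2)
    return "\n".join(f"{indent}> {line}" for line in wrapped)


print(revised_callout(date(2026, 5, 10), "Switched to Postgres with an expires_at column."))
```

The note text should still state what was planned, what was done, and why it changed; the helper only handles layout.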

Why this matters: Drift is silent. A task gets checked off because it's "done" in the developer's head, but the plan still describes the original approach. Six weeks later, someone reads the plan and is misled. Updating additively costs 30 seconds and keeps the plan honest.

After each major milestone — pause for user testing

When a milestone is complete (all tasks checked), stop and let the user test manually before moving on:

Provide step-by-step test instructions:
- Prerequisites (env vars, server running, etc.)
- Exact commands to run (curl, browser URLs, SQL queries)
- Expected output for each step
- How to verify success vs. failure
Include setup steps if the milestone introduced new dependencies, env vars, or configuration
Wait for user confirmation that testing passed before starting the next milestone
Clear context after successful testing — the plan file has all the state needed to resume in a fresh context

This ensures the user validates each milestone incrementally rather than discovering issues after everything is built.

After all milestones complete — POST-COMPLETION AUDIT

Just like the skill audit gates planning, the post-completion audit gates the PR. You MUST output this audit to the user before creating the PR — list each step, its status (done/not done), and evidence (e.g., "24 tests pass", "CHANGELOG.md updated with entry for..."). Do NOT create the PR until every item shows as done.

Post-completion audit output format:

Post-completion audit:
1. Skill audit review:     DONE — all 5 YES skills executed (best-practices, TDD, simple-design, update-docs, change-log)
2. Drift audit:            DONE — additive: 7 unplanned widgets + 3 deps recorded in plan;
                                  subtractive: 4 orphan files deleted, 2 completed Post-MVP items removed
3. /simplify:              DONE — deleted 200 lines dead code, fixed 3 issues
4. change-log:             DONE — added entry "Your assistant can now..."
5. update-docs:            DONE — CLAUDE.md and connector guide updated
6. Linter:                 DONE — no new warnings
7. Full test suite:        DONE — 133 passed, 0 failed
8. PR ready:               YES — manual verification steps included
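
The gate described above (no PR until every item is done, with evidence) can be expressed as a tiny data model. The sketch below is illustrative: `AuditStep`, `pr_blockers`, and `format_audit` are hypothetical names, and treating a step with empty evidence as not done is an assumption drawn from the "status and evidence" requirement.

```python
from dataclasses import dataclass


@dataclass
class AuditStep:
    name: str
    done: bool
    evidence: str = ""


def pr_blockers(steps: list) -> list:
    """Names of steps that are not done or that lack evidence."""
    return [s.name for s in steps if not s.done or not s.evidence.strip()]


def format_audit(steps: list) -> str:
    """Render the audit in roughly the format shown above, ending with the PR gate."""
    lines = ["Post-completion audit:"]
    for number, step in enumerate(steps, start=1):
        status = "DONE" if step.done else "NOT DONE"
        evidence = step.evidence.strip() or "no evidence yet"
        lines.append(f"{number}. {step.name}: {status} - {evidence}")
    ready = "YES" if not pr_blockers(steps) else "NO"
    lines.append(f"{len(steps) + 1}. PR ready: {ready}")
    return "\n".join(lines)
```

Passing the seven checkable steps produces "PR ready" as item 8, and the PR is created only when `pr_blockers` returns an empty list.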

The 8 steps:

1. Skill audit review: go back to your skill audit output and confirm every YES skill was actually executed. If any was missed, execute it now.
2. Run powerups:drift-audit. Actually invoke the skill. It reconciles the shipped artifact with the plan in BOTH directions:
   - Additive drift: things that landed in the code but were never in the plan (new widgets a user asked for mid-build, dependencies you had to add, heuristics you layered on, bug fixes worth recording). Gets folded into a new ## What changed from the initial plan section in the plan file.
   - Subtractive drift: things that should be gone but aren't (orphan component files from a replaced feature, redirect stubs for paths nobody bookmarks anymore, completed "Post-MVP" items still listed as deferred, stale TODO comments, orphaned feature flags, unused imports, deprecated DB objects). Walked with the user and deleted in a chore(plan-drift) commit.

   This is NOT optional. The plan is the historical record of what shipped. If it disagrees with the code, the plan is wrong and must be fixed before the PR. The subtractive sweep prevents stragglers from a replaced feature from rotting in the repo. Drift caught here is cheap; drift caught six weeks from now by a confused agent (or a curious dev wondering "is GeoMapCard.tsx still used?") is expensive.

   Run this BEFORE /simplify so the cleanup is informed by both directions of drift; /simplify shouldn't refactor code that's about to be deleted as subtractive drift.
3. Run /simplify: review all changed code for reuse, quality, and efficiency. Fix any issues found. This is NOT optional.
4. Run powerups:change-log: add an entry to CHANGELOG.md in plain, business-user-friendly language. This is NOT optional for user-facing changes.
5. Run update-docs: sync all documentation (CLAUDE.md, guides, sibling repos). This is NOT optional.
6. Run the project's linter and fix any lint errors introduced by your changes.
7. Run the FULL test suite: pytest (or the project's test command). ALL tests must pass. This catches regressions where new code breaks existing tests. A green test suite is a hard gate.
8. Create PR with manual verification steps (see below). Reference the drift section in the body so reviewers don't have to reverse-engineer scope creep.

Why this matters: Skipping post-completion steps is the #1 cause of broken PRs. The drift audit catches both plan rot (additive) and repo straggle (subtractive). /simplify catches code-quality issues. update-docs catches stale documentation. The full test suite catches regressions. Each step exists because skipping it has caused real problems.

PR manual verification steps — MANDATORY

Every PR must include a Manual verification section with step-by-step instructions the reviewer can follow to verify the feature works. Automated tests verify code correctness — manual steps verify feature correctness. These are different things.

Requirements:

Each scenario is numbered with a descriptive title (e.g., "1. Widget: Talk to Human button")
Steps are sequential and specific — exact actions to take, not vague descriptions
Each step that checks behavior has a Verify: line stating what the reviewer should see
Include prerequisites if needed (env vars, server running, test data)
Cover the golden path, at least one edge case, and a no-regressions check

Example structure:

## Manual verification

### 1. Creating a new widget
1. Start local dev (`npm run dev`)
2. Navigate to the dashboard → Widgets page
3. Click "Create Widget"
4. Fill in name: "Test Widget", select theme: "Dark"
5. Click Save
6. **Verify:** Widget appears in the list with name "Test Widget" and dark theme badge
7. **Verify:** Toast shows "Widget created"

### 2. Edge case: duplicate name
1. Try creating another widget with name "Test Widget"
2. **Verify:** Error message "A widget with this name already exists"
3. **Verify:** No duplicate created in the list

### 3. No regressions
- [ ] Existing widgets still display correctly
- [ ] Widget settings page still loads
- [ ] Delete widget still works

Do NOT write vague test plans like "verify it works" or "check the UI". Every step should be reproducible by someone who has never seen the feature.

Rolling back work

If code is reverted or the developer isn't happy with the implementation:

Uncheck tasks back to the rolled-back state: - [x] → - [ ]
Update the progress summary table to reflect the actual state
The plan must always match reality — if code was reverted, the checkboxes must revert too
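
Reverting checkboxes for a rolled-back milestone is a mechanical edit. A minimal sketch, assuming the rollback covers one whole milestone identified by its exact heading line; `uncheck_milestone` is a hypothetical helper, and the progress summary table still needs to be updated separately.

```python
def uncheck_milestone(plan_text: str, milestone_heading: str) -> str:
    """Turn every "- [x]" under the given heading back into "- [ ]".

    Only tasks between that heading and the next heading are touched;
    task text and all other milestones are left as they were.
    """
    out = []
    inside = False
    for line in plan_text.splitlines(keepends=True):
        if line.startswith("#"):
            inside = line.strip() == milestone_heading
        elif inside and line.lstrip().startswith("- [x] "):
            line = line.replace("- [x] ", "- [ ] ", 1)
        out.append(line)
    return "".join(out)
```

For a partial rollback, uncheck individual tasks by hand instead, so the plan reflects exactly what was reverted.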

Completing a plan

All checkboxes checked
Progress summary shows all milestones as "Done"
Plan stays in plans/ as historical record

UI Work

When any milestone involves user-facing text or UI changes, use the simple-design-principles skill. Flag these tasks in the plan — copy should be reviewed with the same rigor as code.

API Work

When any milestone involves creating or modifying API endpoints, use the self-documenting-apis skill. Every endpoint should have docstrings and typed response models so auto-generated docs (/docs, /redoc) are the single source of truth — no separate API reference file to maintain.

Common Mistakes

| Mistake | Fix |
|---------|-----|
| Starting to code without checking plans/ | Always check first; you may be mid-feature |
| Vague tasks ("set up auth") | Be specific: file paths, endpoint names, model fields |
| Forgetting to check off tasks | Update immediately; the plan is only useful if current |
| Creating a plan for a 10-minute fix | Use PDD lightweight mode: no plan file, but still follow all PDD rules |
| Tracking progress elsewhere (todos, comments) | The plan file is the single source of truth |
| Running all tasks sequentially | Identify independent work and spawn subagents |
| Skipping investigation/questions because "I know the codebase" | Always follow best-practices steps: investigate and ask first |
| Listing implementation tasks before test tasks in milestones | TDD is the default, so test tasks come first. Only skip if the user explicitly opts out |
| Writing a plan with no tests at all | Every milestone that adds behavior needs test tasks. If you forgot them, add them before starting implementation |
| Treating PDD as a replacement for best-practices | PDD = powerups:best-practices + plans. Actually invoke the skill; don't just follow it from memory |
| Jumping straight to coding after writing the plan | Follow best-practices: create branch, investigate codebase, ask clarifying questions FIRST |
| Writing the plan file or implementing on main | Always create a feature branch before writing anything. Plans and code both go on branches, never main |
| Skipping the skill audit before writing the plan | Always run the skill audit (step 5 of "Starting a new feature"): list every powerups skill, decide YES/NO for each, and write YES skills into the plan as tasks |
| Forgetting to run update-docs or other skills after completion | Go back to the skill audit and check off each YES skill. If you didn't run it, run it now |
| Skipping the full test suite before creating the PR | Always run all tests after the final milestone. Tests and code can drift independently (e.g., fixtures use old table names while code uses new ones). A full suite run is the only way to catch this. |
| PR with no manual testing steps | Every PR needs a Manual verification section with numbered scenarios, specific actions, and Verify: lines. "Check the UI" is not a test step. |
| Skipping post-completion steps | Output the post-completion audit to the user before creating the PR. ALL 8 steps must show as DONE with evidence. Running tests alone is not enough: you must also run /simplify, change-log, update-docs, the drift audit, and the linter. |
| Skipping the scenario map on a complex refactor | For anything touching existing behavior, multiple actors, or overlapping states, enumerate every realistic user path grouped by actor. Single-perspective plans miss the edges that cause production incidents. Flag gaps and close every one before approval. |
| Implementing differently than planned without updating the plan | The plan is the source of truth. The moment your approach changes mid-flight, update the plan additively (> **Revised:** note under the original task) and never rewrite history. Commit the plan update with the code change, not after. If you only catch the drift at the end, the drift audit (post-completion step 2) is your last chance to fix it before the PR. |

Plan-Driven Development

Overview

If you're unsure whether a best-practice applies: it does. PDD is powerups:best-practices + plans, not plans instead of powerups:best-practices.

When to Use

PDD can be invoked for features of any size. The difference is whether you write a plan file.

Write a plan file (plans/v{N}-*.md) when:

Feature spans multiple milestones or will take more than one session
Multiple files/modules need coordinated changes
You need to track progress across context resets
Multiple agents will work on different pieces in parallel

Use PDD in lightweight mode (plan inline, no file) when:

Feature is smaller but still touches multiple files
Work fits in one session with no risk of context loss
No milestones needed — it's a single logical chunk of work

On every session start for feature work:

Spawn an Explore subagent to check plans/ for an existing plan. Plan files are large and consume significant context — never read them directly in the main conversation. The subagent should read the plan and return a concise summary: current milestone, next unchecked task, any blockers, and key design decisions.
If the subagent finds an active plan, use its summary to orient yourself. Only read specific sections of the plan directly if you need exact task wording or file paths.
Update the progress summary table and check off completed tasks (these edits are small and fine in the main context).

Plan Location & Naming

plans/
├── v0-initial-build.md
├── v1-auth-and-production-readiness.md        ← initial plan
├── v1-auth-and-production-readiness-r2.md     ← revised (approach changed)
├── v2-multi-provider-support.md
└── ...

Naming: v{N}-{short-action-description}.md (initial), v{N}-{description}-r{R}.md (revisions)

v{N} — Sequential version number (check existing files for the next number)
{short-action-description} — Lowercase, hyphens, action-oriented (e.g., auth-and-production-readiness, multi-provider-support)
-r{R} — Revision suffix, appended when the plan undergoes a major change (r2, r3, etc.). The initial version has no suffix (implicitly r1).
If plans/ doesn't exist, create it

Title inside the file should match: # v1-r2: Auth & Production Readiness (WorkOS)

When to create a revision (new file with -r{R} suffix):

Core technical approach changes (e.g., DIY JWT → WorkOS)
Major scope added or removed (e.g., dropping a whole milestone)
Architecture fundamentally shifts

Do NOT create a revision for:

Checking off tasks (normal progress)
Adding/removing small tasks
Clarifying wording

Previous revision files stay in plans/ as historical record — don't delete the original when creating a revision.

Plan Structure

Every plan must have these sections:

1. Context

What problem this solves, why it's being built, key relationships and constraints. Written so an agent with zero prior context understands the full picture.

2. Design / Architecture

Data models, API endpoints, flow diagrams, key decisions and their rationale. Include enough detail that implementation doesn't require guessing.

3. Scenario Map (for complex changes only)

How to build the map:

List the actors — who interacts with this system? (end user, admin, operator, background job, etc.)
List the state dimensions — flags, connection states, permissions, origin types, API outcomes (200/4xx/5xx), cached vs fresh, etc.
For each actor × state combination, write a row: what the user does, what the system does, and whether the current plan already covers it.
Explicitly flag gaps — rows where the plan is silent or ambiguous. Each gap must become one of: a new task in a milestone, a decision surfaced to the user via AskUserQuestion, or an explicit "out of scope" note.
Include adversarial and degraded states — disconnected mid-flow, stale tokens, network failure, duplicate submissions, two tabs racing, concurrent toggles, downgrade paths, rollback.

Table format (recommended):

### A — [Actor name, e.g., "Admin dashboard user"]

| # | State | What happens | Plan coverage |
|---|---|---|---|
| A1 | Feature off, no connection | Existing behavior unchanged | OK |
| A2 | Feature on, mid-flow disconnect | ??? | Gap — decide fallback |
| A3 | Feature on, API 5xx on write | Fall back vs hard fail | Gap — ask user |

Group rows by actor (A, B, C…) so it stays scannable. Keep each row to one line — detail lives in the corresponding milestone task.

Turn gaps into actions before plan approval:

For each Gap row, either (a) add a task that handles it, (b) mark it "out of scope" with a one-line rationale, or (c) use AskUserQuestion to collect a decision and then fold the answer back into the plan.
Do NOT leave gaps unresolved. An unresolved gap in the plan becomes a bug in production.

4. Milestones with Task Checkboxes

The core of the plan. Each milestone is a logical chunk of work with:

### Milestone N: Short Name
**Goal:** One sentence.

Tests first (these will fail until implementation):
- [ ] Write failing tests for X
- [ ] Write failing tests for Y

Then implement to make tests pass:
- [ ] Implement X
- [ ] Implement Y

**Verification:**
- [ ] How to confirm this milestone is done

Milestones that are pure refactors of already-tested code (e.g., moving code without changing behavior) don't need new tests — but existing tests must still pass.

Rules:

Tasks are concrete and actionable ("Create sync_hq/auth/models.py" not "Set up auth")
Include file paths where relevant
Test tasks come before implementation tasks within each milestone — this is the TDD ordering
Check off tasks (- [x]) as they are completed
Never remove completed tasks — they're the history

5. Progress Summary Table

Quick at-a-glance status at the bottom of the file:

| Milestone | Status | Notes |
|-----------|--------|-------|
| 1. Name   | Done   |       |
| 2. Name   | In progress | Blocked on X |
| 3. Name   | Not started | |

Update this table as milestones progress.
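
Because the table mirrors the checkboxes, its Status column can be derived rather than remembered. The sketch below is illustrative only: `milestone_status` is a hypothetical helper, and it assumes milestone headings follow the `### Milestone N: Short Name` format from the previous section.

```python
import re

MILESTONE = re.compile(r"^### Milestone (\d+): (.+)$")
CHECKBOX = re.compile(r"^\s*- \[( |x)\] ")


def milestone_status(plan_md: str) -> dict[str, str]:
    """Map each milestone to Done / In progress / Not started."""
    counts = {}  # milestone name -> [checked, total]
    current = None
    for line in plan_md.splitlines():
        heading = MILESTONE.match(line)
        if heading:
            current = f"{heading.group(1)}. {heading.group(2).strip()}"
            counts[current] = [0, 0]
        elif line.startswith("#"):
            current = None  # any other heading ends the current milestone
        elif current is not None:
            box = CHECKBOX.match(line)
            if box:
                counts[current][1] += 1
                if box.group(1) == "x":
                    counts[current][0] += 1
    status = {}
    for name, (checked, total) in counts.items():
        if total and checked == total:
            status[name] = "Done"
        elif checked == 0:
            status[name] = "Not started"
        else:
            status[name] = "In progress"
    return status


plan = """### Milestone 1: Auth
- [x] Write failing tests for X
- [x] Implement X

### Milestone 2: Sync
- [x] Write failing tests for Y
- [ ] Implement Y

### Milestone 3: Docs
- [ ] Update docs
"""
print(milestone_status(plan))
```

The Notes column (for example, "Blocked on X") still has to be written by hand; only the status is recoverable from the checkboxes.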

Multi-Agent Development

Plans are designed for parallel work. When a feature has independent milestones or tasks, spawn agent teams or subagents to work on different pieces concurrently.

How it works:

The plan file is the shared coordination point — all agents read from and write to it
Identify independent milestones/tasks that have no dependencies between them
Spawn subagents for independent work (e.g., one agent builds auth models while another builds rate limiting)
Each agent checks off its tasks in the plan as it completes them
Sequential tasks (where one depends on another) should NOT be parallelized

When to parallelize:

Multiple milestones with no shared dependencies
Independent modules within a milestone (e.g., OTP service + JWT service)
Tests and implementation across separate files

Agent task prompts should include:

Reference to the plan file: "Read plans/v{N}-{description}.md for full context"
Which specific milestone/tasks the agent owns
TDD requirement: write failing tests first, then implement
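
The three required elements can be assembled into a prompt with a small template. This is a sketch only: `build_agent_prompt` and its wording are hypothetical, and the plan path in the example is a placeholder.

```python
def build_agent_prompt(plan_path: str, milestone: str, tasks: list[str]) -> str:
    """Build a subagent prompt with the plan reference, ownership, and TDD rule."""
    lines = [
        f"Read {plan_path} for full context.",
        f"You own: {milestone}.",
        "Your tasks:",
        *[f"- {task}" for task in tasks],
        "Follow TDD: write failing tests first, then implement.",
        "Check off each task in the plan file as you complete it.",
    ]
    return "\n".join(lines)


prompt = build_agent_prompt(
    "plans/v10-sync-change-details.md",  # placeholder plan file
    "Milestone 2: Rate limiting",
    ["Write failing tests for the limiter", "Implement the limiter"],
)
print(prompt)
```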

Workflow

Starting a new feature

1. Set /effort max. Planning requires deep reasoning, so always use max effort during the planning phase.
2. Check plans/ to find the next version number.
3. Create a feature branch FIRST: git checkout -b feat/{description}. NEVER write plans or implement code on main. This applies to the plan file itself: even the plan commit goes on a branch.
4. Invoke powerups:best-practices. This is not optional. Run the skill to ensure branching, codebase investigation (subagent), impact scan, and clarifying questions all happen before any code or plan is written. Do not just reference it; actually invoke it.
5. Skill audit (MANDATORY before writing the plan). List every available powerups skill by name, then for each one, explicitly state whether it applies to this feature and why. Output this analysis to the user before proceeding. This ensures no skill is forgotten during planning or implementation.

Example output:
```
Skill audit for v10-sync-change-details:
- best-practices: YES — always applies (already invoked)
- test-driven-development: YES — new backend logic needs tests
- simple-design-principles: YES — frontend UI with user-facing copy
- self-documenting-apis: YES — new API endpoint
- database-branching: NO — not using Ghost DB
- update-docs: YES — run after all milestones complete
- bug-fix: NO — this is a new feature, not a bug fix
```
Every skill marked YES must appear as an explicit task or note in the relevant milestone. If update-docs applies, it MUST appear as a task in the final milestone or as a post-completion step. Do not rely on remembering; write it into the plan.

6. Create plans/v{N}-{description}.md
7. Write Context and Design sections.
8. Scenario map (for complex refactors/upgrades): Enumerate every realistic user path, grouped by actor, across all meaningful state combinations, including failures, disconnects, retries, stale state, and concurrent flips. Flag gaps. For every gap, either add a milestone task, mark it out of scope with a rationale, or ask the user via AskUserQuestion and fold the answer in. Do not skip this for changes that modify existing behavior: single-perspective plans miss the edges that break production. Skip it only for purely additive, lightweight, or internal-only features.
9. Write the Milestones sections. Include skill-specific tasks identified in step 5 and tasks that close every gap from step 8.
10. Get user approval on the plan before coding
11. Identify which milestones/tasks can be parallelized
12. Begin work: spawn subagents for independent pieces
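
The "Check plans/" step above can be done by hand or with a few lines of code. The sketch below assumes plan files are named `v{N}-{description}.md`; `next_plan_version` is a hypothetical helper and takes a list of filenames so it is easy to try without touching the filesystem.

```python
import re
from collections.abc import Iterable

PLAN_NAME = re.compile(r"^v(\d+)-.*\.md$")


def next_plan_version(filenames: Iterable[str]) -> int:
    """Return one more than the highest v{N} found, or 1 if there are no plans."""
    versions = []
    for name in filenames:
        match = PLAN_NAME.match(name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


names = ["v1-auth.md", "v2-sync.md", "v10-sync-change-details.md", "README.md"]
print(next_plan_version(names))  # 11
```

In a real repository the names could come from `pathlib.Path("plans").iterdir()`. Versions are compared as integers, so v10 sorts after v2, and a revision file such as `v3-auth-r2.md` still counts as version 3.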

After planning (before coding)

Resuming work (new context, no memory)

Spawn an Explore subagent to read plans/ and return a summary: which plan is active, what milestone you're on, what the next unchecked task is, and any key design context. Do not read plan files directly — they are large and will bloat your context.
Use the subagent's summary to orient yourself
Find the first unchecked task — that's where to resume
Continue working, checking off tasks as you go
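
The "first unchecked task" lookup is simple enough that the subagent (or a helper script it runs) could do it mechanically. A minimal sketch, assuming the milestone heading and checkbox formats described earlier; `first_unchecked_task` is a hypothetical name.

```python
import re
from typing import Optional

MILESTONE = re.compile(r"^### (Milestone .+)$")
UNCHECKED = re.compile(r"^\s*- \[ \] (.+)$")


def first_unchecked_task(plan_md: str) -> Optional[tuple]:
    """Return (milestone heading, task text) for the first '- [ ]' item, or None."""
    milestone = None
    for line in plan_md.splitlines():
        heading = MILESTONE.match(line)
        if heading:
            milestone = heading.group(1).strip()
            continue
        task = UNCHECKED.match(line)
        if task:
            return milestone, task.group(1).strip()
    return None  # every task is checked


plan = """### Milestone 1: Auth
- [x] Write failing tests for X
- [x] Implement X
### Milestone 2: Sync
- [x] Write failing tests for Y
- [ ] Implement Y
"""
print(first_unchecked_task(plan))  # ('Milestone 2: Sync', 'Implement Y')
```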

During implementation

Invoke powerups:best-practices at the task level — TDD (write failing test, then implement), DRY (search before building), investigation (understand before changing). The plan organizes the work; powerups:best-practices governs how each task is executed. Actually invoke the skill, don't just follow it from memory.
Check off each task immediately when done: - [ ] → - [x]
Update the progress summary table when a milestone completes
If you discover new work, add tasks to the appropriate milestone
Commit the plan file alongside code changes
If the approach fundamentally changes (e.g., switching auth strategy), create a new revision file (-r2) rather than editing in place — keep the original as history

Plan drift — keep the plan in sync with reality

Update additively — never rewrite history. The original task wording and decisions stay visible so future readers can see what was tried and why it changed.

How to record drift in-place:

Leave the original task / decision text as-is. Do not delete or rewrite it.
Directly under it, add a > **Revised ({YYYY-MM-DD}):** callout that states:
- What was originally planned
- What you actually did
- Why it changed (didn't work, edge case, better approach surfaced, user redirected)
If new tasks are needed, add them as fresh checkboxes under the revised note. If old tasks no longer apply, leave them visible and annotate ~~superseded — see revised note~~ (or check them with a (superseded) suffix) — never delete.
Update the Design / Architecture section if the change affects the documented design. Same pattern: add a "Revised" note rather than overwriting.
Commit the plan update with the code change — one commit, both files. A code change without the matching plan update is incomplete work.

Example:

- [x] Use Redis for OTP storage with 5-minute TTL
  > **Revised (2026-05-10):** Switched to Postgres with an `expires_at` column instead.
  > Redis would have required a new dependency for one feature; the existing Postgres
  > connection handles this with no infra change. TTL enforced via a cleanup job.
- [x] Add `expires_at` column and cleanup job for OTP rows
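
To illustrate what "additive" means mechanically, the sketch below appends a Revised callout under a task without touching the task line itself. `add_revised_note` is a hypothetical helper, and matching tasks by exact text is an assumption made for brevity.

```python
import re
from datetime import date

TASK = re.compile(r"^(\s*)- \[[ x]\] (.+)$")


def add_revised_note(plan_md: str, task_text: str, note: str, when: date) -> str:
    """Insert a '> **Revised (date):**' line directly under the matching task."""
    out = []
    for line in plan_md.splitlines():
        out.append(line)  # the original line is always kept verbatim
        match = TASK.match(line)
        if match and match.group(2).strip() == task_text:
            indent = match.group(1) + "  "
            out.append(f"{indent}> **Revised ({when.isoformat()}):** {note}")
    return "\n".join(out)


plan = "- [x] Use Redis for OTP storage\n- [ ] Add rate limiting"
print(add_revised_note(plan, "Use Redis for OTP storage",
                       "Switched to Postgres.", date(2026, 5, 10)))
```

The function only ever adds lines, so the original wording stays visible in the file and in the diff.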

After each major milestone — pause for user testing

When a milestone is complete (all tasks checked), stop and let the user test manually before moving on:

Provide step-by-step test instructions:
- Prerequisites (env vars, server running, etc.)
- Exact commands to run (curl, browser URLs, SQL queries)
- Expected output for each step
- How to verify success vs. failure
Include setup steps if the milestone introduced new dependencies, env vars, or configuration
Wait for user confirmation that testing passed before starting the next milestone
Clear context after successful testing — the plan file has all the state needed to resume in a fresh context

This ensures the user validates each milestone incrementally rather than discovering issues after everything is built.

After all milestones complete — POST-COMPLETION AUDIT

Post-completion audit output format:

Post-completion audit:
1. Skill audit review:     DONE — all 5 YES skills executed (best-practices, TDD, simple-design, update-docs, change-log)
2. Drift audit:            DONE — additive: 7 unplanned widgets + 3 deps recorded in plan;
                                  subtractive: 4 orphan files deleted, 2 completed Post-MVP items removed
3. /simplify:              DONE — deleted 200 lines dead code, fixed 3 issues
4. change-log:             DONE — added entry "Your assistant can now..."
5. update-docs:            DONE — CLAUDE.md and connector guide updated
6. Linter:                 DONE — no new warnings
7. Full test suite:        DONE — 133 passed, 0 failed
8. PR ready:               YES — manual verification steps included

The 8 steps:

1. Skill audit review: go back to your skill audit output and confirm every YES skill was actually executed. If any was missed, execute it now.
2. Run powerups:drift-audit. Actually invoke the skill. It reconciles the shipped artifact with the plan in BOTH directions:
   - Additive drift: things that landed in the code but were never in the plan (new widgets a user asked for mid-build, dependencies you had to add, heuristics you layered on, bug fixes worth recording). Gets folded into a new `## What changed from the initial plan` section in the plan file.
   - Subtractive drift: things that should be gone but aren't (orphan component files from a replaced feature, redirect stubs for paths nobody bookmarks anymore, completed "Post-MVP" items still listed as deferred, stale TODO comments, orphaned feature flags, unused imports, deprecated DB objects). Walked with the user and deleted in a chore(plan-drift) commit.

   This is NOT optional. The plan is the historical record of what shipped; if it disagrees with the code, the plan is wrong and must be fixed before the PR. The subtractive sweep prevents stragglers from a replaced feature from rotting in the repo. Drift caught here is cheap; drift caught six weeks from now by a confused agent (or a curious dev wondering "is GeoMapCard.tsx still used?") is expensive.

   Run this BEFORE /simplify so the cleanup is informed by both directions of drift: /simplify shouldn't refactor code that's about to be deleted as subtractive drift.
3. Run /simplify: review all changed code for reuse, quality, and efficiency. Fix any issues found. This is NOT optional.
4. Run powerups:change-log. Add an entry to CHANGELOG.md in plain, business-user-friendly language. This is NOT optional for user-facing changes.
5. Run update-docs to sync all documentation (CLAUDE.md, guides, sibling repos). This is NOT optional.
6. Run the project's linter and fix any lint errors introduced by your changes.
7. Run the FULL test suite: pytest (or the project's test command). ALL tests must pass. This catches regressions where new code breaks existing tests. A green test suite is a hard gate.
8. Create PR with manual verification steps (see below). Reference the drift section in the body so reviewers don't have to reverse-engineer scope creep.
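
Since the order matters (the drift audit runs before /simplify, and the PR comes last), the audit can be treated as an ordered checklist. A small sketch, with hypothetical names, that returns the next step still owed:

```python
from typing import Optional

# Step labels follow the audit output format above; the order is significant.
AUDIT_STEPS = [
    "Skill audit review",
    "Drift audit",
    "/simplify",
    "change-log",
    "update-docs",
    "Linter",
    "Full test suite",
    "PR ready",
]


def next_audit_step(done: set) -> Optional[str]:
    """Return the earliest step not yet done, or None when the audit is complete."""
    for step in AUDIT_STEPS:
        if step not in done:
            return step
    return None


print(next_audit_step({"Skill audit review", "Drift audit"}))  # /simplify
```

A step that was run out of order does not let later steps through: if "/simplify" is marked done but "Drift audit" is not, the function still returns "Drift audit".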

PR manual verification steps — MANDATORY

Requirements:

Each scenario is numbered with a descriptive title (e.g., "1. Widget: Talk to Human button")
Steps are sequential and specific — exact actions to take, not vague descriptions
Each step that checks behavior has a Verify: line stating what the reviewer should see
Include prerequisites if needed (env vars, server running, test data)
Cover the golden path, at least one edge case, and a no-regressions check

Example structure:

## Manual verification

### 1. Creating a new widget
1. Start local dev (`npm run dev`)
2. Navigate to the dashboard → Widgets page
3. Click "Create Widget"
4. Fill in name: "Test Widget", select theme: "Dark"
5. Click Save
6. **Verify:** Widget appears in the list with name "Test Widget" and dark theme badge
7. **Verify:** Toast shows "Widget created"

### 2. Edge case: duplicate name
1. Try creating another widget with name "Test Widget"
2. **Verify:** Error message "A widget with this name already exists"
3. **Verify:** No duplicate created in the list

### 3. No regressions
- [ ] Existing widgets still display correctly
- [ ] Widget settings page still loads
- [ ] Delete widget still works

Do NOT write vague test plans like "verify it works" or "check the UI". Every step should be reproducible by someone who has never seen the feature.
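
A quick lint over the PR body can catch scenarios that describe actions but never say what to look for. The following is a heuristic sketch, not a required tool: `scenarios_missing_verify` is a hypothetical helper that assumes each scenario is a `### ` heading and treats either a `**Verify:**` line or a checkbox as a concrete check.

```python
import re

CHECKBOX = re.compile(r"^\s*- \[[ x]\] ")


def scenarios_missing_verify(pr_body: str) -> list[str]:
    """Return titles of '### ' scenarios with no Verify line and no checkbox."""
    missing = []
    title, has_check = None, False
    for line in pr_body.splitlines():
        if line.startswith("### "):
            if title is not None and not has_check:
                missing.append(title)
            title, has_check = line[4:].strip(), False
        elif "**Verify:**" in line or CHECKBOX.match(line):
            has_check = True
    if title is not None and not has_check:
        missing.append(title)
    return missing


body = """## Manual verification

### 1. Creating a new widget
1. Click Save
2. **Verify:** Widget appears in the list

### 2. Edge case: duplicate name
1. Try creating another widget with name "Test Widget"

### 3. No regressions
- [ ] Existing widgets still display correctly
"""
print(scenarios_missing_verify(body))  # ['2. Edge case: duplicate name']
```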

Rolling back work

If code is reverted or the developer isn't happy with the implementation:

Uncheck tasks back to the rolled-back state: - [x] → - [ ]
Update the progress summary table to reflect the actual state
The plan must always match reality — if code was reverted, the checkboxes must revert too
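
Reverting checkboxes for a rolled-back milestone can be sketched as a small text transformation. `uncheck_milestone` is a hypothetical helper; it assumes the `### Milestone N:` heading format and leaves every other milestone untouched.

```python
def uncheck_milestone(plan_md: str, milestone_number: int) -> str:
    """Flip '- [x]' back to '- [ ]' for every task in the given milestone."""
    out = []
    inside = False
    for line in plan_md.splitlines():
        if line.startswith("#"):
            # The trailing colon keeps "Milestone 1:" from matching "Milestone 10:".
            inside = line.startswith(f"### Milestone {milestone_number}:")
        elif inside and line.lstrip().startswith("- [x] "):
            line = line.replace("- [x] ", "- [ ] ", 1)
        out.append(line)
    return "\n".join(out)


plan = """### Milestone 1: Auth
- [x] Implement X
### Milestone 2: Sync
- [x] Write failing tests for Y
- [x] Implement Y"""
print(uncheck_milestone(plan, 2))
```

The progress summary table still needs a matching update after the checkboxes are reverted.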

Completing a plan

All checkboxes checked
Progress summary shows all milestones as "Done"
Plan stays in plans/ as historical record

UI Work

When any milestone involves user-facing text or UI changes, use the simple-design-principles skill. Flag these tasks in the plan — copy should be reviewed with the same rigor as code.

