marketplace/bundles/plan-marshall/skills/phase-2-refine/SKILL.md
Iterative request clarification until confidence threshold reached
Iterative workflow for analyzing and refining the request until requirements meet confidence threshold.
For detailed step-by-step procedures, see standards/refine-workflow-detail.md.
Skill: plan-marshall:dev-agent-behavior-rules
Shared lifecycle patterns: See phase-lifecycle.md for entry protocol, completion protocol, and error handling convention.
Execution mode: Follow workflow steps sequentially. Each step that invokes a script has an explicit bash code block.
Prohibited actions:
- .plan/ files directly: all access must go through python3 .plan/execute-script.py manage-* scripts
- manage-status transition
- manage-config verbs during refine. The verbs set, init, sync-defaults, and sync-plan-defaults are forbidden in this phase because they modify project configuration that must remain stable across the confidence loop. Reading config via get is permitted.
- .plan/local/plans/{plan_id}/** or .plan/local/worktrees/{plan_id}/**. Implementation edits (even when the request narrative reads like an implementation brief, even when an upstream lesson "obviously" needs a doc tweak, even when a test fixture would clarify intent) are the responsibility of phase-5-execute task bodies, NOT phase-2-refine. Refine produces refined-request artifacts only. The recurring anti-pattern (captured as feedback_phase2_refine_never_implements in the project memory log) is refine reaching for Edit / Write against marketplace/bundles/**, source files, or any other production path because the request narrative made the change "feel obvious". The Allowed write paths sub-section below is the only writable surface.

Allowed write paths:

- .plan/local/plans/{plan_id}/**: the plan's request, clarifications, references, status, decisions, and any other plan-scoped artifact.
- .plan/local/worktrees/{plan_id}/**: the plan's isolated worktree, EXCLUDING the marketplace/**, source, and build-system sub-trees within it. (Refine MAY persist plan-scoped artifacts under the worktree's .plan/ symlink, but MUST NOT edit the worktree's checked-out source tree; that surface belongs to phase-5-execute.)

Every other path is forbidden. The orchestrator's post-dispatch main-checkout assertion (see plan-marshall:plan-marshall:planning.md § "2-Refine Phase" → "Post-dispatch contract assertion") detects violations structurally; the plugin-doctor REFINE_CONTRACT_VIOLATION analyzer detects them at edit time.
Constraints:
.plan/execute-script.py calls
manage-* scripts (Bucket A) resolve .plan/ via git rev-parse --git-common-dir and work from any cwd: do NOT pin cwd, do NOT pass routing flags, and never use env -C. Build / CI / Sonar scripts (Bucket B) accept --plan-id {plan_id} (preferred; auto-resolves the worktree via manage-status get-worktree-path) or --project-dir {worktree_path} (explicit override / escape hatch); the two flags are mutually exclusive. See plan-marshall:tools-script-executor/standards/cwd-policy.md.
Before creating deliverables (phase-3-outline), ensure the request is:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| plan_id | string | Yes | Plan identifier |
The Phase Entry Protocol's phase_handshake verify --phase 1-init --strict call (see ref-workflow-architecture/standards/phase-lifecycle.md) asserts the tri-state worktree-resolution contract before any phase-2-refine work begins. When metadata.use_worktree==true AND metadata.worktree_path is empty, the assertion treats this as the deferred-materialization window and passes — phases 2-3-4 run on the main checkout / current feature-branch intent until phase-5-execute Step 2.5 materializes the artifacts. The strict path-not-found / path-stale failures still fire when worktree_path is set but does not resolve cleanly. Phases 5-6 retain the original strict semantics. Plans with metadata.use_worktree==false skip the assertion (main-checkout flow). See workflow-integration-git/standards/worktree-handling.md for the canonical lifecycle contract and the underlying _resolve_worktree_assertion implementation in phase_handshake.py.
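The worktree-resolution contract described above can be restated as a small decision function. This is only an illustrative sketch, not the `_resolve_worktree_assertion` implementation: the function name, the enum, the numeric `phase` argument, and the use of a plain directory check in place of the real path-not-found / path-stale checks are all assumptions, as is treating an empty `worktree_path` in phases 5-6 as a failure.

```python
from enum import Enum
from pathlib import Path


class WorktreeCheck(Enum):
    SKIPPED = "skipped"    # use_worktree == false: main-checkout flow
    DEFERRED = "deferred"  # worktree requested but not yet materialized
    RESOLVED = "resolved"  # worktree_path is set and exists on disk
    FAILED = "failed"      # worktree_path is set but does not resolve


def check_worktree(metadata: dict, phase: int) -> WorktreeCheck:
    """Hypothetical restatement of the assertion for one phase number."""
    if not metadata.get("use_worktree", False):
        return WorktreeCheck.SKIPPED
    path = metadata.get("worktree_path") or ""
    if not path:
        # Deferred-materialization window: phases 2-4 pass.
        # Assumption: the strict semantics of phases 5-6 reject an empty path.
        return WorktreeCheck.DEFERRED if phase <= 4 else WorktreeCheck.FAILED
    # Assumption: a directory check stands in for the real staleness checks.
    return WorktreeCheck.RESOLVED if Path(path).is_dir() else WorktreeCheck.FAILED
```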
This phase dispatches under one role key: phase-2-refine (resolves through phase-2-refine.default). The confidence loop (Steps 3b/3c/8/9/10/11/12) iterates inside one dispatch envelope; the orchestrator never spawns per-iteration subagents. Mechanical sub-procedures stay inline: Step 3d baseline reconciliation runs via the workflow-integration-git:baseline-reconcile script (LLM-bearing classification is bundled into phase-2-refine); Step 10 confidence aggregation runs via the manage-status:aggregate-confidence script. Step 13.5 q-gate-validation activation is signaled by setting qgate_validation_required: true in the phase return TOON; the orchestrator (plan-marshall:plan-marshall/workflow/planning.md) reads that flag and issues q-gate-validation as a sibling top-level Task: plan-marshall:{target} dispatch — the phase body cannot spawn it directly because the Task tool is unavailable inside an execution-context-{level} subagent. For the rationale see dispatch-granularity.md § 3 (Heuristic 2 — bundle when steps share context).
The confidence loop (Steps 3b/3c/8/9/10/11/12) re-evaluates classification, source-premise verification, and confidence aggregation across iterations — but the inputs feeding those re-evaluations are loop-invariant: they are written before the loop begins (phase-1-init, phase-2-refine entry) and are not mutated by the loop body. The dispatched agent MUST read each of the following inputs ONCE at phase entry and reference the cached values throughout every loop iteration:
- request.md: both clarified_request and original_input sections (read via manage-plan-documents read --plan-id {plan_id} --document request).
- references.json: domains, base_branch, worktree_path, affected_files, change_type (read via manage-files read --plan-id {plan_id} --file references.json).
- module_mapping.toon if present at .plan/local/plans/{plan_id}/module_mapping.toon (read via manage-files read).
- manage-architecture topology at phase entry).

Prohibited actions:
See extension-api/standards/dispatch-granularity.md § 5.1 (Heuristic 2 — bundle when steps share context) for the granularity rationale.
The refine phase executes Steps 1-14 (with optional Steps 3b and 3c). Steps 8-12 form an iterative loop that repeats until confidence reaches the threshold.
On re-entry, address pending Q-Gate findings before re-running analysis. Query with manage-findings qgate list --phase 2-refine --resolution pending, resolve each finding, then continue with Steps 4-14.
Log [STATUS] Starting refine phase to work.log.
Recipe-sourced plans skip quality analysis entirely. Check plan_source metadata; if recipe, force track=complex, set confidence=100, transition phase, and return immediately. Otherwise continue with Steps 3b-14.
Verify code references in the request narrative against the current codebase before quality analysis. Activates when the request contains verifiable code references (file paths, flags, API names, behavior descriptions). Findings feed into the Correctness dimension in Step 8/10.
For the complete verification procedure, see source-premise-verification.md.
Challenge whether a proposed fix actually solves the documented symptom before confidence aggregation. Activates via semantic LLM judgment when the request narrative proposes a specific code change (command, regex, function body, config edit) — source-agnostic, not gated on header tokens. Constructs a synthetic "would the proposed fix change behavior in the failure scenario?" probe and emits CORRECTNESS: ISSUE — Proposed fix incomplete when the probe exposes a gap. Findings feed the same Correctness dimension as Step 3b.
For the complete procedure (extraction, probe construction, result handling, worked example), see proposed-fix-verification.md.
Sync the target branch and surface overlapping diffs before quality analysis runs against an outdated main. Activates whenever the plan has a configured base branch (the default flow). The reconcile script classifies the upstream drift into three outcomes:
- no_overlap: upstream commits touch disjoint files. Fast-path: continue without findings.
- overlap_no_content_conflict (focused reconcile): upstream commits touch overlapping files but git merge-tree reports zero content conflicts. The script performs a focused git merge origin/{base_branch} against the worktree, surfaces ANY real conflicts that arise during the merge, and resolves the drift in-place without re-entering the iterate-to-confidence loop. When the merge succeeds cleanly, the auto-resolved drift produces no finding.
- overlap_with_content_conflict: git merge-tree reports content conflicts. Emits Q-Gate findings that feed back into Steps 8-12 (the iterate-to-confidence loop is the right place to absorb baseline shifts that cannot be merged mechanically).

Skipped silently for main-checkout flow (metadata.use_worktree=false) and when no base branch is configured.
For the complete procedure (sync invocation, diff surfacing, finding-emission contract, three-way classification, focused reconcile/rebase routing), see refine-workflow-detail.md § Step 3d.
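The three-way classification can be sketched as a pure function. This is a minimal illustration rather than the baseline-reconcile script: the function name and its inputs (the set of files the plan touches, the set of files changed upstream, and a conflict count assumed to come from git merge-tree) are hypothetical.

```python
def classify_drift(plan_files: set, upstream_files: set, conflict_count: int) -> str:
    """Map upstream drift onto the three outcome labels named in Step 3d."""
    if not (plan_files & upstream_files):
        # Disjoint file sets: fast path, no findings.
        return "no_overlap"
    if conflict_count == 0:
        # Same files touched, but a mechanical merge is expected to succeed.
        return "overlap_no_content_conflict"
    # Real content conflicts: these become Q-Gate findings for Steps 8-12.
    return "overlap_with_content_conflict"
```

Keeping the classification free of git calls makes the routing decision easy to test separately from the merge itself.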
Read confidence_threshold from project config (manage-config plan phase-2-refine get --field confidence_threshold). Default: 95.
Read fast_path_threshold from project config (manage-config plan phase-2-refine get --field fast_path_threshold). Default: 100. Read-only during refine — the same set/init/sync-defaults/sync-plan-defaults prohibition defined in the Enforcement section above applies. Store as fast_path_threshold for use in Step 10.
Read compatibility from project config. Valid values:
| Value | Description |
|-------|-------------|
| breaking | Clean-slate approach, no deprecation nor transitionary comments |
| deprecation | Add deprecation markers to old code, provide migration path |
| smart_and_ask | Assess impact and ask user when backward compatibility is uncertain |
No fallback -- fail with error if not configured.
Read simplicity from project config (manage-config plan phase-2-refine get --field simplicity). The knob mirrors compatibility: it tunes how aggressively implementation tasks favour the minimum viable surface over speculative structure. Valid values:
| Value | Description |
|-------|-------------|
| lean | Implement the strict minimum; remove or inline surplus structure. Default. |
| pragmatic | Prefer minimal, but keep low-risk structure that aids readability. |
| defensive | Retain belt-and-suspenders structure (guards, abstraction seams) where the outcome is uncertain. |
The enforcement-critical anti-pattern catalogue lives in the central standard at dev-general-code-quality/standards/code-organization.md #minimum-viable-code and the agent-facing principle at dev-agent-behavior-rules/standards/agent-behavior-rules.md (Principle 7); it is intentionally not duplicated here. Default lean when unconfigured — unlike compatibility, the simplicity knob defaults rather than failing, so existing plans without the key behave as lean.
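The defaulting rules for the four config reads differ, which is easy to get wrong: both thresholds and simplicity fall back to a default, while compatibility must fail. A minimal sketch, assuming the values have already been read into a plain dict (the function name and the dict shape are hypothetical, and rejecting unknown enum values is an assumption not stated above):

```python
COMPATIBILITY_VALUES = {"breaking", "deprecation", "smart_and_ask"}
SIMPLICITY_VALUES = {"lean", "pragmatic", "defensive"}


def resolve_refine_config(config: dict) -> dict:
    """Apply the documented defaults to already-read config values."""
    compatibility = config.get("compatibility")
    if compatibility is None:
        # No fallback: the phase aborts rather than guessing.
        raise ValueError("compatibility not configured. Run /marshall-steward first")
    if compatibility not in COMPATIBILITY_VALUES:
        raise ValueError(f"unknown compatibility value: {compatibility!r}")

    # Unlike compatibility, simplicity defaults to lean when unconfigured.
    simplicity = config.get("simplicity", "lean")
    if simplicity not in SIMPLICITY_VALUES:
        raise ValueError(f"unknown simplicity value: {simplicity!r}")

    return {
        "confidence_threshold": config.get("confidence_threshold", 95),
        "fast_path_threshold": config.get("fast_path_threshold", 100),
        "compatibility": compatibility,
        "simplicity": simplicity,
    }
```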
Query architecture with manage-architecture architecture info. Extract project_name, project_description, technologies, module_names, and module_purposes into arch_context for use in Steps 8-9. Abort if architecture not found.
Load request document with manage-plan-documents request read. Extract title, description, clarifications, and clarified_request.
Evaluate the request against five quality dimensions using arch_context:
| Dimension | Checks | Finding Format |
|-----------|--------|----------------|
| Correctness | Technology/module/API/pattern validity against architecture | CORRECTNESS: {PASS\|ISSUE} |
| Completeness | Scope clarity, success criteria, test requirements, dependencies | COMPLETENESS: {PASS\|MISSING} |
| Consistency | No contradictions, aligned constraints, coherent scope | CONSISTENCY: {PASS\|CONFLICT} |
| Non-Duplication | No repeated or overlapping requirements | DUPLICATION: {PASS\|REDUNDANT} |
| Ambiguity | Clear terminology, specific scope, measurable criteria, analysis intent | AMBIGUITY: {PASS\|UNCLEAR} |
Four sub-analyses using arch_context:
Module Mapping: Identify which modules are affected. Use architecture module for detailed info when confidence < 70%, architecture graph for cross-module changes.
Feasibility Check: Validate request against module boundaries, dependency direction, extension points, and technology fit.
Scope Size Estimation: Derive scope_estimate from the module_mapping using the standard derivation helper (see standards/refine-workflow-detail.md Step 9 — Derivation Rules). Allowed values: none | surgical | single_module | multi_module | broad. The same enum and rule of thumb is documented in manage-solution-outline:standards/solution-outline-standard.md so the value flows unchanged into the solution outline. Persist the derived value to references.json via manage-references set --field scope_estimate and include it in the Step 13 return TOON.
Track Selection: Determine simple vs complex track using hard-gate triggers:
Complex Track triggers (hard gates, OR logic):
[T1] scope_estimate is multi_module or broad
[T2] Request contains scope words (all, every, migrate, refactor, etc.)
[T3] module_mapping uses patterns/globs instead of explicit file paths
[T4] Domain requires discovery (plugin-dev, documentation, requirements)
— skipped via escape hatch when explicit paths + narrow scope
If ANY trigger fires → track = complex
If NONE fire AND S1+S2+S3 all true → track = simple
Otherwise → track = complex
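The hard-gate logic above can be expressed as a short function. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the scope-word tuple contains only the four words named in T2 (the source list ends in "etc."), and the S1-S3 criteria and the "narrow scope" test are passed in as booleans because their definitions live in the detailed workflow standard.

```python
import re

# Only the words named in T2; the real list is longer.
SCOPE_WORDS = ("all", "every", "migrate", "refactor")


def select_track(scope_estimate: str, request_text: str, uses_globs: bool,
                 needs_discovery: bool, explicit_paths: bool, narrow_scope: bool,
                 s1: bool, s2: bool, s3: bool) -> str:
    """Return 'simple' or 'complex' following the T1-T4 hard gates."""
    t1 = scope_estimate in ("multi_module", "broad")
    t2 = any(re.search(rf"\b{word}\b", request_text, re.IGNORECASE)
             for word in SCOPE_WORDS)
    t3 = uses_globs
    # T4 is skipped via the escape hatch: explicit paths plus narrow scope.
    t4 = needs_discovery and not (explicit_paths and narrow_scope)

    if t1 or t2 or t3 or t4:
        return "complex"
    return "simple" if (s1 and s2 and s3) else "complex"
```

Matching on word boundaries keeps a request such as "install the hook" from tripping T2 on the substring "all".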
Aggregate the per-dimension scores from Steps 8 / 9 into a single weighted confidence via the deterministic aggregator:
python3 .plan/execute-script.py plan-marshall:manage-status:manage-status \
aggregate-confidence --plan-id {plan_id} \
--correctness {N} --completeness {N} --consistency {N} \
--non-duplication {N} --ambiguity {N} --module-mapping {N} \
--persist
The dimension weights are fixed (no LLM judgement remains in this step):
| Dimension | Weight |
|-----------|--------|
| Correctness | 20% |
| Completeness | 20% |
| Consistency | 20% |
| Non-Duplication | 10% |
| Ambiguity | 20% |
| Module Mapping | 10% |
For batch input, the analyzer can stage the per-dimension scores as JSON at .plan/local/plans/{plan_id}/work/confidence-scores.json and pass --scores-file {path} instead of individual flags. Missing dimensions default to 0 and surface in missing_dimensions so the caller can detect a malformed analyzer return.
The script returns {confidence, breakdown[]{dimension, score, weight, weighted}, missing_dimensions, persisted}; with --persist, the overall confidence also lands in status.metadata.confidence so phase-3-outline and downstream consumers can read it without re-running the math.
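To make the arithmetic concrete, the weighted sum can be sketched as follows. This is not the aggregate-confidence script itself: the function name and the underscore-style dictionary keys are assumptions, the persisted field is omitted, and any rounding the real script applies is not reproduced.

```python
# Fixed weights in percent, taken from the table above.
WEIGHTS = {
    "correctness": 20,
    "completeness": 20,
    "consistency": 20,
    "non_duplication": 10,
    "ambiguity": 20,
    "module_mapping": 10,
}


def aggregate_confidence(scores: dict) -> dict:
    """Combine per-dimension scores (0-100) into one weighted confidence."""
    breakdown = []
    missing = []
    total = 0.0
    for dimension, weight in WEIGHTS.items():
        if dimension not in scores:
            # A missing dimension counts as 0 and is reported to the caller.
            missing.append(dimension)
        score = scores.get(dimension, 0)
        weighted = score * weight / 100
        breakdown.append({"dimension": dimension, "score": score,
                          "weight": weight, "weighted": weighted})
        total += weighted
    return {"confidence": total, "breakdown": breakdown,
            "missing_dimensions": missing}
```

For example, a score of 90 on every dimension except an Ambiguity score of 80 gives 18 + 18 + 18 + 9 + 16 + 9 = 88.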
First-pass fast path (distinct from the loop-exit rule below): on the first analysis pass only (iteration_count == 1), if confidence >= fast_path_threshold, skip the clarification loop (Steps 11-12) entirely and proceed directly to Step 13. The fast path is a stricter, first-pass-only gate that decides whether the loop is entered at all. On every subsequent iteration only the confidence_threshold loop-exit semantics below apply.
If confidence >= confidence_threshold → Step 13. Otherwise → Step 11.
Formulate clarification questions from issues found in Steps 8-9. Use AskUserQuestion with specific options. At most 4 questions per iteration, prioritized: Correctness > Consistency > Completeness > Ambiguity > Duplication.
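The prioritization and the four-question cap can be sketched as a stable sort over finding strings. This is illustrative only: the helper names are hypothetical, and it assumes findings arrive as strings in the "DIMENSION: VERDICT" format from the Step 8 table.

```python
PRIORITY = ["CORRECTNESS", "CONSISTENCY", "COMPLETENESS", "AMBIGUITY", "DUPLICATION"]
MAX_QUESTIONS = 4


def _parse(finding: str) -> tuple:
    """Split 'DIMENSION: VERDICT ...' into (dimension, verdict)."""
    dimension, _, rest = finding.partition(":")
    words = rest.split()
    verdict = words[0] if words else ""
    return dimension.strip().upper(), verdict.upper()


def pick_questions(findings: list) -> list:
    """Drop PASS findings, order by dimension priority, keep at most four."""
    issues = [f for f in findings if _parse(f)[1] != "PASS"]

    def rank(finding: str) -> int:
        dimension = _parse(finding)[0]
        # Unknown dimensions sort last.
        return PRIORITY.index(dimension) if dimension in PRIORITY else len(PRIORITY)

    # sorted() is stable, so findings within one dimension keep their order.
    return sorted(issues, key=rank)[:MAX_QUESTIONS]
```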
Record clarifications via the three-step path-allocate flow: (1) call manage-plan-documents request path to get the canonical artifact path, (2) use Edit/Write to update the ## Clarifications and ## Clarified Request sections directly in that file, (3) call manage-plan-documents request mark-clarified to record the transition. Synthesize an updated request if significant clarifications were made. Loop back to Step 8. See standards/refine-workflow-detail.md Step 12 for the full procedure.
When confidence reaches threshold:
- work/module_mapping.toon
- Persist scope_estimate to references.json via manage-references set --field scope_estimate --value {scope_estimate} (one of none | surgical | single_module | multi_module | broad)
- When status.json reports plan_source set to a non-recipe value (i.e., plan_source is present and not the literal string recipe), the phase sets qgate_validation_required: true in its return TOON so the orchestrator (plan-marshall:plan-marshall/workflow/planning.md) dispatches plan-marshall:plan-marshall/workflow/q-gate-validation.md as a sibling top-level Task after the phase returns. The phase body cannot dispatch q-gate-validation itself because the Task tool is unavailable inside an execution-context-{level} subagent. Lesson-derived plans encode the source lesson id directly in plan_source (e.g., 2026-05-11-08-004), so the guard MUST treat any non-null, non-recipe value as lesson-derived. The orchestrator aggregates the validator's qgate_pending_count into the phase's running count before re-evaluating the existing 3-iteration auto-loop predicate. See refine-workflow-detail.md Step 13.5 for the activation-guard contract. The flag is false when plan_source is absent or equals recipe.

status: success
plan_id: {plan_id}
confidence: {achieved_confidence}
track: {simple|complex}
track_reasoning: {track_reasoning}
scope_estimate: {scope_estimate}
compatibility: {compatibility}
compatibility_description: {compatibility_description}
simplicity: {simplicity}
simplicity_description: {simplicity_description}
domains: [{detected domains}]
qgate_pending_count: {0 if no findings}
qgate_validation_required: {true|false}
qgate_validation_required is true when the phase decided q-gate-validation must run (lesson-derived plan path activated at Step 13.5 — plan_source is set and not recipe), and false otherwise. The orchestrator (plan-marshall:plan-marshall/workflow/planning.md) reads this flag after the phase returns and dispatches q-gate-validation as a sibling top-level Task when it is true.
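The activation guard reduces to a two-part predicate. A minimal sketch, assuming plan_source is read as an optional string (the function name is hypothetical):

```python
from typing import Optional


def qgate_validation_required(plan_source: Optional[str]) -> bool:
    """True for any non-null, non-recipe plan_source (lesson-derived plans)."""
    return plan_source is not None and plan_source != "recipe"
```

With this predicate, a lesson id such as `2026-05-11-08-004` activates validation, while an absent value or the literal `recipe` does not.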
Data Location Reference:
- decision.log filtered by (plan-marshall:phase-2-refine)
- work/module_mapping.toon
- request.md → clarifications, clarified_request

Transition from refine to outline with manage-status transition --completed 2-refine. Log completion and add visual separator.
Step 13.6 (above) is the single source of truth for the return TOON. The minimum contract every workflow doc that implements ext-point-execution-context-workflow MUST return is:
status: success | error
display_detail: "<{confidence}% confidence, track {track}, {qgate_pending_count} pending>"
display_detail shape on success: "{confidence}% confidence, track {track}, {qgate_pending_count} pending" (e.g. "92% confidence, track complex, 0 pending"); ≤80 chars, ASCII, no trailing period. On error, carries the short error label from § Error Handling.
All other fields (plan_id, confidence, track, track_reasoning, scope_estimate, compatibility, compatibility_description, simplicity, simplicity_description, domains, qgate_pending_count) are documented in Step 13.6 above.
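The success-case display_detail shape and its constraints (at most 80 characters, ASCII, no trailing period) can be checked with a small formatter. This is a sketch; the function name is hypothetical and the error-case label is not covered.

```python
def display_detail(confidence: int, track: str, qgate_pending_count: int) -> str:
    """Build the success-case display_detail string and enforce its limits."""
    detail = f"{confidence}% confidence, track {track}, {qgate_pending_count} pending"
    if len(detail) > 80:
        raise ValueError("display_detail exceeds 80 characters")
    if not detail.isascii():
        raise ValueError("display_detail must be ASCII")
    if detail.endswith("."):
        raise ValueError("display_detail must not end with a period")
    return detail
```

Calling `display_detail(92, "complex", 0)` yields the example string from the contract, `92% confidence, track complex, 0 pending`.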
| Error | Action |
|-------|--------|
| Architecture not found | Return {status: error, message: "Run /marshall-steward first"} and abort |
| Compatibility not configured | Return {status: error, message: "compatibility not configured. Run /marshall-steward first"} and abort |
| Request not found | Return {status: error, message: "Request document missing"} |
| Max iterations reached (5) | Return with current confidence, flag for manual review |
This skill does not invoke manage-metrics itself. The orchestrator
(plan-marshall:plan-marshall workflows) records the 2-refine → 3-outline
boundary via the fused manage-metrics phase-boundary call — see
marketplace/bundles/plan-marshall/skills/manage-metrics/SKILL.md §
phase-boundary for the API.
Invoked by: plan-marshall:plan-marshall skill (loaded directly in main context for user interaction)
Script Notations (use EXACTLY as shown):
- plan-marshall:manage-architecture:architecture - Architecture queries
- plan-marshall:manage-plan-documents:manage-plan-documents - Request operations
- plan-marshall:manage-references:manage-references - References persistence (track, scope, module_mapping, compatibility)
- plan-marshall:manage-findings:manage-findings - Q-Gate findings (qgate add/query/resolve)
- plan-marshall:manage-logging:manage-logging - Work and decision logging
- plan-marshall:manage-config:manage-config - Project config (threshold, compatibility)
- plan-marshall:manage-status:manage-status - Phase transition and lifecycle management

Persistence Locations:

- work/module_mapping.toon: Module mapping analysis state
- decision.log: Track/scope decisions, config reads, domain detection
- work.log: Workflow progress (REFINE:N entries)
- request.md: clarifications, clarified_request

Consumed By:

- plan-marshall:phase-3-outline skill (receives track/scope/compatibility in return output; reads module_mapping from work/)