marketplace/bundles/plan-marshall/skills/execute-task/SKILL.md
Execute a single plan task with profile-based workflow selection (implementation, module_testing, verification)
Role: Unified, domain-agnostic workflow skill for executing tasks during phase-5-execute. Handles all profiles: implementation, module_testing, and verification. Loaded by plan-marshall:phase-5-execute when executing any task.
Key Pattern: The agent loads this skill via `resolve-execute-task-skill --profile {profile}`. The skill reads the task's profile and follows the matching workflow. Domain-specific knowledge comes from `task.skills` (loaded by the agent).
Base Contract: This skill follows the execute-task skill contract defined in execute-task-skills.md for input/output contracts, error handling, and script notations.
Skill: plan-marshall:dev-agent-behavior-rules

Prohibited actions:

- Editing files outside the worktree returned by `manage-status get-worktree-path --plan-id {plan_id}` (resolved internally from the Input Contract below). Every Edit/Write/Read tool call during task execution MUST resolve against the returned path.
- The `cd {worktree_path} && git ...` compound. All git commands during task execution MUST use the `git -C {resolved_worktree_path} <subcommand>` form. Here `{resolved_worktree_path}` is the value returned by `manage-status get-worktree-path --plan-id {plan_id}`.
- Chaining commands with `&&`, `;`, `&`, or newlines in a single Bash tool call. Each Bash tool call MUST contain exactly ONE command. The compound form trips the host platform's permission UI and silently swallows intermediate exit codes. Both are load-bearing failure modes during task execution.
- Shell constructs (`for`/`while`/`until` loops, `$()` substitution, subshells, heredocs with `#` lines) inside a Bash tool call. Poll conditions belong in a Monitor tool call. Otherwise, eliminate them by running commands synchronously with an explicit timeout. Polling loops trip the host platform's security heuristics and are a structural signal that the verification step is wrong.
- Background jobs (`command &`, then `wait` or `sleep` loops) used to track a running step. Run verification commands synchronously via Bash with `timeout` set high enough. Use `run_in_background: true` only when the task description explicitly requires a background process AND the step does not need to read the result.
- Error suppression (`2>/dev/null`, `|| true`, `|| :`, `-q` flags that hide exit codes). Every Bash tool call MUST surface its exit code cleanly so the verification loop can detect failures.

Constraints:

- A task with the `implementation` or `module_testing` profile MUST NOT be marked done until the resolved canonical command exits cleanly. That command is `quality-gate` or `verify`, respectively. Passing module-tests alone is necessary but not sufficient: mypy and ruff must also pass.
- Precede each build or verification Bash call with an `[ATTEMPT]` work-log line that names the command being run. Use: `python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging work --plan-id {plan_id} --level INFO --message "[ATTEMPT] (plan-marshall:execute-task) {short description of command}"`. This leaves an auditable trail when a Bash call hangs or produces unexpected output.

See `workflow-integration-git/standards/worktree-handling.md` for the worktree-specific application of this rule. It covers the never-edit-main-checkout invariant, the `git -C` rule, and dispatch header propagation.
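The Bash-call prohibitions above can be checked mechanically before a command is issued. The sketch below is a hypothetical pre-flight helper, not part of plan-marshall.

- `bash_call_violations` and its rule list are illustrative assumptions.
- It is a regex heuristic. Quoting is not parsed, subshells and `-q` flags are not detected, and shell loops or heredocs are caught only because they normally contain `;` or a newline.

```python
import re

# Hypothetical heuristic mirroring the prohibited-actions list above.
# Quoted arguments are not parsed, so a literal ";" inside a quoted
# string is reported as well.
RULES = [
    (re.compile(r"&&|;|\n"), "more than one command in a single call"),
    # A lone "&" is a background job; "&&", "2>&1" and "&>" are excluded.
    (re.compile(r"(?<![&>])&(?![&>])"), "background job (&)"),
    (re.compile(r"\$\(|`"), "command substitution"),
    (re.compile(r"2>\s*/dev/null|\|\|\s*(?:true|:)"), "suppressed exit code"),
    (re.compile(r"^\s*git\s+(?!-C\s)"), "git without -C {worktree}"),
]


def bash_call_violations(command: str) -> list[str]:
    """Return one reason per rule the proposed Bash command appears to break."""
    return [reason for pattern, reason in RULES if pattern.search(command)]
```

For example, `bash_call_violations("git -C /wt status")` returns an empty list. The compound `cd /wt && git status` is reported as more than one command.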
## Input Contract

Every invocation of this skill MUST provide the following inputs:
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| plan_id | string | Yes | Plan identifier. Used by all manage-* script calls AND as the worktree-binding token: the skill resolves the active worktree internally via plan-marshall:manage-status:manage-status get-worktree-path --plan-id {plan_id}. The resolved path is the mandatory root for every Edit/Write/Read tool call during this task. When the plan runs against the main checkout (metadata.use_worktree == false), the resolved path is empty and operations target the main checkout directly. See workflow-integration-git/standards/worktree-handling.md for the canonical --plan-id two-state binding. |
| task_number | number | Yes | Numeric task id to execute |
| worktree_path | string | Deprecated | Deprecated — kept only for backward compatibility with callers that still pass an absolute path. New callers MUST forward only plan_id. When supplied, the value MUST agree with the path resolved from plan_id; treat any disagreement as fail-loud. |
Callers (typically execution-context-{level} dispatching this skill via the phase-5-execute role key) MUST forward plan_id verbatim — no absolute-path forwarding is required. Child subagent dispatches issued from within this skill MUST echo the path-free Worktree Header verbatim into their own prompts (template: WORKTREE: --plan-id {plan_id} plus the resolution-and-rationale block defined in plan-marshall:phase-5-execute § Dispatch Protocol).
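The two-state binding and the fail-loud rule for the deprecated `worktree_path` input can be sketched as a small resolver. This is a minimal illustration under these assumptions:

- The caller has already read the path printed by `manage-status get-worktree-path --plan-id {plan_id}`. An empty string means the main-checkout flow.
- `resolve_edit_root`, `main_checkout`, and `WorktreeMismatch` are hypothetical names, not part of the skill.

```python
from pathlib import Path
from typing import Optional


class WorktreeMismatch(Exception):
    """Raised when the deprecated worktree_path disagrees with plan_id."""


def resolve_edit_root(
    resolved_path: str,
    main_checkout: Path,
    legacy_worktree_path: Optional[str] = None,
) -> Path:
    """Return the root every Edit/Write/Read call must resolve against."""
    # Fail loud if a legacy caller passed a path that does not match the
    # path resolved from plan_id (Path equality ignores a trailing slash).
    if legacy_worktree_path is not None and Path(legacy_worktree_path) != Path(resolved_path):
        raise WorktreeMismatch(f"{legacy_worktree_path!r} != {resolved_path!r}")
    # Empty path means metadata.use_worktree == false: use the main checkout.
    return Path(resolved_path) if resolved_path else main_checkout
```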
## Common Workflow

All profiles share the steps below. Profile-specific steps are documented in each profile section.
Before loading the task, check whether a rename mapping exists. Earlier tasks may have renamed directories, which would otherwise leave the targets of later steps stale.

```bash
python3 .plan/execute-script.py plan-marshall:manage-files:manage-files exists \
  --plan-id {plan_id} --file work/rename_mapping.toon
```
If exists: true, the rename mapping has already been applied to task step targets at recording time (by the rename-path subcommand). No further action needed — proceed to Load Task Context. The mapping file serves as an audit trail of path changes during the plan.
After resolving stale targets and BEFORE loading the task context, inspect every pending `step.target` on the incoming task. Rewrite a target in place to the sibling `_fixtures.py` before execution when both of these hold:

- It matches the regex `test/.+/conftest\.py$` (a sibling `conftest.py` nested under a skill test directory).
- Its full path is NOT in the allow-list below.
Allow-list (these paths are the canonical top-level conftests and MUST NOT be rewritten):
- `test/conftest.py`
- `test/adapters/conftest.py`

Rewrite rule: Replace the trailing `conftest.py` segment with `_fixtures.py`, keeping the parent directory unchanged. For example, `test/plan-marshall/execute-task/conftest.py` becomes `test/plan-marshall/execute-task/_fixtures.py`.
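The allow-list check and rewrite rule above reduce to a few lines. This is a minimal Python sketch with two assumptions:

- `rewrite_conftest_target` is a hypothetical name.
- It applies the regex with `re.search`, because the rule does not say whether the pattern is anchored at the start of the path.

```python
import re
from pathlib import PurePosixPath

# Pattern and allow-list copied from the rule above.
SIBLING_CONFTEST = re.compile(r"test/.+/conftest\.py$")
ALLOW_LIST = {"test/conftest.py", "test/adapters/conftest.py"}


def rewrite_conftest_target(target: str) -> str:
    """Map a nested, non-allow-listed conftest.py to its sibling _fixtures.py."""
    if target in ALLOW_LIST or not SIBLING_CONFTEST.search(target):
        return target  # canonical conftest or unrelated file: leave unchanged
    # Keep the parent directory; swap only the file name.
    return str(PurePosixPath(target).with_name("_fixtures.py"))
```

Each call whose result differs from its input corresponds to one decision-log entry, as described next.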
Decision log requirement: For each rewrite, emit a decision.log entry via plan-marshall:manage-logging:manage-logging using the exact command below:
```bash
python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging \
  decision --plan-id {plan_id} --level INFO \
  --message "(plan-marshall:execute-task) Deviation: rewrote {original_path} → {rewritten_path} (reason: sibling conftest.py would shadow top-level test/conftest.py)"
```
Rationale: A sibling conftest.py placed under test/<bundle>/<skill>/ is auto-loaded by pytest and will shadow the top-level test/conftest.py, silently disabling shared fixtures and producing misleading green runs. The canonical convention is a sibling _fixtures.py imported explicitly where needed. See plan-marshall:dev-general-module-testing for the authoritative _fixtures.py convention and the reasoning behind the allow-list.
### Step: Load Task Context

```bash
python3 .plan/execute-script.py plan-marshall:manage-tasks:manage-tasks read \
  --plan-id {plan_id} --task-number {task_number}
```
Extract key fields: domain, profile, skills, description, steps, verification, depends_on. Verify profile matches the expected profile for this execution.
### Step: Mark Step Complete

After completing each step:

```bash
python3 .plan/execute-script.py plan-marshall:manage-tasks:manage-tasks finalize-step \
  --plan-id {plan_id} --task-number {task_number} --step {N} --outcome done
```
### Step: Run Verification

After all steps complete, run task verification using the commands from `task.verification.commands`.
Sub-step: Auto-inject --plan-id for Bucket B commands
When the plan resolves to an active worktree, before executing any task.verification.commands[N], route the command through the injection helper, passing --plan-id {plan_id} directly:
```bash
python3 .plan/execute-script.py plan-marshall:execute-task:inject_project_dir \
  run --command "{verification_command}" --plan-id {plan_id}
```
Injecting --plan-id (rather than --project-dir {worktree_path}) routes the executor's two-tier audit-log entry to the plan-scoped .plan/local/plans/{plan_id}/logs/script-execution.log — the log the pre-commit-verify-freshness gate reads — and lets the Bucket B script auto-resolve the worktree path itself via its --plan-id/--project-dir two-state contract. No separate get-worktree-path resolution is required.
Parse the TOON output from the script's stdout. Use the rewritten_command value as the command to execute. When injected is true, log:
```bash
python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging \
  work --plan-id {plan_id} --level INFO \
  --message "[VERIFY] (plan-marshall:execute-task) Auto-injected --plan-id for {notation} (routes build log to plan-scoped script-execution.log)"
```
The helper whitelists the eight Bucket B notations from plan-marshall:tools-script-executor/standards/cwd-policy.md; Bucket A manage-* notations and unknown notations pass through unchanged. The helper skips injection when the command already supplies --plan-id (no double injection) and when it already supplies an explicit --project-dir (a legacy override is respected untouched). See scripts/inject_project_dir.py for the authoritative whitelist.
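For intuition, the pass-through rules above can be sketched as follows. This is a hypothetical Python re-implementation, not the code in `scripts/inject_project_dir.py`:

- The whitelist entries are invented placeholders. The real eight notations are defined in that script and in `cwd-policy.md`.
- Locating the notation right after `execute-script.py`, and appending the flag at the end of the command, are assumptions.

```python
import shlex

# Invented placeholder notations; the real Bucket B whitelist lives in
# scripts/inject_project_dir.py.
BUCKET_B_NOTATIONS = {
    "example-bundle:example-build:example-build",
    "example-bundle:example-test:example-test",
}


def _has_flag(tokens: list[str], flag: str) -> bool:
    """True for both "--flag value" and "--flag=value" spellings."""
    return any(tok == flag or tok.startswith(flag + "=") for tok in tokens)


def inject_plan_id(command: str, plan_id: str) -> dict:
    tokens = shlex.split(command)
    # Assumption: the notation is the token right after execute-script.py.
    notation = next(
        (tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok.endswith("execute-script.py")),
        None,
    )
    skip = (
        notation not in BUCKET_B_NOTATIONS        # Bucket A or unknown: pass through
        or _has_flag(tokens, "--plan-id")         # no double injection
        or _has_flag(tokens, "--project-dir")     # legacy override respected
    )
    if skip:
        return {"injected": False, "notation": notation, "rewritten_command": command}
    return {
        "injected": True,
        "notation": notation,
        "rewritten_command": f"{command} --plan-id {shlex.quote(plan_id)}",
    }
```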
Safety net (should not trigger in normal operation): If verification commands are missing, log a WARNING and resolve from architecture:
```bash
python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging \
  work --plan-id {plan_id} --level WARNING --message "[VERIFY] (plan-marshall:execute-task) TASK-{N} missing verification — falling back to architecture resolve"
```

```bash
python3 .plan/execute-script.py plan-marshall:manage-architecture:architecture \
  resolve --command {resolve_command} --module {module} \
  --audit-plan-id {plan_id}
```
Here `{resolve_command}` depends on the profile: `implementation` → `quality-gate`, `module_testing` → `verify`. The `verification` profile uses the commands from its task steps directly.
If verification passes: Mark task done via manage-tasks update --status done.
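If verification fails:

- Analyze the failures, fix them, and re-run verification. Repeat up to `verification_max_iterations` times (from config, default 5).
- If verification still fails after the maximum number of iterations, mark the task as blocked and record the details in work.log.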
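The fix loop can be summarised as a small control-flow sketch. Its assumptions:

- Each fix-and-rerun round counts as one iteration; the initial run does not.
- The caller supplies `verify` and `fix` callables. Neither name comes from plan-marshall.

```python
from typing import Callable, Optional

DEFAULT_MAX_ITERATIONS = 5  # documented default for verification_max_iterations


def run_verification_loop(
    verify: Callable[[], bool],
    fix: Callable[[int], None],
    max_iterations: Optional[int] = None,
) -> str:
    """Return the task status ("done" or "blocked") after the verify/fix loop.

    verify() runs task.verification.commands and reports success;
    fix(n) applies the n-th round of fixes. Both are hypothetical stand-ins.
    """
    limit = DEFAULT_MAX_ITERATIONS if max_iterations is None else max_iterations
    if verify():
        return "done"
    for iteration in range(1, limit + 1):
        fix(iteration)
        if verify():
            return "done"
    return "blocked"  # record details in work.log
```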
Before recording any per-task verification deviation that would soften a request-level hard requirement, this step MUST raise an AskUserQuestion per the canonical contract in ../ref-workflow-architecture/standards/scope-deviation-escalation.md. The standard is the single source of truth for the deviation taxonomy, the three-option AskUserQuestion shape (Hold / Accept-with-rationale / Split), and the prohibited "log-and-continue" anti-pattern.
Detection: A deviation softens a hard requirement when, mid-fix-iteration, the implementor concludes that satisfying the request as written is structurally riskier than estimated and the conservative response would be to keep both the old and new surfaces. Concrete signals: verification command reports a non-zero hit count against a "zero-hit grep" gate; tests pass against a legacy code path the deliverable was meant to delete; the implementor is about to add a transition hedge ("until X has fully landed", "callers may still see Y") to satisfy the test gate.
Integration shape: This guard mirrors the per-task compatibility AskUserQuestion already used by the smart_and_ask Compatibility Strategy in the implementation profile (Step: Compatibility Strategy below). The two guards share three properties:

- Both are AskUserQuestion gates that fire BEFORE the deviation is committed.
- Both pause the per-task fix loop until the user resolves the prompt.
- Both persist the resolution to decision.log.

The new guard adds one element the compatibility AskUserQuestion does not. When the user chooses "Accept with rationale", the rationale text is also surfaced in the PR body. The compatibility AskUserQuestion does not require this because compatibility decisions land in commit messages instead.
Resolution: On user resolution, follow the side-effect contract in scope-deviation-escalation.md. The [VERIFY], [STATUS], or [OUTCOME] work-log line confirming the user's chosen option IS allowed AFTER the AskUserQuestion has resolved — never as a stand-in for it. When the user chooses "Hold the line", the per-task fix loop resumes (do NOT mark the task blocked on the same iteration just because the user refused the softening).
On issues or unexpected patterns, first run the canonical three-gate lesson-creation policy in ../manage-lessons/standards/lesson-creation-policy.md — Gate 1 (dedup), Gate 2 (active-plan check), Gate 3 (create). The two-step path-allocate flow below is Gate 3, reached only when Gates 1 and 2 both clear; when Gate 1 returns merge_into / already_closed or Gate 2 finds a covering active plan, extend the existing lesson or fold into the plan instead of allocating a new one. Do not restate the gate mechanics — follow the standard.
When the gates clear, use the two-step path-allocate flow:
1. Allocate the lesson file and capture its `path`:

   ```bash
   python3 .plan/execute-script.py plan-marshall:manage-lessons:manage-lessons add \
     --component "plan-marshall:execute-task" --category improvement \
     --title "{issue summary}"
   ```

2. Read `path` from the output, then write the lesson body directly to that path via the Write tool. Markdown sections with `##` headings, code fences, and multiple paragraphs are all safe because the body never passes through a shell argument.

Base output contract (profile-specific extensions noted in each section):
```toon
status: success | error
plan_id: {echo}
task_number: {echo}
execution_summary:
  steps_completed: N
  steps_total: M
  files_modified: [paths]
verification:
  passed: true | false
  command: "{cmd}"
next_action: task_complete | requires_attention
message: {error message if status=error}
```
## Profile: implementation

Production code creation and modification.
When the plan runs in an isolated worktree (resolvable via plan-marshall:manage-status:manage-status get-worktree-path --plan-id {plan_id} returning a non-empty path), every Edit/Write/Read tool call in this profile MUST resolve its file path against the returned path. If a subagent is dispatched from this profile, embed the path-free Worktree Header (WORKTREE: --plan-id {plan_id} plus the resolution-and-rationale block — see phase-5-execute § Dispatch Protocol) so the child propagates the constraint without leaking the absolute path. The auto-injection sub-step under Common Workflow → Step: Run Verification handles Bucket B forwarding structurally; Bucket A manage-* scripts remain cwd-agnostic. See workflow-integration-git/standards/worktree-handling.md for the canonical --plan-id two-state binding and plan-marshall:tools-script-executor/standards/cwd-policy.md for the Bucket A/B split.
### Step: Compatibility Strategy

Before implementing, read the compatibility approach:

```bash
python3 .plan/execute-script.py plan-marshall:manage-config:manage-config \
  plan phase-2-refine get --field compatibility --audit-plan-id {plan_id}
```
No fallback — if field not found, fail with error and abort task.
Apply throughout all subsequent steps:
- Deprecation approach: add `@Deprecated` markers, add new code alongside old, and provide migration notes in commit messages.

1. Read the target files (`step.target`) if they exist.
   - Use `architecture files --module {module}` to enumerate the module's components.
   - Use `architecture which-module --path P` or `architecture find --pattern P` for module-spanning lookups.
   - Fall back to Grep for content-level searches inside known files. Fall back to Glob for sub-module path patterns or when the architecture verb returns elision.
   - Apply domain knowledge from the loaded skills.
2. For each step, create new files with `Write` and modify existing files with `Edit`. Apply domain patterns and maintain the existing code style.
3. Mark Step Complete (common step)
4. Run Verification: resolve command `quality-gate` (static analysis: mypy + ruff on production sources; full tests belong to `module_testing`)

| Failure | Action |
|---------|--------|
| Conflicting changes | Analyze conflict, prefer preserving existing behavior, ask for clarification if needed |
## Profile: module_testing

Unit and module test creation.
When the plan runs in an isolated worktree (resolvable via plan-marshall:manage-status:manage-status get-worktree-path --plan-id {plan_id} returning a non-empty path), every Edit/Write/Read tool call in this profile MUST resolve its file path against the returned path. If a subagent is dispatched from this profile, embed the path-free Worktree Header (WORKTREE: --plan-id {plan_id} plus the resolution-and-rationale block — see phase-5-execute § Dispatch Protocol) so the child propagates the constraint without leaking the absolute path. The auto-injection sub-step under Common Workflow → Step: Run Verification handles Bucket B forwarding structurally; Bucket A manage-* scripts remain cwd-agnostic. See workflow-integration-git/standards/worktree-handling.md for the canonical --plan-id two-state binding and plan-marshall:tools-script-executor/standards/cwd-policy.md for the Bucket A/B split.
1. Understand Implementation Context:
   - Use `architecture files --module {module}` to enumerate the module's components.
   - Use `architecture find --pattern '*{name}*'` for module-spanning lookups.
   - Fall back to Grep for content-level searches inside known files. Fall back to Glob for sub-module path patterns or when the architecture verb returns elision.
   - Read and examine the matched implementation files.
   - Identify testable elements: public methods, edge cases, error conditions, input validation, integration points.
2. Plan Test Implementation: For each step, determine the test scenarios, the test structure (unit vs. integration, per the domain skills), the assertions needed, and any setup/teardown requirements.
3. Implement Tests: For each step, create new test files with Write and modify existing test files with Edit. Follow the AAA pattern (Arrange-Act-Assert). Include positive and negative test cases with descriptive names.
4. Mark Step Complete (common step)
5. Run Verification: resolve command `verify` (the full verify pipeline for the module: quality-gate + module-tests)
6. Handle Verification Results:
Sub-step: Diff written test identifiers against the module-test log
After a green module-tests run that produced new test files during step 3 (Implement Tests):

1. Build the list of written test identifiers, one `{rel_path}::{test_function}` entry per test function. Write them to a temp file under `.plan/temp/` (one identifier per line).
2. Run the diff assertion:

   ```bash
   python3 .plan/execute-script.py plan-marshall:execute-task:assert_test_identifiers \
     run --identifiers-file "{temp_identifiers_file}" --log "{module_test_log_path}"
   ```

3. Act on the result:
   - `passed: true`: proceed to the standard "Mark task done" path.
   - `passed: false`: do NOT mark the task done, because the run was silently incomplete. Log the failure, mark the task `requires_attention`, and surface the mismatch in the return value:

     ```bash
     python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging \
       work --plan-id {plan_id} --level WARNING \
       --message "[VERIFY] (plan-marshall:execute-task) Diff assertion failed: {missing_count} written test identifiers absent from module-test log — {missing}"
     ```
On test failure: determine if test logic is wrong or implementation has a bug. If test logic → fix test. If implementation bug → fix production code AND the test. Adapting production code to make tests pass is expected within this profile.
Record Lessons, Return Results
```toon
execution_summary:
  tests_written: N
  coverage_impact: {if available}
verification:
  tests_passed: N
  tests_failed: N
  diff_assertion:
    passed: true | false
    missing_count: N
    missing[]: [identifier, ...]
```
Note: diff_assertion.passed: false overrides tests_passed — a green test count does not imply a successful run if written identifiers are absent from the log.
| Failure | Action |
|---------|--------|
| Implementation not found | Check if implementation task is in dependencies. If yes → mark task as blocked. If no → note in lessons learned |
## Profile: verification

Run verification commands without modifying files.
Note: The verification profile is distinct from the verification change-type (see change-types.md). The profile determines HOW a task executes; the change-type describes WHY a request was made.
Resolve Worktree Path: Before executing any verification step, resolve the active worktree path from `plan_id`:

```bash
python3 .plan/execute-script.py plan-marshall:manage-status:manage-status get-worktree-path \
  --plan-id {plan_id}
```
Capture the returned worktree_path. When metadata.use_worktree == false the returned path is empty — skip the auto-injection sub-step below and execute each raw step.target directly against the main checkout.
Execute Verification Steps with --plan-id Auto-Injection: Steps contain verification commands (not file paths). Execute sequentially. For each step.target:
a. Route the command through the injection helper, passing `--plan-id {plan_id}`:

```bash
python3 .plan/execute-script.py plan-marshall:execute-task:inject_project_dir \
  run --command "{step.target}" --plan-id {plan_id}
```
Parse the TOON output. Use the rewritten_command value as the command to execute. When the worktree path resolved in Step 1 is empty (main-checkout flow), skip the helper and execute the raw step.target. Injecting --plan-id routes the executor's audit-log entry to the plan-scoped script-execution.log and lets the Bucket B script auto-resolve the worktree via its two-state contract.
b. Execute the resulting command with a Bash timeout derived from the architecture-resolved canonical envelope. See plan-marshall:dev-agent-behavior-rules § "Bash: Timeout from architecture-resolved canonical command" for the authoritative rule: read bash_timeout_seconds and execution_tier from the resolved TOON, pass timeout: bash_timeout_seconds * 1000 when execution_tier=per_task, and hand off to the orchestrator when execution_tier=orchestrator. The 600000ms floor still applies to ad-hoc invocations that do not flow through architecture resolve, and matches CLAUDE.md § Build Commands.
c. On `injected: true`, emit the standard auto-injection work-log entry:

```bash
python3 .plan/execute-script.py plan-marshall:manage-logging:manage-logging \
  work --plan-id {plan_id} --level INFO \
  --message "[VERIFY] (plan-marshall:execute-task) Auto-injected --plan-id for {notation} (routes build log to plan-scoped script-execution.log)"
```
Check exit code and output of the executed command. This step mirrors the Common Workflow → Step: Run Verification sub-step; see that section for the authoritative whitelist of Bucket B notations, the no-inject pass-through rule for Bucket A manage-* notations, and the rationale for skipping injection when the command already supplies --plan-id or an explicit --project-dir.
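The timeout rule in step b can be sketched as a small function. This is a minimal sketch under these assumptions:

- The resolved TOON has already been parsed into a dict carrying the `execution_tier` and `bash_timeout_seconds` keys named above.
- Ad-hoc commands that never went through `architecture resolve` are passed as `None`.
- `bash_timeout_ms` is a hypothetical name.

```python
from typing import Optional

AD_HOC_FLOOR_MS = 600_000  # floor for commands not resolved via architecture


def bash_timeout_ms(envelope: Optional[dict]) -> Optional[int]:
    """Return the Bash tool timeout in ms, or None to hand off to the orchestrator."""
    if envelope is None:
        return AD_HOC_FLOOR_MS  # ad-hoc invocation: no resolved envelope
    tier = envelope.get("execution_tier")
    if tier == "per_task":
        return int(envelope["bash_timeout_seconds"]) * 1000
    if tier == "orchestrator":
        return None  # the orchestrator runs this command, not the task agent
    raise ValueError(f"unknown execution_tier: {tier!r}")
```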
Mark Step Complete (common step)
Handle Failures: This is a verification task — do NOT modify source files. Report failures with structured output for triage. If verification fails, mark task as blocked.
Record Lessons (on unexpected failures or environment issues), Return Results
No domain skills are needed for this profile.
```toon
execution_summary:
  commands_run: [commands]
verification:
  exit_code: {exit_code}
  stderr: "{truncated stderr, max 2000 chars}"
  findings:
    - type: {compile-error|test-failure|lint-issue}
      file: {file_path}
      line: {line_number}
      message: "{error message}"
```
On success: next_action: task_complete.
On failure: status: error, next_action: requires_attention, plus the extension fields. The findings array is best-effort: parse compiler errors, test failures, or lint output into structured entries. If parsing fails, include the raw stderr.
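A best-effort findings parser might look like the sketch below. It assumes:

- Tool output uses the common `path:line[:col]: message` shape, as mypy's default output does.
- The caller already knows which finding type applies.
- `build_failure_extension` is a hypothetical name.

Lines that do not match are simply skipped. The truncated raw output is always included, so nothing is lost when parsing finds nothing.

```python
import re

# Assumed "path:line[:col]: message" line shape; other formats yield no findings.
LOCATION = re.compile(r"^(?P<file>[^\s:]+):(?P<line>\d+)(?::\d+)?:\s*(?P<message>.+)$")
STDERR_LIMIT = 2000  # matches the documented max for the stderr field


def build_failure_extension(exit_code: int, output: str, finding_type: str) -> dict:
    """Build the verification extension fields for a failed command."""
    findings = []
    for raw in output.splitlines():
        match = LOCATION.match(raw.strip())
        if match:
            findings.append({
                "type": finding_type,
                "file": match["file"],
                "line": int(match["line"]),
                "message": match["message"],
            })
    return {"exit_code": exit_code, "stderr": output[:STDERR_LIMIT], "findings": findings}
```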
The canonical argparse surface for the two entry-point scripts this skill registers: inject_project_dir.py and assert_test_identifiers.py. The plugin-doctor analyzer (_analyze_manage_invocation.py) reads this section as source-of-truth for the manage-invocation-invalid and missing-canonical-block rules. Consuming docs xref this section by name instead of restating the command inline. See pm-plugin-development:plugin-script-architecture cross-skill-integration.md § "Script invocation in documentation".
```bash
python3 .plan/execute-script.py plan-marshall:execute-task:inject_project_dir run \
  --command COMMAND --plan-id PLAN_ID
```

```bash
python3 .plan/execute-script.py plan-marshall:execute-task:assert_test_identifiers run \
  --identifiers-file IDENTIFIERS_FILE --log LOG
```
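Read as argparse, the canonical surface above amounts to one `run` subcommand per script, each with two options. The sketch below is reconstructed from the usage lines only:

- Treating both options as required is an assumption.
- The real scripts may define help text, defaults, or options not shown here.

```python
import argparse


def build_inject_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: inject_project_dir run --command ... --plan-id ..."""
    parser = argparse.ArgumentParser(prog="inject_project_dir")
    sub = parser.add_subparsers(dest="subcommand", required=True)
    run = sub.add_parser("run", help="rewrite a verification command")
    run.add_argument("--command", required=True, help="command to rewrite")
    run.add_argument("--plan-id", required=True, help="plan identifier to inject")
    return parser


def build_assert_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for: assert_test_identifiers run --identifiers-file ... --log ..."""
    parser = argparse.ArgumentParser(prog="assert_test_identifiers")
    sub = parser.add_subparsers(dest="subcommand", required=True)
    run = sub.add_parser("run", help="diff written test ids against a log")
    run.add_argument("--identifiers-file", required=True, help="one identifier per line")
    run.add_argument("--log", required=True, help="module-test log path")
    return parser
```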