cellm/skills/execute/SKILL.md
Mandatory execution gate after spec decomposition — analyzes DAG structure, computes per-phase risk scores, selects optimal strategy (implement vs orchestrate vs orchestrate-teams vs swarm vs spec-treat), presents M1/M2/M3 approval menus via AskUserQuestion, and orchestrates with go/no-go gates between every phase. Handles partial phases, approval tickets, planner fallbacks, and degradation scenarios. Use when: 'execute spec', 'run spec', 'execute', 'execute check', 'best strategy for spec', 'after plan-to-spec', 'choose between implement and orchestrate', 'risk scoring', 'how to run this decomposed spec', 'start execution', 'which executor for this phase'. Do NOT use for: implementing a single task (cellm:implement), decomposing a plan (cellm:plan-to-spec), running certification (cellm:olympus/arena/convergir), or checking spec status (cellm:spec).
npx skillsauth add murillodutt/cellm cellm/skills/execute

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Analyze a decomposed spec and propose the optimal execution strategy per phase, then orchestrate approved plan with quality gates between every stage.
active, incremental API vs bulk decomposeSpec): see docs/technical/SPEC-DECOMPOSE-LIFECYCLE.md. Execution assumes a decomposed tree; it does not replace owner approval of the plan before decomposition.

- context_preflight before analysis (flow='orchestrate').
- spec_get_tree + spec_get_counters as single source of truth.
- check.body.approvalTicket (scope=decompose+execute-stage2) and strict checks (same session, TTL valid, fingerprint match, non-critical priority).
- check.body.guardrailsContract as canonical execution policy (writer: plan-to-spec).
- docs/technical/guardrails-contract-v1.md.
- critical/high: fail-closed (block and escalate to user).
- medium/low: continue with safe defaults (balanced + blocker-only hard stop list) and emit telemetry warning.
- go_no_go_evaluate(phase_exit) between execution steps.
- go_no_go_evaluate(check_exit) before declaring check complete.
- After every go_no_go_evaluate, call go_no_go_record to persist the verdict. Render the decision matrix via go_no_go_render in inter-stage reports.
- context_record_outcome after each step. Use deterministic key: {checkId}/{phaseId}/step-{n} for idempotent writes.
- context_record_outcome with deterministic keys (approval_prompt_count, approval_prompt_skipped, approval_ticket_reused, approval_ticket_rejected).
- asclepius -> re-evaluate per step. If still no_go after 2 retries, escalate to user with full context.
- If go_no_go_evaluate itself fails (API error, timeout), BLOCK advancement and ask user for explicit decision. Never assume go on evaluation failure.
- throughput: no proactive confirmations during Stage 3; escalate only hard blockers.
- balanced: max 1 objective escalation per phase.
- conservative: confirmations per step are allowed by design.

- spec_search.
- spec_get_tree (yaml format) + spec_get_counters.
- approvalTicket from check body (if present) and compute current fingerprint from the current tree summary:
  - phaseCount, taskCount, edgeCount, injectConvergenceGate.
  - p{phaseCount}-t{taskCount}-e{edgeCount}-cg{flag}.
- Evaluate validity (scope, session, ttl, fingerprint, priority).
  - If valid: ticketEligible=true.
  - If invalid: record the reason (scope|session|ttl|fingerprint|priority|missing) for telemetry.
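The fingerprint format and the validity checks above can be sketched roughly as follows. This is a minimal illustration, not the CELLM implementation: the ticket field names (`scope`, `sessionId`, `expiresAt`, `fingerprint`), the `TreeSummary` class, and the 1/0 encoding of the `cg` flag are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeSummary:
    """Hypothetical container for the four values the fingerprint uses."""
    phase_count: int
    task_count: int
    edge_count: int
    inject_convergence_gate: bool


def compute_fingerprint(summary: TreeSummary) -> str:
    # Assumption: the {flag} placeholder is rendered as 1 or 0.
    flag = 1 if summary.inject_convergence_gate else 0
    return (
        f"p{summary.phase_count}-t{summary.task_count}"
        f"-e{summary.edge_count}-cg{flag}"
    )


def ticket_rejection_reason(
    ticket: Optional[dict],
    *,
    session_id: str,
    now: float,
    fingerprint: str,
    priority: str,
) -> Optional[str]:
    """Return None when the ticket is eligible, else one telemetry reason."""
    if ticket is None:
        return "missing"
    if ticket.get("scope") != "decompose+execute-stage2":
        return "scope"
    if ticket.get("sessionId") != session_id:  # hypothetical field name
        return "session"
    if now >= ticket.get("expiresAt", 0):  # hypothetical field name
        return "ttl"
    if ticket.get("fingerprint") != fingerprint:
        return "fingerprint"
    if priority == "critical":  # tickets apply to non-critical priority only
        return "priority"
    return None
```

Here `ticketEligible` would correspond to `ticket_rejection_reason(...) is None`; any other return value is the reason to record for telemetry.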
4b. Load guardrailsContract from check body and validate required keys:

- directivePrecedence
- executionModeContract
- loopBreaker
- hardBlockers
- phaseGatePolicy
- approvalInheritance
- postDecomposeHandoff
- trackingGranularity
- evidenceRequirements

Record telemetry:

- guardrails_contract_source (check_body | safe_default)
- guardrails_contract_valid (true|false)

Apply priority-aware fail-closed policy (see Policy).

- execution_plan_build with the check path and desired mode. This computes DAG grouping, risk scores, and strategy selection deterministically.
  - source: 'planner': use the computed plan directly (manual Steps 6–8 are skipped; the planner owns DAG/strategy).
  - source: 'manual-fallback': present [FALLBACK] badge and use the conservative plan. The fallbackReason field explains why (e.g., PLANNER_DISABLED, API error).

This stage is the central mandatory gate for all post-decomposition execution. All decomposition flows (plan-to-spec, tilly, direct) redirect here. No execution without explicit user decisions on all 3 menus.
If ticketEligible=true and user did not request force-confirmation:

- Skip the AskUserQuestion prompt and reuse the ticket executor.
- Record via context_record_outcome:
  - approval_prompt_skipped
  - approval_ticket_reused

Menu 1 - Executor (M1): Present EXECUTION PLAN table with CELLM recommendation per phase, then ask user to select executor via AskUserQuestion.

- cellm:implement, cellm:orchestrate, cellm:orchestrate-teams, cellm:swarm (and future executors).
- cellm:execute does NOT appear as an option; it is the gate, not an executor.

Menu 2 - Autonomy Level (M2): Ask user via AskUserQuestion:

- A = throughput, B = balanced (user can further refine to conservative).
- If guardrailsContract.executionModeContract.mode is stricter than the selected mode, prefer stricter mode and report enforcement in telemetry.
- autonomy_level records the execution mode value (throughput, balanced, or conservative), NOT the menu label (A/B).

Menu 3 - Certification (M3, multiple choice): Ask user via AskUserQuestion:

- cellm:olympus - Triad certification (Argus/Asclepius/Hefesto)
- cellm:arena - Quality lab (prove/debug/gate/stress)
- cellm:convergir - E2E convergence loop (typecheck + tests + oracle)
- skip - No certification (user accepts risk)
- Multiple selections are allowed (e.g., cellm:convergir then cellm:olympus; preserve user order in Stage 4).

Record telemetry via context_record_outcome:

- approval_prompt_count, decomposition_source (see below)
- recommended_executor, selected_executor
- autonomy_level
- certification_choice (array of selected options)
- blocked_reason
- approval_ticket_rejected with reason.

decomposition_source: use check.body.decompositionSource when set by the decomposition flow (plan-to-spec, tilly, spec, etc.). If absent, record unknown; do not invent tilly vs plan-to-spec from context guesses.

a. Invoke the selected executor via the Skill tool (e.g., cellm:implement, cellm:orchestrate-teams).
b. Run go_no_go_evaluate with decisionClass: phase_exit for completed phases.
c. Call go_no_go_record to persist the verdict. Include in inter-stage report via go_no_go_render.
d. If verdict is conditional: run quality_gate, report, ask user.
e. If verdict is no_go: invoke cellm:asclepius via Skill, re-evaluate. Max 2 retries per step — if still no_go, escalate to user.
f. If go_no_go_evaluate fails (error/timeout): BLOCK and ask user — never assume go.
g. Present inter-stage report to user (include rendered go/no-go matrix).
h. Apply confirmation cadence by mode, constrained by guardrailsContract.executionModeContract and interrupt budget:
   - throughput: continue automatically after report; ask only on hard blocker, band transition with increased risk, or explicit user pause request.
   - balanced: ask once per phase (or at meaningful band transition), not per step, and never exceed per-phase budget.
   - conservative: ask per step (proceed / pause / abort).
- context_record_outcome with key {checkId}/{phaseId}/step-{n}.
- escalation_count and escalation_budget_mode.
- maxMetaUpdatesWithoutProgress), execute next safe step and record loop_breaker_triggered=true.

- go_no_go_evaluate with decisionClass: check_exit. Call go_no_go_record to persist.
- go_no_go_render; include in final report.
- convergir: invoke cellm:convergir via Skill.
- arena: invoke cellm:arena via Skill.
- olympus: invoke cellm:olympus via Skill.
- skip: skip certification (go_no_go check_exit from step 16 still runs).
- cellm:convergir + cellm:olympus → run in the sequence the user stated).

Read references/risk-model.md for factor tables, empirical adjustment, and confidence band mapping. Key: risk 0-2 = high confidence, 3-5 = medium, 6-10 = low.
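As a small illustration of the band mapping just quoted, the thresholds can be written as a lookup. This sketch covers only the key rule stated here; the factor tables and empirical adjustment in references/risk-model.md are not modeled, and the function name is hypothetical.

```python
def confidence_band(risk: int) -> str:
    """Map a 0-10 risk score to a confidence band (key rule only)."""
    if not 0 <= risk <= 10:
        raise ValueError(f"risk score out of range: {risk}")
    if risk <= 2:
        return "high"
    if risk <= 5:
        return "medium"
    return "low"
```

Rejecting out-of-range scores, rather than clamping them, is a choice made for this example so that a miscomputed score is noticed early.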
Rules are evaluated top-to-bottom. First match wins — no fallthrough.
- cellm:implement (never parallelize partial phases)
- cellm:spec-treat (always; SCE certification required)
- cellm:implement (sequential, maximum control)
- cellm:implement (orchestration overhead not worth it)
- cellm:implement
- cellm:orchestrate-teams (invoke via Skill)
- cellm:swarm (invoke via Skill)
- cellm:orchestrate
- cellm:spec-treat

Tiebreaker: if multiple rules could match (e.g., rule 5 and 7 both apply), the lower-numbered rule wins. This is enforced by first-match-wins evaluation.
Parallelizable ratio = tasks with no unsatisfied depends_on edges / total pending tasks. Computed from spec_get_tree edges.
Phases partially completed (some tasks done, some pending) are handled by rule 0 — they run individually via cellm:implement.
Rule 0 before Rule 1: A partially completed Convergence Gate phase still matches rule 0 first — finish outstanding tasks with cellm:implement before treating the phase as a clean Convergence Gate for rule 1’s spec-treat posture.
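The parallelizable ratio and the first-match-wins evaluation can be sketched as below. Only rules 0 and 1 are encoded, because their conditions are the ones described in the text above; the conditions for the remaining rules are not reproduced here. The data shapes (a status map, `(task, depends_on)` edge pairs, a phase dict) are assumptions made for the example, not the `spec_get_tree` output format.

```python
from __future__ import annotations

from typing import Callable, Optional


def parallelizable_ratio(status: dict[str, str], edges: list[tuple[str, str]]) -> float:
    """Pending tasks with no unsatisfied dependency / total pending tasks."""
    pending = [task for task, state in status.items() if state != "done"]
    if not pending:
        return 0.0  # assumption: nothing pending means nothing to parallelize
    # A task is blocked if any task it depends on is not done yet.
    blocked = {task for task, dep in edges if status.get(dep) != "done"}
    ready = [task for task in pending if task not in blocked]
    return len(ready) / len(pending)


Rule = tuple[int, Callable[[dict], bool], str]

# Ordered rule list; evaluation stops at the first matching predicate.
RULES: list[Rule] = [
    (0, lambda p: p["done"] > 0 and p["pending"] > 0, "cellm:implement"),
    (1, lambda p: p["is_convergence_gate"], "cellm:spec-treat"),
    # Later rules omitted: their conditions are not given in this section.
]


def select_strategy(phase: dict) -> Optional[tuple[int, str]]:
    for number, matches, strategy in RULES:
        if matches(phase):
            return number, strategy  # first match wins, no fallthrough
    return None
```

With this ordering, a partially completed Convergence Gate phase resolves to rule 0 and `cellm:implement`, while a Convergence Gate with no completed tasks resolves to rule 1.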
Present mode selection to user alongside the execution plan. User chooses one:
| Mode | Behavior |
|------|----------|
| conservative | Confirm every step. Full quality_gate (typecheck + tests) at every phase_exit. Never batch low-risk confirmations. |
| balanced | Confirm per confidence band. quality_gate at every phase_exit. Batch confirmations for high-confidence steps. Default mode. |
| throughput | Confirm only at band transitions (high->medium, medium->low). quality_gate typecheck at every phase, full tests only at critical phases and check_exit. |
M2 vs Execution Mode: Menu 2 is fail-closed — never substitute a default for a missing M2 answer. After M2 maps (A)→throughput or (B)→balanced, optionally offer a follow-up to refine to conservative; until then, autonomy_level is that mapped value. The "Default mode" row in the table means CELLM recommends balanced as the usual baseline in narrative only — not permission to skip M2.
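A rough sketch of how the M2 answer could be resolved into the recorded `autonomy_level`, including the Menu 2 rule that a stricter `guardrailsContract.executionModeContract.mode` takes precedence. The strictness ordering (throughput < balanced < conservative) is inferred from the mode table above, and the function and parameter names are hypothetical.

```python
from typing import Optional

MENU_TO_MODE = {"A": "throughput", "B": "balanced"}

# Assumed ordering: a higher number means more confirmations, i.e. stricter.
STRICTNESS = {"throughput": 0, "balanced": 1, "conservative": 2}


def resolve_autonomy_level(
    menu_answer: Optional[str],
    refine_to_conservative: bool = False,
    contract_mode: Optional[str] = None,
) -> tuple[str, bool]:
    """Return (mode to record as autonomy_level, whether the contract enforced it)."""
    if menu_answer not in MENU_TO_MODE:
        # Fail closed: never substitute a default for a missing M2 answer.
        raise ValueError("M2 answer missing or unknown")
    mode = "conservative" if refine_to_conservative else MENU_TO_MODE[menu_answer]
    enforced = False
    if contract_mode is not None and STRICTNESS[contract_mode] > STRICTNESS[mode]:
        mode, enforced = contract_mode, True
    return mode, enforced
```

The returned mode is always one of the three mode values, never the menu label, which matches the telemetry rule for `autonomy_level`.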
Operational meaning:
- throughput is execution-direct mode with blocker-only escalation.
- balanced is execution-assisted mode with checkpoint budget.
- conservative is execution-audit mode with per-step confirmations.

Present to user before execution:
EXECUTION PLAN — "{title}" (priority: {p}, {n} phases, {t} tasks)
Mode: {mode} (conservative / balanced / throughput)
| Step | Phase(s) | Risk | Band | Strategy | Reason | Gate |
|------|----------|------|------|----------|--------|------|
| 1 | Phase 1 (2 tasks) | 7 | low | implement | DB migration, risk:7 | phase_exit |
| 2 | Phase 2-4 (18 tasks) | 3 | medium | orchestrate-teams | 3 independent phases | phase_exit |
| 3 | Convergence Gate (1 task) | 3 | low | spec-treat | SCE required | phase_exit |
Post-Check:
| Gate | Condition |
|------|-----------|
| go_no_go check_exit | Always |
| {M3 selections} | User choice from Menu 3 (e.g., convergir + olympus) |
Approve plan? Choose mode (conservative / balanced / throughput) or modify steps.
After each step completes:
STEP {n} COMPLETE — {phase title}
| Metric | Value |
|--------|-------|
| Tasks | {done}/{total} completed |
| Go/No-Go | {verdict} (recorded) |
| Retries | {retry_count}/2 |
| Findings | {count} |
| Band | {confidence_band} |
| Next | Step {n+1}: {phases} via {strategy} |
{go_no_go_render output — decision matrix}
Next action follows mode cadence (auto in `throughput`, phase checkpoint in `balanced`, explicit prompt in `conservative`).
In balanced and throughput modes, batch consecutive high-confidence steps and present a single grouped confirmation instead of per-step.
Read references/go-nogo-contract.md for evaluate/record/render parameter shapes. Key rule: every go_no_go_evaluate must be followed by go_no_go_record. Always include phaseTypeKey in phase_exit calls.
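The evaluate/record pairing, the retry limit, and the block-on-failure rule might be combined as in the sketch below. The three callables are injected stand-ins for `go_no_go_evaluate`, `go_no_go_record`, and the `cellm:asclepius` repair step; their real parameter shapes live in references/go-nogo-contract.md and are not reproduced here, so the signatures and the returned `action` labels are assumptions.

```python
from typing import Callable


def run_phase_exit_gate(
    check_id: str,
    phase_id: str,
    step: int,
    phase_type_key: str,
    evaluate: Callable[..., str],
    record: Callable[[str], None],
    repair: Callable[[], None],
    max_retries: int = 2,
) -> dict:
    # Deterministic key so repeated outcome writes stay idempotent.
    key = f"{check_id}/{phase_id}/step-{step}"
    retries = 0
    while True:
        try:
            verdict = evaluate(decisionClass="phase_exit", phaseTypeKey=phase_type_key)
        except Exception as exc:  # API error or timeout: never assume go
            return {"key": key, "action": "block_and_ask_user",
                    "error": str(exc), "retries": retries}
        record(verdict)  # every evaluate is followed by a record
        if verdict == "go":
            return {"key": key, "action": "advance", "retries": retries}
        if verdict == "conditional":
            return {"key": key, "action": "quality_gate_then_ask_user",
                    "retries": retries}
        # verdict is no_go: repair and re-evaluate, at most max_retries times
        if retries >= max_retries:
            return {"key": key, "action": "escalate_to_user", "retries": retries}
        repair()
        retries += 1
```

Passing the tools in as callables keeps the control flow testable without the real MCP tools; a caller would wrap the actual tool invocations to fit these stand-in signatures.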
Before emitting STOP on a yellow gate, soft-block, or ambiguous escalation, consult check.body.falseBlockers[] (Facts-first Spec Driver v0.1, spec-655de45c F4).
Procedure:
- Read check.body.falseBlockers from the spec body. If absent or empty, fall through to standard STOP rules.
- For each entry { signal, why_not_blocker }, match signal against the current blocker description (case-insensitive contains).
- On a match, record via context_record_outcome with kind: 'false_blocker_override', including signal, why_not_blocker (justification), spec_id, phase_id if known. Continue execution at the next safe step.

Pattern promotion: when the same signal accumulates 3 or more false_blocker_override outcomes across specs, promote it to a knowledge atom (via knowledge_ops). This converts repeated runtime evidence into a stable rule rather than a per-spec exception.
Hard rule: falseBlockers only suppress STOPs whose triggers are non-irreversible (gate yellow, advisory hint, ambiguous status). They do NOT suppress hard blockers — stopConditions matched, schema validation failure, destructive-action prompt, missing authority — which always halt regardless of falseBlockers content.
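The matching procedure and the hard rule can be sketched together as follows. The `blocker_kind` labels used to identify hard blockers are hypothetical names invented for this example; the skill text lists the categories but not how they are encoded.

```python
from typing import Optional

# Hypothetical labels for the hard-blocker categories named in the hard rule.
HARD_BLOCKER_KINDS = {
    "stop_condition",
    "schema_validation_failure",
    "destructive_action",
    "missing_authority",
}


def find_false_blocker(
    false_blockers: Optional[list],
    blocker_description: str,
    blocker_kind: str,
) -> Optional[dict]:
    """Return the matching falseBlockers entry, or None to keep the STOP."""
    if blocker_kind in HARD_BLOCKER_KINDS:
        return None  # hard blockers always halt, whatever falseBlockers says
    text = blocker_description.lower()
    for entry in false_blockers or []:
        if entry["signal"].lower() in text:  # case-insensitive contains
            return entry
    return None


def build_override_outcome(entry: dict, spec_id: str, phase_id: Optional[str] = None) -> dict:
    """Payload to record when a STOP is overridden by a falseBlockers entry."""
    outcome = {
        "kind": "false_blocker_override",
        "signal": entry["signal"],
        "why_not_blocker": entry["why_not_blocker"],
        "spec_id": spec_id,
    }
    if phase_id is not None:
        outcome["phase_id"] = phase_id
    return outcome


def should_promote(override_count: int) -> bool:
    # Promotion threshold stated above: 3 or more overrides for one signal.
    return override_count >= 3
```

Checking the hard-blocker kind before any signal matching means a broad `signal` string cannot accidentally suppress a halt that must always happen.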
- no_go verdict -> check blockers: if test/verification failure invoke cellm:asclepius; if dependency/external blocker escalate to user. Max 2 retry cycles per step.
- conditional verdict -> run quality_gate, report, ask user.
- go_no_go_evaluate failure (API error, timeout) -> BLOCK advancement, report error, ask user for explicit decision. Never assume go.
- asclepius -> no_go) -> escalate with full context: findings, attempted fixes, blocker details. User decides: force / skip / abort.

Read references/telemetry.md for full metric definitions and feedback format. Critical rules:

- autonomy_level records mode value (throughput/balanced/conservative), NEVER menu labels (A/B) or synonyms (direct/assisted).
- decomposition_source uses check.body.decompositionSource if set, else unknown; never invent from context.
- certification_choice is an array in user-stated order.
- guardrails_contract_source and guardrails_contract_valid are mandatory for every run.

M1, M2, and M3 MUST be rendered via the AskUserQuestion tool, NEVER as plain text output.
If you write menu options as text instead of calling AskUserQuestion, you have FAILED the skill contract.
This is the single most common failure mode of this skill. The LLM generates text describing menus
instead of calling the tool. That is WRONG. Call the tool.
Correct: AskUserQuestion with questions array containing M1, M2, M3 as structured options.
Wrong: Writing "M1 — Executor: ..." as markdown text and waiting for user to type a response.
- AskUserQuestion tool. This is the #1 recurring failure mode.
- cellm:execute as an executor option in M1; execute is the gate, not an executor.
- cellm:execute owns M1/M2/M3 exclusively.
- Never assume go on evaluation failure; API errors or timeouts BLOCK advancement.
- go_no_go_evaluate must be followed by go_no_go_record where applicable.
- proceed/pause/abort every step in throughput or balanced; follow cadence by mode.
- [+], [-], [!], [~] markers only (preserve emojis only inside literal user quotes).