skills/review-plan/SKILL.md
Review runbook quality for TDD discipline, step clarity, and LLM failure modes. Detects prescriptive code in GREEN phases, validates RED/GREEN sequencing, checks prerequisite validation, script evaluation, and step clarity. Use when reviewing runbooks after generation by /plan. Use when the user asks to "review runbook", "check runbook for prescriptive code", "validate RED/GREEN discipline", "check for implementation anti-patterns", "check LLM failure modes", or when /plan Phase 1/3 delegates to runbook-corrector agent.
npx skillsauth add ddaanet/agent-core review-planInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Review runbook phase files, fix all issues, and report findings (audit trail + escalation).
Review agent behavior: Write review (audit trail) → Fix ALL issues → Escalate unfixable
What this skill does:
Why review even when fixing: The review report is an audit trail. It documents what was wrong and what was fixed, enabling process improvement and deviation monitoring.
Accept TDD, general, and inline artifacts:
type: tdd in phase metadata or ## Cycle / ### Cycle headers## Step / ### Step headers or no type marker (default: general)(type: inline) tag in phase heading — no step/cycle headers expectedRunbook execution uses three context layers. Review criteria apply to step/cycle content — do NOT flag content present in a higher layer as missing.
plugin/agents/artisan.md or test-driver.md) — tool usage, execution protocol, error handling. Combined with all steps via prepare-runbook.py.## Common Context in runbook) — project paths, constraints, cross-step dependencies. Available to all steps.False positive prevention: Before flagging "missing tool reminders", "missing project paths", or "missing error escalation" in steps, verify the content isn't in baseline or Common Context. These are the three most common false positives.
Before applying review criteria, load project-specific quality patterns:
plans/foo/runbook-phase-1.md or plans/foo/runbook.md, the plan directory is plans/foo/. If no plans/ prefix, skip recall.Bash: edify _recall resolve plans/<job>/recall-artifact.md — if _recall resolve succeeds, its output contains resolved decision content (failure modes, quality anti-patterns). Caller-provided entries take precedence; skip if already provided in the delegation prompt.memory-index.md (skip if already in context), identify review-relevant entries (quality patterns, failure modes, testing conventions), batch-resolve via edify _recall resolve "when <trigger>" .... Proceed with whatever recall yields.Recall supplements, does not replace, the review criteria below.
Violation: GREEN phase contains implementation code — prescribes exact code, agent becomes copier. See references/review-examples.md Section 1 for violation/correct examples.
Check: Will test actually fail in RED phase before GREEN implementation?
Common violations:
Correct pattern:
Hints for sequencing are acceptable; prescriptive code blocks are violations. See references/review-examples.md Section 3 for examples.
Violation: **Verify GREEN:** line contains specific pytest paths (e.g., pytest tests/test_foo.py::TestBar::test_baz -v). These accumulate staleness — test class renames, file moves, and refactors invalidate them. Executors derive correct paths from context.
Correct pattern: **Verify GREEN:** just green (universal recipe). The just green recipe handles format, lint, and test in one command.
Check: Scan all **Verify GREEN:** and **Verify RED:** lines. Flag any containing specific file paths (tests/...) or test function selectors (::). just green, just lint, and just test are acceptable.
Must have: Specific test name, expected failure message, file location, why it will fail. See references/review-examples.md Section 4 for good example.
Violation: RED test prose only verifies structure, not behavior.
Check: For each RED phase, ask: "Could an executor write different tests that all satisfy this description?" If yes → VIOLATION: prose too vague.
Read references/review-examples.md Section 5 for indicator lists and correct prose patterns.
Violation: RED phase uses full test code instead of prose description.
Check: Scan RED phases for python code blocks containing def test_*(): or multiple assert statements.
Read references/review-examples.md Section 5.5 for acceptable vs unacceptable patterns.
Check: Total Steps in Weak Orchestrator Metadata matches actual cycle/step count
## Cycle X.Y: / ### Cycle X.Y: headers (TDD)## Step N.M: / ### Step N.M: headers (general)Check: Restart-reason verification
.claude/agents/), hook configuration, plugin changes, MCP server configuration/when recall@-referenced files have content loaded at startup (restart needed); indexed-but-recalled files load on-demand (no restart)Warning: First cycle in a phase tests empty/degenerate case
Check: Does Cycle X.1 test empty input, no-op, or missing data?
Check: Merged cycles/steps maintain isolation
Indicators of bad consolidation:
Check: Trivial work placement
Good patterns:
Bad patterns:
Violation: Runbook references file paths that don't exist in the codebase
Check: Extract all file paths from:
For each path: Use Glob to verify the file exists. If not found, try fuzzy match.
Classification:
10.1 Prerequisite Validation
**Prerequisite:** Read [file:lines] — understand [behavior/flow]10.2 Script Evaluation
10.3 Step Clarity
10.4 Conformance Validation
Detection: Scan phase headings for (type: inline) tag.
Apply these criteria:
10.5.1 Vacuity (instruction specificity)
10.5.2 Density (verifiable outcome)
See references/review-examples.md Section 10.5 for good/bad examples of both.
10.5.3 Dependency ordering
Skip for inline phases: GREEN/RED validation, prescriptive code detection, prerequisite validation, script evaluation, step clarity checks. These criteria apply only to TDD/general phases.
Relationship to Section 11 (LLM Failure Modes): For inline phases, Section 10.5 criteria supersede the type-specific (TDD/General) sub-bullets in Section 11. The type-neutral rules in Section 11 (foundation-first ordering, checkpoint spacing, file growth) still apply.
Five structural axes that cause execution failures. Apply regardless of phase type.
11.1 Vacuity
assert callable(X) or import XImportError or AttributeError instead of behavioral AssertionError — strong assertions never execute because import fails first. Fix: add Bootstrap step creating stub, expect behavioral failure**Bootstrap:** Not needed, **Bootstrap:** None, or similar). When Bootstrap is not needed, the section must be omitted entirely — absence statements add noise without gating value. Fix: remove the line11.2 Dependency Ordering
11.3 Density
11.4 Checkpoint Spacing
11.5 File Growth
Check: Model tag matches task complexity and artifact type.
Artifact-type override violations (advisory):
plugin/skills/), fragments (plugin/fragments/),
agents (plugin/agents/), or workflow decisions (agents/decisions/workflow-*.md)
assigned below opus → flagFile: references in Changes section against override pathsComplexity-model mismatch (advisory):
Advisory only. Model assignment involves judgment — findings inform but don't block. Do NOT mark as UNFIXABLE or CRITICAL. Report as Minor with suggested correction.
1a. Determine phase type(s):
## Cycle / ### Cycle headers → TDD## Step / ### Step headers → general(type: inline) in phase headings → inline1b. Check GREEN phases for implementation code (TDD):
Use Grep tool to find code blocks in GREEN phases:
Pattern: ^\*\*GREEN Phase:\*\*
Context: Use -A 20 to see 20 lines after each match
Secondary check: Look for ```python markers in context
For each code block found:
1c. Check RED phases for full test code (TDD):
Pattern: ^\*\*RED Phase:\*\*
Context: Use -A 30 to see test content
Check for: def test_.*\(\): pattern inside code blocks
1d. Check step quality (general):
Extract all file paths referenced in the runbook. For each path:
For TDD cycles:
[REGRESSION] cycles, verify assertions pass. Arithmetic verification required (e.g., compute output lengths against thresholds).For general steps:
For all items:
Fix-all policy: Apply ALL fixes (critical, major, AND minor) directly to the runbook/phase file.
Fix process:
Common fixes:
Fix constraints:
Structure:
# Runbook Review: {name}
**Artifact**: [path]
**Date**: [ISO timestamp]
**Mode**: review + fix-all
**Phase types**: [TDD | General | Inline | Mixed (N TDD, M general, K inline)]
## Summary
- Total items: N (cycles: X, steps: Y)
- Issues found: N critical, N major, N minor
- Issues fixed: N
- Unfixable (escalation required): N
- Overall assessment: Ready | Needs Escalation
## Critical Issues
### Issue 1: [description]
**Location**: [cycle/step, line range]
**Problem**: [what's wrong]
**Fix**: [what was done]
**Status**: FIXED | UNFIXABLE (reason)
## Major Issues
[same format]
## Minor Issues
[same format]
## Fixes Applied
- [list of all fixes applied]
## Unfixable Issues (Escalation Required)
[numbered list with rationale, or "None — all issues fixed"]
Report file: plans/<feature-name>/reports/runbook-review.md (or phase-N-review.md for phase files)
Return filepath only (or with escalation note). Read references/report-template.md for full report structure and return format.
Automatic: /plan Phase 1 (per-phase) and Phase 3 (final) delegate to runbook-corrector agent Manual: Delegate to runbook-corrector agent with runbook/phase file path
Workflow:
/design → /plan → runbook-corrector agent (fix-all) → [escalate if needed] → prepare-runbook.py → /orchestrate
Automatic review: /plan Phase 1 triggers per-phase review, Phase 3 triggers final holistic review
Escalation path: If ESCALATION noted in return, caller must address unfixable issues before proceeding
development
Verify a Python function against its intended behavior by writing an icontract contract and checking it with `edify check` (CrossHair), repairing in a loop. Triggers on "formalize", "verify this function", "add a contract and check it", or after writing a function whose correctness matters. The in-context agent holds intent and asks the user when behavior is ambiguous; CrossHair owns the deduction.
tools
Manage git worktrees for parallel task execution. Triggers on "create a worktree", "set up parallel work", "merge a worktree", "branch off a task", or uses the `wt`, `wt merge`, or `wt-rm` shortcuts. Worktree lifecycle: creation, focused sessions, merge ceremony, cleanup, parallel task setup.
testing
Recall behavioral knowledge from project decisions. Triggers on "when to do X", situational patterns, or decision content for recognized situations. Invoke with "/when <trigger>".
tools
Sync edify fragments and portable justfile to match the current plugin version. Detects user-edited files and warns instead of overwriting. Use --force to overwrite conflicts.