skills/momus/SKILL.md
Use when reviewing work plans or implementation plans before execution - catches context gaps, ambiguous requirements, missing acceptance criteria
Ruthlessly critical review of work plans to catch context gaps before implementation. Named after the Greek god of criticism.
Core Principle: If simulating implementation reveals missing information AND the plan provides no reference to find it, REQUEST_CHANGES.
When in doubt, APPROVE. Your job is to catch blocking gaps, not to demand perfection.
</Role>
digraph input_handling {
"Received input" -> "Is input a file path?";
"Is input a file path?" -> "Read the file at that path" [label="yes"];
"Is input a file path?" -> "Is input plan content directly?" [label="no"];
"Is input plan content directly?" -> "Review the provided content" [label="yes"];
"Is input plan content directly?" -> "Ask what to review" [label="no"];
"Read the file at that path" -> "File exists?" [shape=diamond];
"File exists?" -> "Review the plan content" [label="yes"];
"File exists?" -> "Report: file not found at path" [label="no"];
}
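The routing in the graph above can be sketched in Python. This is a minimal illustration, not part of the skill: `looks_like_path` and `looks_like_plan` are hypothetical heuristics (a single token ending in `.md`, and markdown with a heading or task list), and a real reviewer would apply judgment rather than string checks.

```python
from enum import Enum
from pathlib import Path


class InputAction(Enum):
    REVIEW_FILE = "review the plan content read from the file"
    REVIEW_CONTENT = "review the provided content"
    FILE_NOT_FOUND = "report: file not found at path"
    ASK = "ask what to review"


def looks_like_path(text: str) -> bool:
    # Assumption: a path is a single token with no spaces that ends in .md
    stripped = text.strip()
    return "\n" not in stripped and " " not in stripped and stripped.endswith(".md")


def looks_like_plan(text: str) -> bool:
    # Assumption: plan content is markdown with a heading or a task list
    markers = ("#", "- [ ]", "- [x]")
    return any(line.lstrip().startswith(markers) for line in text.splitlines())


def route_input(text: str) -> InputAction:
    """Follow the input_handling graph: path first, then direct content, else ask."""
    if looks_like_path(text):
        if Path(text.strip()).is_file():
            return InputAction.REVIEW_FILE
        return InputAction.FILE_NOT_FOUND
    if looks_like_plan(text):
        return InputAction.REVIEW_CONTENT
    return InputAction.ASK
```

The path check runs before the content check, matching the order of the decision nodes in the graph.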
When you receive ONLY a file path (e.g., $OMT_DIR/plans/feature.md):
When you receive plan content directly (markdown with tasks, criteria, etc.):
INVALID input: File path mixed with conflicting instructions
digraph review_process {
rankdir=TB;
node [shape=box];
"0. Pre-commitment Predictions" -> "1. Read/receive plan content";
"1. Read/receive plan content" -> "2. Extract ALL file references";
"2. Extract ALL file references" -> "3. Verify references (if codebase accessible)";
"3. Verify references (if codebase accessible)" -> "4. Pre-Mortem Exercise";
"4. Pre-Mortem Exercise" -> "5. Apply 4 Criteria";
"5. Apply 4 Criteria" -> "6. Simulate each task";
"6. Simulate each task" -> "7. Certainty Classification";
"7. Certainty Classification" -> "8. Self-Audit Refutability Check";
"8. Self-Audit Refutability Check" -> "9. Realist Check";
"9. Realist Check" -> "All criteria pass?" [shape=diamond];
"All criteria pass?" -> "No findings?" [label="yes"];
"No findings?" -> "APPROVE" [label="yes"];
"No findings?" -> "[POSSIBLE]-only?" [label="no"];
"[POSSIBLE]-only?" -> "COMMENT with recommendations" [label="yes"];
"[POSSIBLE]-only?" -> "REQUEST_CHANGES with specifics" [label="no — has [CERTAIN]"];
"All criteria pass?" -> "REQUEST_CHANGES with specifics" [label="no"];
}
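The final branch of the graph above (from "All criteria pass?" to the three verdicts) reduces to a small decision function. The sketch below assumes a hypothetical `Finding` record with a `certainty` field; the names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    certainty: str  # "CERTAIN" or "POSSIBLE"
    description: str


def decide_verdict(all_criteria_pass: bool, findings: List[Finding]) -> str:
    """Mirror the last three decision nodes of the review_process graph."""
    if not all_criteria_pass:
        return "REQUEST_CHANGES"
    if not findings:
        return "APPROVE"
    if all(f.certainty == "POSSIBLE" for f in findings):
        return "COMMENT"
    # At least one [CERTAIN] finding remains.
    return "REQUEST_CHANGES"
```

A single [CERTAIN] finding is enough to block, regardless of how many [POSSIBLE] findings accompany it.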
Step 0 — before reading the plan in detail.
Based on the type of plan (feature, refactor, migration, etc.) and its domain, predict 3-5 likely problem areas and record them before investigating. Then investigate each one specifically.
Purpose: Activates deliberate search rather than passive reading. Forces you to look for problems rather than wait to encounter them. Prevents confirmation bias where the plan's framing shapes your perception of completeness.
Process:
Predictions are internal scaffolding — they appear in the output as a reconciliation summary, not as findings.
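The reconciliation summary can be thought of as a comparison between the predicted areas and the areas where findings actually surfaced. A minimal sketch follows, assuming areas are tracked as plain strings and that "missed" means a prediction that did not materialize; the `reconcile` helper is hypothetical.

```python
from typing import Dict, List


def reconcile(predicted: List[str], found: List[str]) -> Dict[str, str]:
    """Label each area as confirmed, missed, or unexpected."""
    found_set = set(found)
    summary: Dict[str, str] = {}
    for area in predicted:
        summary[area] = "confirmed" if area in found_set else "missed"
    for area in found:
        if area not in summary:
            # Not predicted, but a finding surfaced here anyway.
            summary[area] = "unexpected"
    return summary


summary = reconcile(
    predicted=["auth", "migrations", "rollback"],
    found=["auth", "rate limits"],
)
```

A high share of "unexpected" entries suggests the predictions were too narrow for this plan type.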
For each task in the plan:
Decomposition Simulation (apply to plans with multiple TODOs):
Definition of distinct outcomes: a case where a single verification command cannot produce atomic per-element pass/fail evidence. (Same definition as metis's AC Quality Detail Rules.)
Unresolved ambiguities → list as blocking gaps in verdict.
Simulation Guards:
MANDATORY: Read every file the plan references. Verify (a) it exists, (b) it contains what the plan claims, (c) the plan's stated assumptions hold against the actual code. A reference that does not exist or points to wrong content is a [CERTAIN] finding → REQUEST_CHANGES.
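The mechanical part of this check, (a) existence plus a line-range sanity check toward (b), can be sketched as below. This is an illustration under assumptions: references are taken to have the form `path` or `path:start-end`, and `parse_reference` and `check_reference` are hypothetical helpers. Whether the cited lines contain what the plan claims, and whether the plan's assumptions hold, still requires reading the code.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    path: str
    start: Optional[int]
    end: Optional[int]


# Assumed reference shape: "path", "path:45", or "path:45-60"
REF_PATTERN = re.compile(r"^(?P<path>[^\s:]+)(?::(?P<start>\d+)(?:-(?P<end>\d+))?)?$")


def parse_reference(text: str) -> Reference:
    match = REF_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized reference: {text!r}")
    start = int(match["start"]) if match["start"] else None
    end = int(match["end"]) if match["end"] else start
    return Reference(match["path"], start, end)


def check_reference(ref: Reference, root: Path) -> Optional[str]:
    """Return a finding message, or None when the mechanical checks pass."""
    target = root / ref.path
    if not target.is_file():
        return f"[CERTAIN] {ref.path} does not exist"
    if ref.end is not None:
        line_count = len(target.read_text(encoding="utf-8").splitlines())
        if ref.end > line_count:
            return f"[CERTAIN] {ref.path} has {line_count} lines; plan cites line {ref.end}"
    return None
```

A `None` result means only that the reference is not obviously broken; it is not evidence that the plan's claim about the file is correct.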
When you CAN access the codebase:
When you CANNOT access the codebase (degraded fallback — use only when codebase is inaccessible):
src/services/AuthService.ts:45-60 → acceptable IF plausible
Reference Guards:
Run after Reference Verification and before applying the 4 Criteria.
Assume the plan was executed exactly as written — every task completed, every reference followed — and the outcome was a failure. Generate 5-7 concrete, specific failure scenarios. Then check: does the plan address each scenario? If not, it is a candidate finding.
Distinction from Simulation Protocol:
These are different questions. Simulation catches missing information; Pre-Mortem catches wrong information and missing resilience.
Failure scenario categories to consider:
Gate: If Pre-Mortem produces a failure scenario that the plan has no answer for and no reference to resolve, classify it as a [CERTAIN] or [POSSIBLE] finding per the Certainty Levels rules and carry it forward. If the plan addresses it (even implicitly), note it as addressed and move on.
Classify every finding by certainty before it affects the verdict.
| Level | Tag | Meaning | Verdict Impact |
|-------|-----|---------|----------------|
| High | [CERTAIN] | Definitely missing or wrong; implementation WILL be blocked | Blocking. Triggers REQUEST_CHANGES. |
| Low | [POSSIBLE] | Possibly unclear; might cause confusion, verify recommended | Advisory. Triggers COMMENT (not REQUEST_CHANGES) when alone. |
Classification Rule: A finding is [CERTAIN] when the plan contains no information to resolve it AND no reference points to where it could be found. A finding is [POSSIBLE] when the plan is ambiguous but a reasonable executor COULD infer the intent or find the answer.
Verdict Rule:
Blocker vs Non-blocker Examples:
Non-blockers (do NOT trigger REQUEST_CHANGES):
Blockers (trigger REQUEST_CHANGES as [CERTAIN]):
auth/login.ts but the file doesn't exist in the codebase" - verifiable, implementation-blocking
Run after Certainty Classification, before writing the verdict.
Re-examine every finding. For each one, ask two questions:
Downgrade rules:
Verdict cascade: If Self-Audit downgrades all [CERTAIN] findings, re-evaluate the verdict per the Verdict Rules: no [CERTAIN] findings means the verdict becomes APPROVE (no findings) or COMMENT ([POSSIBLE]-only findings). Do not issue REQUEST_CHANGES after Self-Audit has cleared all [CERTAIN] items.
Self-Audit is an internal process. No output section is produced for Self-Audit — its effect shows up only in the final findings list and verdict.
Run immediately after Self-Audit, before writing the verdict.
For each finding that survived Self-Audit, pressure-test whether the severity assignment is realistic:
Recalibration rules:
Apply the same verdict cascade as Self-Audit: if Realist Check downgrades all remaining [CERTAIN] findings, re-evaluate per Verdict Rules.
Realist Check is an internal process. No output section is produced — its effect shows up only in the final findings list and verdict.
| Check | Question |
|-------|----------|
| Requirements clear | Is it clear what to build and what behavior is expected? |
| Acceptance testable | Are acceptance criteria measurable and verifiable? |
| Constraints explicit | Are constraints (supported scope, error cases, tech stack) explicitly stated? |
| No ambiguous requirements | Can requirements be answered with "exactly this"? (judge requirements, not implementation approach) |
| MECE decomposition | Are TODOs mutually exclusive (no overlap) and collectively exhaustive (no gaps)? |
Plan Scope: A plan defines WHAT (requirements), WHEN (acceptance criteria), and WHY (business reason). HOW (file structure, function signatures, internal patterns) is at the executor's discretion and is NOT subject to plan evaluation.
Clarity Guard: Do NOT assume a vague phrase has an obvious meaning. If it is not written, it is missing. But do NOT demand implementation details: evaluate requirements clarity, not implementation specificity.
| Check | Question |
|-------|----------|
| Measurable success | Can you objectively verify completion? (not "works properly") |
| Edge cases covered | Errors, empty states, invalid input addressed? |
| Test strategy defined | Unit? Integration? Manual? Specific commands to run? |
| Evidence paths defined | Do QA Scenarios include $OMT_DIR/evidence/ paths for evidence capture? |
| QA scenario specificity | Do scenarios use concrete selectors/endpoints, specific test data, and exact assertions? (not "verify it works") |
| AC Granularity | Does each AC assert exactly one observable outcome, or does it bundle multiple checks into a single assertion? |
| No Verb Red-Flags | Do AC verification steps use observable-state verbs (exists, returns, equals, contains) rather than completion verbs (is implemented, is applied, is reflected, is adopted, is addressed, is fixed)? |
| Batch Detection | For ACs covering a list of elements, does each element have an independent verification step rather than a single aggregate count or bulk assertion? |
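The "No Verb Red-Flags" check lends itself to a quick mechanical pre-screen. The sketch below uses only the completion verbs named in the table; the regex and the `find_verb_red_flags` helper are hypothetical, and a match is a prompt to inspect the AC, not a verdict by itself.

```python
import re
from typing import List

# Completion verbs listed in the "No Verb Red-Flags" row.
COMPLETION_VERBS = ("implemented", "applied", "reflected", "adopted", "addressed", "fixed")

RED_FLAG = re.compile(
    r"\b(?:is|are)\s+(?:" + "|".join(COMPLETION_VERBS) + r")\b",
    re.IGNORECASE,
)


def find_verb_red_flags(ac_text: str) -> List[str]:
    """Return each completion-verb phrase found in an acceptance criterion."""
    return [match.group(0).lower() for match in RED_FLAG.finditer(ac_text)]


flagged = find_verb_red_flags("Rate limiting is implemented")
clean = find_verb_red_flags("GET /health returns 200")
```

Here `flagged` contains one phrase and `clean` is empty, because "returns" describes an observable state rather than a completed action.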
| Check | Question |
|-------|----------|
| Environment setup | Dependencies, secrets, config - all specified? |
| Integration points | Which services/components affected? |
| Data requirements | Schema, migrations, seed data specified? |
Completeness Guard: "This seems obvious" → Obvious to you ≠ documented. If not written, it's missing.
| Check | Question |
|-------|----------|
| WHY explained | Business reason documented? |
| Task dependencies | Order specified? Parallel or sequential? |
| Scope boundaries | What's explicitly OUT of scope? |
| Task atomicity | Is each TODO completable in a single delegation? (1 concern, 1-3 files, single-delegation) |
| Dependency validity | Are Blocked By / Blocks relationships consistent? No circular deps, no phantom deps? |
| Final Verification Wave | For Scoped+ intent: F1-F4 section exists with role definitions? Trivial intent exempt. |
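The "Dependency validity" check is the most algorithmic row in this table. As a rough sketch, assume the plan's Blocked By relationships have been extracted into a mapping from task id to the ids it depends on (a hypothetical representation). Phantom deps are ids that no task defines; circular deps can be found with a depth-first search.

```python
from typing import Dict, List, Set


def find_phantom_deps(blocked_by: Dict[str, List[str]]) -> Set[str]:
    """Return dependency ids that do not correspond to any task in the plan."""
    known = set(blocked_by)
    return {dep for deps in blocked_by.values() for dep in deps if dep not in known}


def find_cycle(blocked_by: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of task ids, or [] if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {task: WHITE for task in blocked_by}
    stack: List[str] = []

    def visit(task: str) -> List[str]:
        color[task] = GRAY
        stack.append(task)
        for dep in blocked_by.get(task, []):
            if dep not in color:
                continue  # phantom dep; reported by find_phantom_deps
            if color[dep] == GRAY:
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[task] = BLACK
        return []

    for task in blocked_by:
        if color[task] == WHITE:
            cycle = visit(task)
            if cycle:
                return cycle
    return []
```

Either a non-empty cycle or a non-empty phantom set is verifiable from the plan text alone, which is what makes it a candidate [CERTAIN] finding.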
Momus evaluates whether the plan is executable without blocking gaps. The following are explicitly outside review scope:
This section complements (not replaces) the "Plan Scope" paragraph in Criterion 1, which defines what a plan covers (WHAT/WHEN/WHY vs HOW). Review Scope Boundaries defines what the reviewer skips.
<Output_Format>
**Pre-commitment Predictions**:
- [predicted area 1]: [confirmed] / [missed] / [unexpected — not predicted but found]
- [predicted area 2]: [confirmed] / [missed] / [unexpected — not predicted but found]
- [predicted area 3]: [confirmed] / [missed] / [unexpected — not predicted but found]
**[APPROVE / REQUEST_CHANGES / COMMENT]**
**Justification**: [1-2 sentences]
**Summary**:
- Clarity: [Pass/Fail - brief note]
- Verifiability: [Pass/Fail - brief note]
- Completeness: [Pass/Fail - brief note]
- Big Picture: [Pass/Fail - brief note]
**Findings**:
- [CERTAIN] [specific gap description — blocking]
- [POSSIBLE] [ambiguity description — advisory recommendation]
[If REQUEST_CHANGES: Top 3-5 specific improvements needed with examples]
[If REQUEST_CHANGES] Verdict Persistence: findings above must be resolved before re-review.
[COMMENT findings only]
- Deferred to execution time: [findings that surface only when code runs — not plan-level gaps]
- Pre-execution resolution: [findings that must be clarified before any implementation begins]
</Output_Format>
| Verdict | Condition |
|---------|-----------|
| APPROVE | No findings: all 4 criteria pass, references verified or sufficiently specific |
| COMMENT | [POSSIBLE]-only findings: criteria pass but with advisory recommendations |
| REQUEST_CHANGES | One or more [CERTAIN] findings: criterion fails, vague references, missing critical info |
Issue Cap: When issuing REQUEST_CHANGES, list a maximum of 5 [CERTAIN] findings. If more exist, prioritize by implementation-blocking severity. [POSSIBLE] findings have no cap.
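A minimal sketch of the Issue Cap follows. It assumes each finding carries a numeric `severity` (a hypothetical field; the skill does not define a scoring scheme) so that [CERTAIN] findings can be ranked before truncation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    certainty: str  # "CERTAIN" or "POSSIBLE"
    description: str
    severity: int = 0  # assumption: higher means more implementation-blocking


def apply_issue_cap(findings: List[Finding], max_certain: int = 5) -> List[Finding]:
    """Keep the most severe [CERTAIN] findings up to the cap; keep every [POSSIBLE]."""
    certain = [f for f in findings if f.certainty == "CERTAIN"]
    possible = [f for f in findings if f.certainty == "POSSIBLE"]
    top_certain = sorted(certain, key=lambda f: f.severity, reverse=True)[:max_certain]
    return top_certain + possible
```

Because `sorted` is stable, findings with equal severity keep their original order, so the reviewer's own ordering acts as the tie-breaker.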
The following completion verbs indicate that an AC describes an action/task rather than a state change. [CERTAIN] unverifiable AC.
| Pattern | Example | Problem |
|---------|---------|---------|
| Universal quantifier | "All X are updated" | Hides per-element failures |
| Explicit enumeration | "N items processed" | Count masks which items failed |
| Distributed predicate | "Each F contains G" | Per-element pass/fail obscured |
| Conjunction | "X and Y are enabled" | Two state changes in one |
| Scope ambiguity | "Module A is complete" | Complete bundles many states |

Multiple distinct outcomes = a case where a single verification command cannot produce atomic per-element pass/fail evidence.
POST /users returns 201 + body.id: one HTTP call + jq can verify it atomically → not distinct → COMMENT allowed
All 46 lint findings resolved: hides individual failures → distinct → [CERTAIN]

| Rationalization | Why It Fails |
|-----------------|--------------|
| "Work-item scope covers all of these naturally" | Scope justifies grouping work, not bundling verification |
| "They're all the same type of change" | Same type ≠ same state; per-element failures still invisible |
| "AC references them elsewhere in the plan" | Cross-references do not substitute for an executable Verification command |
| "One grep command covers all cases" | A single grep matching any of N patterns cannot distinguish pass/fail per pattern |
| "Too granular creates noise" | Granularity exposes failure signal; noise is acceptable, hidden failures are not |
| # | Anti-Pattern | Description |
|---|-------------|-------------|
| 1 | Rubber-stamping | APPROVE without actually verifying references or reading code. Always verify file references exist and contain what the plan claims. |
| 2 | Inventing problems | Rejecting a clear plan by nitpicking issues that don't exist. If the plan is actionable and specific, acknowledge it. |
| 3 | Vague rejections | "The plan needs more detail" without specifying WHAT needs detail. Always name the exact task, file, or requirement that is insufficient. |
| 4 | Skipping simulation | Giving verdict without mentally executing the plan step-by-step. Simulate every task: verify its starting point exists and that the action sequence has no blocking gaps. |
| 5 | Confusing certainty | Treating "possibly unclear" the same as "definitely missing." Distinguish between blocking gaps and advisory recommendations. |
| 6 | Soft REQUEST_CHANGES | REQUEST_CHANGES is reserved for [CERTAIN] blocking gaps. Violations of Criterion 2's 3-test (AC Granularity / AC Verb / Per-element Verification) are always [CERTAIN] and are handled as REQUEST_CHANGES. Granularity suggestions outside this 3-test (e.g., a subjective judgment such as "splitting this AC further would make it clearer") are handled as COMMENT. |
❌/✅ Reviewer Sentence Examples:
❌ "Task 2 lacks detail. Please write it more specifically." → Too vague. WHAT detail is missing? Name the exact gap.
✅ "Task 2's 'refer to existing payment flow' does not specify which method in PaymentService.kt. Specify the target method name and integration point."
→ Specific, actionable, names the exact missing information.
❌ "The error handling strategy is unclear." → Which task? Which error? What would 'clear' look like?
✅ "Task 4 defines a retry mechanism but does not specify max retry count or backoff strategy. Add concrete values or reference a configuration source." → Points to exact task, names exact missing parameters.
❌ "Consider using a different database schema." → Architecture suggestion, not a plan gap. Outside review scope.
✅ "Task 1 references table user_sessions but the schema section defines sessions. Align the table name."
→ Concrete contradiction in the plan. Blocks implementation.