src/doctrine/skills/spec-kitty-mission-review/SKILL.md
Review a fully merged Spec Kitty mission post-merge (all WPs done/approved) to verify spec→code fidelity, FR coverage, drift, risks, and security. Triggers: "review the merged mission", "post-merge mission review", "verify the completed mission", "audit the mission implementation", "mission-level acceptance review", "is this mission releasable", "final review before tagging", "cross-WP coverage audit". Does NOT handle: per-WP review during implementation (use spec-kitty-runtime-review), implement-review loop orchestration (use spec-kitty-implement-review), setup or repair (use spec-kitty-setup-doctor), or glossary maintenance (use spec-kitty-glossary-context).
npx skillsauth add priivacy-ai/spec-kitty spec-kitty-mission-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
You are the expert senior reviewer for a completed Spec Kitty mission. The mission has been fully implemented, all WPs have been approved, and the feature branch has been merged. Your job is to answer a single question with documented evidence: does the merged code accurately and completely realize the spec, and are there risks the implementation team did not surface?
This is not a checklist exercise. It is structured adversarial analysis. You read the spec as the author's promise and the code as the executor's delivery, and you measure the gap. Every finding you produce must be traceable to an artifact (a spec section, a git diff line, a test file, a contract clause). You do not fix anything. You document.
spec-kitty merge --mission <slug> completes and all WPs show doneThis is not the pre-merge acceptance gate. Run spec-kitty accept --mission <slug> before merge; use this skill after merge for final spec-to-code review.
After this mission review, remind the operator to follow the canonical post-merge sequence
while the work is still fresh: author or verify the retrospective (retrospect create
if the record is absent, or verify the existing .kittify/missions/<mission_id>/retrospective.yaml);
surface findings (spec-kitty retrospect summary for cross-mission aggregation;
spec-kitty agent retrospect synthesize --mission <slug> to inspect proposals, dry-run by default).
Before reading a single line of code, anchor yourself to what the mission promised and where it stands.
# Confirm the mission is fully merged (all WPs must be done)
spec-kitty agent tasks status --mission <slug>
If any WP is not in done, this is not a post-merge mission review — use
spec-kitty-runtime-review instead.
# Read the mission identity
cat kitty-specs/<slug>/meta.json
Note the baseline_merge_commit (the SHA of the PR that preceded this mission,
if present) and mission_type. These anchor every git diff you will run.
# Scan the event log for the full state machine history
cat kitty-specs/<slug>/status.events.jsonl
The event log tells you: how many rejection cycles each WP had, which WPs were
forced (unusual transitions that bypassed normal flow), whether any WPs were
approved by arbiter override rather than clean review, and whether any WP has a
ReviewerSelfApproval event. A WP with 3+ rejection cycles that ended in an
arbiter-forced approval is a high-priority target for your analysis — the
disagreement history is a signal. A WP with ReviewerSelfApproval is also
high-priority: flag it as an independence/process risk and verify whether an
independent reviewer later re-reviewed it.
You cannot review what you do not understand. Read the full specification and all contract artifacts before looking at code.
# The specification: goals, non-goals, locked decisions, FRs, NFRs, constraints
cat kitty-specs/<slug>/spec.md
# The technical design per track
cat kitty-specs/<slug>/plan.md
# The WP breakdown: subtasks, FR references, DoD per WP, FR coverage table
cat kitty-specs/<slug>/tasks.md
# Acceptance test scenarios (canonical test contract layer)
cat kitty-specs/<slug>/contracts/test-contracts.md
# CLI behavior contracts (what commands must and must not do)
cat kitty-specs/<slug>/contracts/cli-contracts.md
# File format contracts (schema expectations, migration story)
cat kitty-specs/<slug>/contracts/file-format-contracts.md
As you read, build a mental model of:
metadata.yaml
matches pyproject.toml — but if the spec never stated that as an FR, no
test will catch it. These are your highest-value findings.Establish a clean baseline-to-HEAD picture. Every code change since baseline is the implementation's evidence. Anything in the spec that produced no diff is suspect.
# What changed since the baseline commit?
# Use baseline_merge_commit from meta.json
git log <baseline_merge_commit>..HEAD --oneline
# Summary of changed files — use this to build your review coverage map
git diff <baseline_merge_commit>..HEAD --stat
# Full diff (for large missions, scope by directory first)
git diff <baseline_merge_commit>..HEAD -- src/
git diff <baseline_merge_commit>..HEAD -- tests/
git diff <baseline_merge_commit>..HEAD -- docs/
Build a coverage map from the diff stat: which files changed, and which spec tracks they correspond to. Then invert the map: are there files that a spec track required to change that do NOT appear in the diff? Missing changes are often more important than the changes that exist.
For each WP, cross-reference owned files from its frontmatter:
# Read each WP's owned_files declaration
cat kitty-specs/<slug>/tasks/WP01-*.md | head -40
Then verify the claimed owned files actually appear in the diff:
git diff <baseline_merge_commit>..HEAD -- <owned_file_path>
A WP that declares ownership of a file that shows no diff is either incomplete or its work was done in a different file than declared. Determine which.
The WP review cycle files are a goldmine of documented problems. Every issue a per-WP reviewer raised is a partial constraint on what was shipped. Your job is to verify resolutions, not re-do the work.
# For each WP that had review cycles, read all cycle files
ls kitty-specs/<slug>/tasks/WP*/
cat kitty-specs/<slug>/tasks/WP01-*/review-cycle-*.md
cat kitty-specs/<slug>/tasks/WP02-*/review-cycle-*.md
# ... repeat for each WP
For each rejection cycle, identify:
Issues that were flagged but then "resolved" via a forced approval or an arbiter override warrant special scrutiny. Read the git commit that fixed each blocker:
# Find the commit message that addressed a specific review issue
git log --oneline --all | grep -i "WP0X"
git show <commit_sha> -- <file_flagged_in_review>
Deferred issues are a category of their own. A review cycle may have closed with "deferred to follow-up issue #NNN" — if that deferral is not documented anywhere post-merge (no GitHub issue, no CHANGELOG note), it is a silent hole in the delivery.
This is the core of the review. For every Functional Requirement, you need a closed chain: spec stated it → WP owned it → test verifies it → code implements it. A chain with any broken link is a finding.
Start from the FR coverage table in tasks.md:
| WP | Spec FRs |
|-------|---------|
| WP01 | FR-001, FR-002, FR-003, ... |
For each FR:
# 1. Find the test(s) for this FR
grep -r "FR-NNN\|T<track>.<n>" tests/ --include="*.py" -l
# 2. Read the test — does it actually test what the FR requires?
# (Not just name it in a comment)
grep -n "FR-NNN" tests/<relevant_test_file>.py -A 10
# 3. Find the code that satisfies the FR
git diff <baseline_merge_commit>..HEAD -- src/ | grep -A 10 -B 5 "<key_symbol>"
The question for each test is not "does the test pass?" (you can assume it does — the mission merged). The question is: does the test actually constrain the behavior the FR requires?
A test that creates a synthetic fixture with status: done in frontmatter
when real status is stored in status.events.jsonl does not test what the FR
requires. It tests an imagined model that does not exist in production. This is
the "passing test, failing system" failure mode.
For every FR that maps to a test, ask: if someone deleted the implementation code this test is supposed to cover, would the test fail? If the answer is "no, the test would still pass because it uses a synthetic fixture," the test is a false positive and the FR is effectively untested in production.
After the FR trace, step back and look at the mission as a whole. Drift is when the implementation deviates from what the spec locked. Gaps are things the spec required that no WP delivered.
Re-read every Non-Goal (NG-N). Then search the diff for anything that touches the territory those non-goals define:
# Example: if NG-2 says "no backfill of historical missions"
# Search for any code that iterates existing kitty-specs/
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "kitty-specs\|missions_dir\|glob\|walk"
Code that touches out-of-scope territory is either a scope violation or needs clear justification in the review record. Absent documentation, assume violation.
For each locked Decision (D-N), identify the key invariant it establishes. Then grep for code patterns that contradict it:
# Example: D-1 says "init MUST NOT initialize git, under any flag combination"
# Verify no git init call paths remain
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "git init\|init_git_repo\|subprocess.*git"
Locked decisions often have both positive requirements (the new behavior) and negative requirements (the forbidden old behavior). Reviewers frequently verify the positive side and forget the negative side. Verify both.
Cross-reference the FR list with the test files. An FR that appears in
tasks.md's requirements coverage table but produces no grep hit in tests/
is a punted FR:
# For each FR-NNN in spec.md, check test coverage
grep -r "FR-NNN" tests/ --include="*.py" -l
A punted FR that was marked "Proposed" in the spec and never upgraded to "Approved" in plan review is documentation drift only. A punted FR that was marked "Approved" (accepted into the delivery contract) with no test coverage is a delivery gap.
For each Non-Functional Requirement (NFR) with a measurable threshold:
# Example: NFR-004 — new tests < 60s in aggregate
# Count new test files and estimate
git diff <baseline_merge_commit>..HEAD --stat | grep "tests/"
# Example: NFR-006 — project type-checking clean
# Check if any type-check suppression directives were added
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "type: ignore\|noqa"
Pay particular attention to performance NFRs — they rarely have automated enforcement and are easy to skip.
These are the patterns that per-WP review tends to miss and that mission review reliably catches. When you are looking for drift, look here first:
1. Tests pass against synthetic fixtures that do not exist in production.
A WP claims to test FR-NNN but the test constructs a dict with the expected
shape instead of invoking the production code path that would produce that
shape. The test passes because the fixture is what the assertion expects;
production would never generate that shape. Ask: if I delete the
implementation code, does this test still pass? If yes, the FR is untested.
Past cases: a "decision resolution" test that built {"status": "resolved"}
as a literal instead of calling the resolver; a "widen response" test that
asserted result["slack_thread_status"] without verifying anything ever set
that field in the atomic block.
2. New module has no live caller from a production entry point.
A WP creates src/<feature>/<new_module>.py with full coverage tests, but
grep -r "from <feature>.<new_module> import" src/ --include="*.py" (excluding
the tests themselves) returns zero hits. The module is dead code, regardless
of test quality. Past cases: run_specify_interview and run_plan_interview
functions added by a WP that were never wired into the actual specify /
plan slash-command entry points. Fix pattern: the mission review
surfaces the dead code; a post-merge follow-up PR wires it into the real
entry point.
3. FR is listed in a WP's requirement_refs frontmatter but no live
test asserts that FR's behavior. The WP review sees the ref and assumes
coverage; the FR never appears in the assertion body of any test. Grep
each FR-NNN from the coverage matrix against tests/. Zero hits (excluding
comments) is a delivery gap, not a documentation gap. Past cases: a
mission shipped with FR-025 mapped to a WP whose tests asserted only that
/widen returned 200, never that slack_thread_status == "pending" appeared
in the response body.
4. API-level input whitelist rejects valid new event types before the
reducer sees them. The reducer is wired correctly; the gate upstream of it
has a hard-coded allow-list that was not updated. All the reducer tests pass
(they invoke the reducer directly, bypassing the gate). Production POSTs to
the endpoint return 400 on the new event type. Past case: a projection
mission added DecisionPointOpened / DecisionPointResolved event types
but the /events/ endpoint's validate_event_schema() allow-list was not
extended — 100% of production events would have been rejected despite 100%
of unit tests passing.
5. TOCTOU between an ORM create and an external-API side effect.
The task calls an external API, then writes a DB row to record what happened.
If the DB write fails for any reason (transient DB error, Celery worker
killed mid-task, retry boundary), the next retry calls the external API
a second time with no record of the first call. Detection: walk every
Celery task that calls an external API followed by Model.objects.create()
and ask "what happens if the task dies between these two lines?" Past case:
create_widen_slack_thread posted the Slack opener before persisting the
SlackThread row; a transient DB error would have produced a duplicate
opener in the Slack channel on retry. Fix pattern: persist a placeholder
row BEFORE the external call, update it after.
6. Silent empty-result return on hidden error. A function wraps its
external dependency in try: ... except Exception: return "" (or None,
[], {}). Tests that mock the happy path pass; production encounters a
transient error and the function returns an empty value that the caller
treats as valid. Past case: a "fetch permalink" helper returned empty
string on any Slack API error, which downstream code passed to a Slack
message template as the permalink field — users got opener messages with
an empty permalink and no error surfaced. Detection: grep for
except Exception: followed by return "", return None, return [], or
pass inside any module changed by the mission.
7. Locked Decision violated in a new code path. The spec's Decisions
section says MUST NOT do X under any flag combination. A WP adds a new
flag-off branch that does X because "the flag is off so the invariant
doesn't apply" — but Locked Decisions are absolute, not flag-gated.
Detection: for each MUST NOT clause, grep the diff for code patterns
the clause forbids. Past case: a migration bug where a flag-off branch
re-ran git init despite a locked decision that init must never re-init.
8. Ownership drift at file boundaries shared across WPs. When WP07 and
WP08 each add a class to apps/<feature>/serializers/__init__.py, or when
WP04 and WP06 each add routes to urls.py, the add/add conflict is
resolved in the merge step but the logical ownership is ambiguous. Neither
WP's reviewer sees the other's additions. Detection: git log --oneline
on each shared file, look for multiple WPs in the same merge. The fix is
usually fine; the risk is hidden ownership ambiguity for future changes.
This step requires active adversarial imagination. You are looking for things that could go wrong that neither the spec nor the WP reviews anticipated.
For every "MUST NOT happen under any flag combination" clause in the spec, the implementation will have conditional logic to enforce it. Find that logic and ask: is there a code path that bypasses the condition?
# Find all conditional branches near a key enforced behavior
git diff <baseline_merge_commit>..HEAD -- src/<key_file>.py | grep -B 5 -A 10 "if\|else\|try\|except"
The most common boundary failure: a guard that protects the "happy path" but has an exception handler that silently re-enables the forbidden behavior.
For every new function introduced by this mission, trace what happens when it
fails. An error path that silently swallows exceptions and returns None or ""
is a silent failure candidate:
# Find all try/except blocks in new code
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "except\|raise\|log\."
The pattern to look for: try: <critical operation> except Exception: pass or
except Exception: return "". A function that returns an empty string when
something goes wrong (e.g., a function that builds a changelog block) will pass
its tests but produce wrong output in production. This is the "silent empty
result" anti-pattern.
A module with passing tests but no callers from live entry points is dead code. The most common post-merge defect in this codebase:
# For every new module/class introduced, verify at least one live caller
# Example: verify new module is imported from live command path
git diff <baseline_merge_commit>..HEAD -- src/ | grep "^+.*def \|^+.*class " | head -30
For each new public function or class:
grep -r "from.*<new_module> import\|import <new_module>" src/ --include="*.py"
Zero grep hits on a new module from src/ (not just from tests/) means the
feature is dead code, regardless of test coverage.
Each WP was reviewed in isolation. The mission review is the first opportunity to verify that all WPs integrate correctly. For missions with parallel WPs that modified the same subsystem:
# Find files touched by multiple WPs (use owned_files from WP frontmatter)
# These are integration risk points
git diff <baseline_merge_commit>..HEAD -- src/<shared_file>.py
Pay special attention to __init__.py exports — multiple parallel WPs
commonly introduce conflicts here that are resolved at merge time but may
drop an export.
For missions that touch CLI input handling, file I/O, subprocess execution, authentication, or network calls, perform a focused security pass.
# Find all subprocess calls introduced by this mission
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "subprocess\|shell=True\|Popen\|run("
# Find all file path operations
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "open(\|Path(\|os.path\|glob("
# Find all HTTP/network calls
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "httpx\|requests\|urllib\|socket"
# Find all authentication / credential operations
git diff <baseline_merge_commit>..HEAD -- src/ | grep -n "credentials\|token\|password\|secret\|auth"
For each hit, ask:
Path(user_input) that is not anchored to a known safe
base directory (e.g., tmp_path / user_input without validation) can be
exploited with ../../etc/passwd style inputs.shell=True with dynamic content is a command
injection risk. In a mission that modifies git operations, verify that branch
names and commit messages are passed as list arguments, not formatted into a
shell string.As of 2026-04-26 (mission stability-and-hygiene-hardening-2026-04-01KQ4ARB,
ADR architecture/2.x/adr/2026-04-26-3-e2e-hard-gate.md), mission review
runs four hard gates in order. A FAIL on any gate produces a FAIL verdict
in Step 9 unless the operator-exception path documented below is used.
cd <spec-kitty-repo>
SPEC_KITTY_ENABLE_SAAS_SYNC=1 <test-runner> tests/contract/ -v
Non-zero exit ⇒ HARD FAIL. There is no exception path for contract tests; a contract drift is a code defect by construction.
Record the exit code and any failing test names in your review report
under a new section ## Gate Results — Contract.
cd <spec-kitty-repo>
<test-runner> tests/architectural/ -v
Non-zero exit ⇒ HARD FAIL. Layer-rule, public-import, and package-boundary violations are code defects by construction.
Record the result under ## Gate Results — Architectural.
cd <spec-kitty-end-to-end-testing-repo>
SPEC_KITTY_ENABLE_SAAS_SYNC=1 <test-runner> scenarios/ -v
Non-zero exit ⇒ HARD FAIL unless an operator-exception artifact
exists at kitty-specs/<slug>/mission-exception.md and matches the
required schema (see below).
The four floor scenarios are:
dependent_wp_planning_lane.py — FR-001, FR-005, FR-038uninitialized_repo_fail_loud.py — FR-032, FR-039saas_sync_enabled.py — FR-040contract_drift_caught.py — FR-041Future missions touching cross-repo behavior MUST add scenarios that prove their behavior; verify any new mission-claimed cross-repo behavior has an e2e scenario, and FAIL the review if it does not.
Record the result under ## Gate Results — Cross-Repo E2E.
cat kitty-specs/<slug>/issue-matrix.md
For every row in the matrix table, assert that the verdict cell is one of:
fixedverified-already-fixeddeferred-with-followupAny empty verdict, any literal value unknown, or any verdict outside
that allow-list ⇒ HARD FAIL with the failing row(s) listed.
For every deferred-with-followup row, assert that the evidence_ref
cell names a follow-up issue or a precise narrower follow-up issue
title. A bare "deferred" with no follow-up handle is a FAIL.
Record the result under ## Gate Results — Issue Matrix.
The cross-repo e2e gate has one allowed exception path: the dev SaaS
endpoint or some other environmental dependency is genuinely unreachable
on the reviewer's machine, and the failing scenario is not actually
about a code defect. In that case the operator authors a
mission-exception.md artifact under kitty-specs/<slug>/. The skill
accepts the exception only if all of the following are present:
| Field | Required content |
|-------|------------------|
| **Operator**: | Human name and contact |
| **Date**: | ISO date the exception was granted |
| **Failing scenario**: | Single scenario file::test, NOT a wildcard |
| **Failing assertion**: | The exact assertion line that failed |
| Narrative section | Why the failure is environmental, not a code defect |
| ## Reproduction command | The exact command the operator ran |
| ## Follow-up | Either a follow-up issue link or a written retry commitment in a documented window |
Reject the exception artifact (and FAIL the review) if any of these are
missing. Reject the exception if the Failing scenario: field names
more than one scenario — a blanket waiver across the whole e2e suite is
not allowed by C-010.
Reject the exception if mission-exception.md covers a scenario that
the skill can verify is actually a code defect (e.g., a test-runner
traceback that is not network-related). Operator exceptions are for
environmental blockers, not for deferred bugs.
Full operator runbook for the exception path:
docs/migration/cross-repo-e2e-gate.md.
The mission review report (Step 9) MUST include a top-level section
## Gate Results containing one subsection per gate. Each subsection
records the command run, the exit code, and a one-paragraph rationale.
For Gate 3, if an exception was granted, link to the exception artifact
and quote the operator's narrative.
A mission whose Gate Results show any HARD FAIL with no documented exception receives the FAIL verdict in Step 9, regardless of how clean the FR coverage table looks.
Produce a single structured markdown report. The report is your deliverable. It must be self-contained — a reader who has not done any of the prior steps must be able to understand each finding from the report alone.
# Mission Review Report: <slug>
**Reviewer**: <your identity>
**Date**: <ISO date>
**Mission**: `<slug>` — <friendly_name>
**Baseline commit**: `<baseline_merge_commit>`
**HEAD at review**: `<git rev-parse HEAD>`
**WPs reviewed**: WP01..WPN
---
## Gate Results
### Gate 1 — Contract tests
- Command: `SPEC_KITTY_ENABLE_SAAS_SYNC=1 <test-runner> tests/contract/ -v`
- Exit code: <0 | non-zero>
- Result: PASS | FAIL
- Notes: <failing test names if any>
### Gate 2 — Architectural tests
- Command: `<test-runner> tests/architectural/ -v`
- Exit code: <0 | non-zero>
- Result: PASS | FAIL
- Notes: ...
### Gate 3 — Cross-repo E2E
- Command: `SPEC_KITTY_ENABLE_SAAS_SYNC=1 <test-runner> spec-kitty-end-to-end-testing/scenarios/ -v`
- Exit code: <0 | non-zero>
- Result: PASS | FAIL | EXCEPTION (with link to `kitty-specs/<slug>/mission-exception.md`)
- Notes: <failing scenarios; if EXCEPTION, quote the operator narrative>
### Gate 4 — Issue Matrix
- File: `kitty-specs/<slug>/issue-matrix.md`
- Rows: <count>
- Empty / `unknown` verdicts: <count, ideally 0>
- `deferred-with-followup` rows missing a follow-up handle: <count, ideally 0>
- Result: PASS | FAIL
A FAIL on any of Gates 1, 2, or 4 forces the Final Verdict to FAIL. A
FAIL on Gate 3 forces FAIL unless a valid `mission-exception.md` is
present.
---
## FR Coverage Matrix
| FR ID | Description (brief) | WP Owner | Test File(s) | Test Adequacy | Finding |
|-------|---------------------|----------|--------------|---------------|---------|
| FR-001 | ... | WP01 | tests/init/... | ADEQUATE | — |
| FR-NNN | ... | WPN | — | MISSING | [DRIFT-N] |
**Legend**: ADEQUATE = test constrains the required behavior; PARTIAL = test
exists but uses synthetic fixture that does not match production model;
MISSING = no test found; FALSE_POSITIVE = test passes even when implementation
is deleted.
---
## Drift Findings
### DRIFT-1: <Title>
**Type**: [NON-GOAL INVASION | LOCKED-DECISION VIOLATION | PUNTED-FR | NFR-MISS]
**Severity**: [CRITICAL | HIGH | MEDIUM | LOW]
**Spec reference**: <NG-N | D-N | FR-NNN | NFR-NNN>
**Evidence**:
- `git diff <baseline>..HEAD -- <file>` line NNN: `<exact code>`
- Or: absence of expected change in `<file>`
**Analysis**: <Why this is a drift, not a design choice. Cite the spec clause
that is violated or the expected change that is absent.>
---
## Risk Findings
### RISK-1: <Title>
**Type**: [BOUNDARY-CONDITION | ERROR-PATH | DEAD-CODE | CROSS-WP-INTEGRATION]
**Severity**: [CRITICAL | HIGH | MEDIUM | LOW]
**Location**: `<file>:<line_range>`
**Trigger condition**: <The specific input or state that activates this risk>
**Analysis**: <What happens when triggered, why it was not caught by existing
tests, and what the user-visible impact is.>
---
## Silent Failure Candidates
List every code path where the implementation returns a default value (empty
string, None, False) rather than raising on a condition that indicates
malfunction.
| Location | Condition | Silent result | Spec impact |
|----------|-----------|---------------|-------------|
| `src/.../foo.py:42` | `JSONL has no events` | returns `""` | FR-605: changelog block always empty |
---
## Security Notes
| Finding | Location | Risk class | Recommendation |
|---------|----------|------------|----------------|
| ... | ... | [PATH-TRAVERSAL | SHELL-INJECTION | LOCK-TOCTOU | UNBOUND-HTTP | CREDENTIAL-RACE] | ... |
---
## Final Verdict
**PASS** / **PASS WITH NOTES** / **FAIL**
### Verdict rationale
[One paragraph. State whether all FRs are adequately covered, whether any
locked decisions were violated, whether any release-gating NFRs missed their
threshold, whether any security findings are blocking. If FAIL, name the
specific findings that block release.]
### Open items (non-blocking)
[List findings that are not blocking release but should be addressed in a
follow-up mission.]
## Retrospective Reminder
The canonical post-merge sequence is: **mission review → author or verify retrospective
(`retrospect create`) → surface findings (`summary` aggregates; `synthesize` reviews proposals)**.
Under default 3.2.0 policy, the `retrospective.yaml` record is authored automatically during
merge. Verify it exists:
```bash
cat .kittify/missions/$(jq -r .mission_id kitty-specs/<slug>/meta.json)/retrospective.yaml
If the file is absent (older mission, or generation failed), author it:
spec-kitty retrospect create --mission <slug>
Then surface findings:
spec-kitty retrospect summary — cross-mission aggregation (read-only; does NOT author)spec-kitty agent retrospect synthesize --mission <slug> — inspect proposals (dry-run by default)spec-kitty agent retrospect synthesize --mission <slug> --apply — apply proposals (mutates)If the record is absent and retrospect create fails, escalate — the terminus facilitator
either did not run or was skipped without a recorded reason. Check status.events.jsonl for
RetrospectiveCaptureFailed events and their remediation_hint field.
---
## Key Rules
1. **You do not fix anything.** Your role is to document. If you find a bug,
describe it precisely and note its severity. Do not patch code, do not amend
tests, do not modify spec artifacts.
2. **Every finding must cite its evidence.** "I believe the test is inadequate"
is not a finding. "Test at `tests/foo.py:42` creates a synthetic WP with
`status: done` in frontmatter; real missions store status in
`status.events.jsonl`; the test would pass if the implementation code were
deleted" is a finding.
3. **Absence is evidence.** A file that the spec required to change but shows
no diff is a finding. An FR that has no test hit is a finding. A locked
decision whose violation was "approved by arbiter" without a rationale is
a finding.
4. **Read the invisible holes.** The spec states what the system must do. It
does not state every assumption it relies on. The reviewer's job includes
surfacing those assumptions and checking whether the implementation honored
them even though they were never written down.
5. **The review history is signal.** WPs with 3+ rejection cycles or arbiter
overrides are high-priority targets. The disagreement that caused those
cycles often reveals an unstated requirement or a design ambiguity that the
implementation resolved in a way the spec author did not intend.
6. **Tests that pass are not proof.** A test that passes because it uses a
synthetic fixture, not because the code is correct, is a liability, not an
asset. The question is always: does this test constrain the actual runtime
behavior?
7. **Security findings are never "low priority" for release-gating missions.**
If the mission modifies auth, file locking, subprocess execution, or HTTP
calls, any security finding that could affect a user in normal operation is
at minimum MEDIUM severity.
8. **The verdict is binary for blocking findings.** PASS WITH NOTES is for
non-blocking findings only. If any CRITICAL or HIGH finding exists that is
not already documented as an accepted known issue, the verdict is FAIL.
9. **This skill does not use any generated review prompt.** Unlike
`spec-kitty-runtime-review`, which defers to the CLI-generated prompt as
the source of truth, this skill produces original analysis. The spec and
contracts are the source of truth.
10. **Always include the retrospective reminder.** The report must end with a
`## Retrospective Reminder` section that tells the operator the
`retrospective.yaml` was authored at the runtime terminus, and that
`spec-kitty retrospect summary` plus
`spec-kitty agent retrospect synthesize --mission <slug>` review the
captured record and apply staged proposals (dry-run by default). If the
record is missing, escalate rather than proceed.
tools
Operate Spec Kitty tracker workflows, tracker service discovery, binding, hosted routing, and tracker recovery.
tools
Operate Spec Kitty team sync, hosted SaaS sync, offline queue, diagnostics, and recovery flows.
tools
Operate Spec Kitty connector integrations and route connector work across tracker, sync, SaaS, and external services.
tools
Handle Spec Kitty team authentication, hosted credentials, account selection, and auth-related recovery.