skills/reflect/SKILL.md
Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.
npx skillsauth add phrazzld/spellbook reflectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Structured reflection that improves both the harness and the operator.
When roster receipts exist, include .harness-kit/traces/ delegation receipts
in the evidence set when they are relevant. Reflection should convert
provider-lane results and failure modes into backlog, harness, or coaching
outputs without inventing hidden rankings.
Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use lanes to surface independent failure interpretations and improvement proposals; the lead owns synthesis and codification choices.
Every finding becomes one of three things:
When .harness-kit/work/ledger.jsonl is available, /reflect consumes the
latest events for the active backlog/branch plus trace refs and delegation
receipts. It calls scripts/work-ledger.py append with phase_started at
retro start, next_action_changed when follow-up backlog or harness proposals
are emitted, and phase_completed when the reflection packet is complete.
Follow-up proposals are evidence refs, not hidden chat-only state.
| Mode | Intent | Reference |
|------|--------|-----------|
| distill (default) | End-of-session retrospective -> codified artifacts + operator coaching | references/distill.md |
| calibrate | Mid-session harness postmortem — fix the harness before the code | references/calibrate.md |
| coach | Deep dive on prompt quality, technical specificity, and concept building | references/coach.md |
| checkpoint | Opt-in teach-back checkpoint with restatement, verdict, gaps, and gate artifact | references/checkpoint.md |
| prompt-debt | Promote repeated corrections and repeated workflow patterns into one codification proposal | references/prompt-debt.md |
| tune-repo | Refresh context artifacts, detect drift, update repo guidance | references/tune-repo.md |
| append | Append issue-scoped retro notes for /groom to consume later | references/retro-format.md |
| cycle | Bounded end-of-ship retrospective invoked by /ship — emit backlog mutations, harness-tuning proposals, and a cycle summary for the caller to apply | references/cycle.md |
If the first argument matches a mode name, route to that reference.
If no mode is provided, run distill.
Interpret natural-language requests as:
Comprehension-required: <topic> -> checkpointcoachprompt-debtcalibratetune-repo/ship (or transitively from /flywheel, which composes /ship)
-> cycleReflection must separate three classes of failure:
Do not dump harness failures onto the user. If the repo, docs, or available context already contained the answer, that is not a prompt-quality critique.
Even in distill, inspect both lanes:
System codification is mandatory. Operator coaching is mandatory to assess, but only mandatory to emit when there is concrete, high-leverage feedback. Otherwise say so explicitly instead of manufacturing generic advice.
Use coach when the user wants the operator lane expanded into a deeper lesson.
When encoding knowledge, always target the highest-leverage mechanism:
Type system > Lint rule > Hook > Test > CI > Skill/reference > AGENTS.md > Memory
Prompt debt is a repeated human correction, repeated request, or repeated decision pattern that should become a durable harness artifact instead of remaining chat-only advice. Use available local surfaces only: repo-local reflect notes, review scores, delegation receipts, traces, session summaries, and durable memory notes. Chronicle-derived context may inform the pattern, but do not quote private personal detail.
When .groom/review-scores.ndjson exists and
scripts/review-score-trends.py is available, run the analyzer before proposing
skill changes. Treat 5+ score entries as enough for a trend; below that, report
the count and avoid a tuning claim. If the analyzer names a dimension regression
or high false-positive rate, propose a concrete skill/reference edit using the
codification hierarchy rather than a generic observation.
Promote a pattern when it appears at least twice across sessions, or once when it prevented a shipped regression, runaway spend, data loss, or client-facing artifact error. Emit one highest-leverage proposal by default:
## Prompt Debt
- Pattern:
- Evidence count:
- Safe evidence snippets:
- Recommended target:
- Acceptance criteria:
- Residual risk:
Apply the codification hierarchy above. Prefer type, lint, hook, test, or CI coverage before skill prose; use AGENTS.md for always-on routing; use memory only for preference-level defaults that cannot be enforced.
cycle is a bounded invocation: /ship calls it at the end of the
final-mile pipeline to capture learnings from the just-shipped ticket.
/flywheel triggers it transitively by composing /ship. When invoked as
cycle, reflect gains two privileges the other modes lack:
backlog.d/ (never
backlog.d/_done/). Every proposal must cite an evidence ref from the
cycle (commit, diff hunk, receipt path, log line).All other modes are read-only against backlog.d/ and the harness. If
cycle cannot cite evidence for a mutation, downgrade it to a finding and
let a human decide.
Triggered as /reflect cycle (aliases: /reflect --cycle <cycle-id>).
The caller — normally /ship — passes this input packet:
branch: name of the just-shipped feature branch (pre-merge).merged_sha: squash commit SHA now on master/main.closed_backlog_ids: list of IDs closed in this cycle (the closing set
from /ship's trailer scan).referenced_backlog_ids (optional): Refs-backlog IDs noted but not
closed.A cycle-id identifies the retro artifact; derive it from merged_sha
short form when the caller does not supply one.
Three required categories plus one optional prompt-debt category. The structured categories must be cleanly separable so the caller can apply them under different policies.
Backlog mutations (structured, machine-consumable). For each:
create | edit | reprioritize | deletebacklog.d/<id>-*.md targetcreate, unified diff for edit, new
priority for reprioritize, justification for delete/ship) applies them to master
via a follow-up commit and owns commit hygiene. Reflect does not stage,
commit, or push these files itself.Harness-tuning proposals (structured, machine-consumable). For each:
skills/, agents/, harnesses/,
AGENTS.md, CLAUDE.md, or a hook script/ship uses harness/reflect-outputs). A cycle
run that mutates harness files on the current branch is a bug.Prompt-debt proposal (optional, structured). Include at most one by default, only when repeated corrections or high-severity prompt patterns are visible in cycle evidence:
Cycle summary (human-readable narrative). What shipped, what was
learned, what went well, what went poorly. Also written to the
standard retro location (.groom/retro/<primary-id>.md or the
.harness-kit/reflect/<cycle-id>/ receipts dir, matching whatever
convention the invoking repo already uses).
ship/SKILL.md.git add / git commit on behalf of the caller.distill,
calibrate, coach, tune-repo, and append remain usable without
cycle context — cycle is additive, not a replacement.See references/cycle.md for judgment rules (consolidate vs split,
when to escalate to a harness branch, evidence standards).
development
Lightweight evidence-backed retro and catch-up reports for a current repo, branch, PR, backlog slice, or recent agent session. Use when the user asks for a debrief, catch me up, what changed, why it matters, product implications, end-user implications, developer experience implications, current app state, backlog state, workspace state, alternatives considered, or context rebuild after losing the thread. Trigger: /debrief.
testing
Capture agent-session work records as local JSONL audit evidence. Links a backlog/spec, branch, commits, review verdicts, QA/demo evidence, transcript refs, and shipped ref without storing raw private transcripts. Use when: "trace this work", "write work record", "agent session trace", "journal this delivery", "link transcript evidence". Trigger: /trace, /journal.
data-ai
Turn proven agent-session patterns into first-party Harness Kit skills. Use when: "skillify this conversation", "make this into a skill", "generate a skill from current transcript", "extract reusable workflow". Trigger: /skillify.
testing
Run one targeted, read-only architecture or quality critique through a named lens from the shared rubric. Use when: "critique this module", "run an Ousterhout pass", "lens critique", "architecture critique". Trigger: /critique.