skills/reflect/SKILL.md
Session retrospective, operator coaching, harness postmortem, codification, and outer-loop cycle critique. Turns evidence into hooks, rules, skills, backlog mutations, or explicit non-actions. Use when: "done", "wrap up", "what did we learn", "retro", "calibrate", "prompt better", "teach me from this session", "reflect on cycle", post-/flywheel critique. Trigger: /reflect, /retro, /calibrate, /reflect checkpoint <topic>, /reflect cycle <cycle-ulid>.
npx skillsauth add phrazzld/agent-skills reflect
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Structured reflection that improves both the harness and the operator.
When roster receipts exist and are relevant, include the .harness-kit/traces/
delegation receipts in the evidence set. Reflection should convert
provider-lane results and failure modes into backlog, harness, or coaching
outputs without inventing hidden rankings.
Delegate on judgment per the shared Roster contract: native subagents
by default; add cross-model critics, roster providers, or sprite lanes
(/sprites) only when they answer a distinct question. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use lanes to surface independent failure interpretations and improvement proposals; the lead owns synthesis and codification choices.
Every finding becomes one of three things:
When .harness-kit/work/ledger.jsonl is available, /reflect consumes the
latest events for the active backlog/branch plus trace refs and delegation
receipts. It calls cargo run --locked -p harness-kit-checks -- work-ledger append with phase_started at
retro start, next_action_changed when follow-up backlog or harness proposals
are emitted, and phase_completed when the reflection packet is complete.
Follow-up proposals are evidence refs, not hidden chat-only state.
| Mode | Intent | Reference |
|------|--------|-----------|
| distill (default) | End-of-session retrospective -> codified artifacts + operator coaching | references/distill.md |
| calibrate | Mid-session harness postmortem — fix the harness before the code | references/calibrate.md |
| coach | Deep dive on prompt quality, technical specificity, and concept building | references/coach.md |
| checkpoint | Opt-in teach-back checkpoint with restatement, verdict, gaps, and gate artifact | references/checkpoint.md |
| prompt-debt | Promote repeated corrections and repeated workflow patterns into one codification proposal | references/prompt-debt.md |
| tune-repo | Refresh context artifacts, detect drift, update repo guidance | references/tune-repo.md |
| append | Append issue-scoped retro notes for /groom to consume later | references/retro-format.md |
| cycle | Bounded end-of-ship retrospective invoked by /ship — emit backlog mutations, harness-tuning proposals, and a cycle summary for the caller to apply | references/cycle.md |
distill, prompt-debt, and cycle may emit a learning packet for
/harness-engineering apply. Reflect emits proposals; it does not apply them.
If the first argument matches a mode name, route to that reference.
If no mode is provided, run distill.
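The routing rule above can be sketched as a small lookup. This is a minimal illustration, not the skill's actual dispatcher: the `route` function and `MODE_REFERENCES` name are hypothetical, and falling back to distill when the first argument is not a known mode is an assumption (the text only covers the case where no mode is provided).

```python
# Mode names and reference paths are copied from the mode table above.
MODE_REFERENCES = {
    "distill": "references/distill.md",
    "calibrate": "references/calibrate.md",
    "coach": "references/coach.md",
    "checkpoint": "references/checkpoint.md",
    "prompt-debt": "references/prompt-debt.md",
    "tune-repo": "references/tune-repo.md",
    "append": "references/retro-format.md",
    "cycle": "references/cycle.md",
}
DEFAULT_MODE = "distill"


def route(args):
    """Return (mode, reference_path, remaining_args) for a /reflect call."""
    if args and args[0] in MODE_REFERENCES:
        # First argument names a mode: route to that mode's reference.
        mode, rest = args[0], list(args[1:])
    else:
        # No mode provided: run distill. Treating an unrecognized first
        # argument the same way is an assumption of this sketch.
        mode, rest = DEFAULT_MODE, list(args)
    return mode, MODE_REFERENCES[mode], rest
```

For example, `route(["cycle", "<cycle-ulid>"])` selects references/cycle.md and passes the identifier through untouched.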
When a mode emits a learning packet, use this machine-consumable shape:
- deliver | distill | cycle | prompt-debt
- /harness-engineering apply
- create | update | delete | move | backlog-create | gate-add | eval-add
- type | lint | hook | test | ci | skill | agents | memory

Interpret natural-language requests as:

- Comprehension-required: <topic> -> checkpoint
- -> coach
- -> prompt-debt
- -> calibrate
- -> tune-repo
- /ship (or transitively from /flywheel, which composes /ship) -> cycle

Reflection must separate three classes of failure:
Do not dump harness failures onto the user. If the repo, docs, or available context already contained the answer, that is not a prompt-quality critique.
Even in distill, inspect both lanes:
System codification is mandatory. Operator coaching is mandatory to assess, but only mandatory to emit when there is concrete, high-leverage feedback. Otherwise say so explicitly instead of manufacturing generic advice.
Use coach when the user wants the operator lane expanded into a deeper lesson.
When encoding knowledge, always target the highest-leverage mechanism:
Type system > Lint rule > Hook > Test > CI > Skill/reference > AGENTS.md > Memory
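One way to read the ordering above is "pick the first mechanism in the list that can actually enforce the lesson". The sketch below assumes the caller already knows which mechanisms are feasible for a given finding; `HIERARCHY` and `highest_leverage` are hypothetical names used only for illustration.

```python
# Ordered from highest to lowest leverage, as listed in the hierarchy above.
HIERARCHY = [
    "Type system",
    "Lint rule",
    "Hook",
    "Test",
    "CI",
    "Skill/reference",
    "AGENTS.md",
    "Memory",
]


def highest_leverage(feasible):
    """Return the feasible mechanism that ranks highest in HIERARCHY."""
    for mechanism in HIERARCHY:
        if mechanism in feasible:
            return mechanism
    raise ValueError("no feasible mechanism from the hierarchy was given")
```

Given `{"Memory", "Hook", "Skill/reference"}`, the function returns `"Hook"`, since a hook outranks both skill prose and memory.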
Prompt debt is a repeated human correction, repeated request, or repeated decision pattern that should become a durable harness artifact instead of remaining chat-only advice. Use available local surfaces only: repo-local reflect notes, review scores, delegation receipts, traces, session summaries, and durable memory notes. Chronicle-derived context may inform the pattern, but do not quote private personal detail.
When .groom/review-scores.ndjson exists, run
cargo run --locked -p harness-kit-checks -- review-score-trends before proposing
skill changes. Treat 5+ score entries as enough for a trend; below that, report
the count and avoid a tuning claim. If the analyzer names a dimension regression
or high false-positive rate, propose a concrete skill/reference edit using the
codification hierarchy rather than a generic observation.
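The 5-entry threshold can be checked before making any tuning claim. This is a rough sketch that only counts NDJSON records; it does not reproduce the review-score-trends analyzer, and the record fields are not specified in this document, so the code never inspects them. `trend_readiness` is a hypothetical helper name.

```python
import json

MIN_TREND_ENTRIES = 5  # "5+ score entries" from the text above


def trend_readiness(ndjson_text):
    """Count score entries and say whether a trend claim is allowed."""
    # One JSON object per non-blank line; the field layout is left opaque
    # because the schema of review-scores.ndjson is not given here.
    entries = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    count = len(entries)
    return {"count": count, "trend_claim_allowed": count >= MIN_TREND_ENTRIES}
```

Below the threshold the caller would report `count` and stop short of proposing a skill change.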
Promote a pattern when it appears at least twice across sessions, or once when it prevented a shipped regression, runaway spend, data loss, or client-facing artifact error. Emit one highest-leverage proposal by default:
## Prompt Debt
- Pattern:
- Evidence count:
- Safe evidence snippets:
- Recommended target:
- Acceptance criteria:
- Residual risk:
Apply the codification hierarchy above. Prefer type, lint, hook, test, or CI coverage before skill prose; use AGENTS.md for always-on routing; use memory only for preference-level defaults that cannot be enforced.
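The promotion threshold stated above (twice across sessions, or once for a high-severity save) can be written as a small predicate. The names `should_promote` and `HIGH_SEVERITY`, and the idea of passing the prevented outcome as a string, are assumptions made for this sketch.

```python
# The four outcomes that justify promotion after a single occurrence.
HIGH_SEVERITY = {
    "shipped regression",
    "runaway spend",
    "data loss",
    "client-facing artifact error",
}


def should_promote(occurrences, prevented=None):
    """Decide whether a repeated pattern becomes a prompt-debt proposal.

    occurrences: how many times the pattern appeared across sessions.
    prevented: the bad outcome the correction prevented, if any.
    """
    if occurrences >= 2:
        return True
    return occurrences == 1 and prevented in HIGH_SEVERITY
```

A single correction that prevented data loss is promoted; a single stylistic correction is not.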
cycle is a bounded invocation: /ship calls it at the end of the
final-mile pipeline to capture learnings from the just-shipped ticket.
/flywheel triggers it transitively by composing /ship. When invoked as
cycle, reflect gains two privileges the other modes lack:
- backlog.d/ (never backlog.d/_done/). Every proposal must cite an evidence ref from the cycle (commit, diff hunk, receipt path, log line).
- /harness-engineering apply for validation and a harness branch.

All other modes are read-only against backlog.d/ and the harness. If
cycle cannot cite evidence for a mutation, downgrade it to a finding and
let a human decide.
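The downgrade rule can be sketched as a classification step that runs before anything is emitted. The `Proposal` shape and the `classify` function are hypothetical; only the rule itself (no evidence ref means finding, not mutation) comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    summary: str
    # Evidence refs from the cycle: commit, diff hunk, receipt path, log line.
    evidence_refs: list = field(default_factory=list)


def classify(proposal):
    """Return "mutation" when evidence is cited, otherwise "finding"."""
    has_evidence = any(ref.strip() for ref in proposal.evidence_refs)
    return "mutation" if has_evidence else "finding"
```

Anything classified as a finding is left for a human to decide rather than emitted as a backlog mutation.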
Triggered as /reflect cycle (alias: /reflect --cycle <cycle-id>).
The caller — normally /ship — passes this input packet:
- branch: name of the just-shipped feature branch (pre-merge).
- merged_sha: squash commit SHA now on master/main.
- closed_backlog_ids: list of IDs closed in this cycle (the closing set from /ship's trailer scan).
- referenced_backlog_ids (optional): Refs-backlog IDs noted but not closed.

A cycle-id identifies the retro artifact; derive it from merged_sha
short form when the caller does not supply one.
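The input packet and the cycle-id fallback can be modeled as a small data class. Field names follow the packet described above; the `CycleInput` class, the `cycle_id` field, and the 7-character short form are assumptions of this sketch (the text does not say how long the short SHA is).

```python
from dataclasses import dataclass, field
from typing import Optional

SHORT_SHA_LENGTH = 7  # assumption: the document does not fix a length


@dataclass
class CycleInput:
    branch: str                 # just-shipped feature branch (pre-merge)
    merged_sha: str             # squash commit SHA now on master/main
    closed_backlog_ids: list    # IDs closed in this cycle
    referenced_backlog_ids: list = field(default_factory=list)  # optional
    cycle_id: Optional[str] = None  # supplied by the caller, if at all

    def resolved_cycle_id(self):
        """Use the caller's cycle-id, else the short form of merged_sha."""
        return self.cycle_id or self.merged_sha[:SHORT_SHA_LENGTH]
```

When the caller does supply a cycle-id, `resolved_cycle_id()` returns it unchanged.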
Three required categories plus one optional prompt-debt category. The structured categories must be cleanly separable so the caller can apply them under different policies.
Backlog mutations (structured, machine-consumable). For each:
- create | edit | reprioritize | delete
- backlog.d/<id>-*.md target
- create, unified diff for edit, new priority for reprioritize, justification for delete
- /harness-engineering apply or a direct backlog edit, but reflect does not stage, commit, or push these files itself.

Harness-tuning proposals (structured, machine-consumable). For each:
- skills/, agents/, harnesses/, AGENTS.md, CLAUDE.md, or a hook script
- /harness-engineering apply, which creates a harness branch. A cycle run that mutates harness files on the current branch is a bug.

Prompt-debt proposal (optional, structured). Include at most one by default, only when repeated corrections or high-severity prompt patterns are visible in cycle evidence:
Cycle summary (human-readable narrative). What shipped, what was
learned, what went well, what went poorly. Also written to the
standard retro location (.groom/retro/<primary-id>.md or the
.harness-kit/reflect/<cycle-id>/ receipts dir, matching whatever
convention the invoking repo already uses).
Learning packet (machine-consumable). Use the shared Learning Packet Schema above.
- /harness-engineering apply validates and routes to a harness branch. This is a hard cross-skill invariant also asserted in ship/SKILL.md.
- git add / git commit on behalf of the caller.
- distill, calibrate, coach, tune-repo, and append remain usable without cycle context; cycle is additive, not a replacement.

See references/cycle.md for judgment rules (consolidate vs split,
when to escalate to a harness branch, evidence standards).
Run cargo run --locked -p harness-kit-checks -- reflect-checkpoint --self-test to prove the
checkpoint validator rejects missing restatements, invalid verdicts, and raw
private content.