skills/reflect/SKILL.md
Reflect on recent work and propose improvements if patterns are noticed.
npx skillsauth add gtapps/claude-code-hermit reflectInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Pause and think about your recent work.
This skill is silent by default. Only notify the operator (per the channel policy in CLAUDE.md § Operator Notification) if reflect produces an outcome: a proposal candidate, a micro-approval, a resolved proposal, a graduated sub-threshold observation, or a cost spike.
Read SHELL.md for current context
Read last 20 lines of cost-log.jsonl. Compute today's total and the 7-day median. If today's total > 2× the 7-day median (and both are non-zero), record the spike to project memory as a sub-threshold observation with pattern cost_spike: $X.XX vs 7d median $Y.YY and today's session_id — it becomes input to later reflects and may graduate via the recurrence rule.
2b. Compute phase — gates adapt to hermit age so cold-start installs produce visible output without eroding mature-hermit rigor.
counters.since from state/reflection-state.json (set once at hatch, never rewritten). If missing or unparseable → default $PHASE = adult and continue. Never block.age_days = whole days between counters.since and now.$PHASE table (age is monotonic → no hysteresis):
newborn — age_days < 3juvenile — 3 ≤ age_days < 14adult — age_days ≥ 14$PHASE for the rest of this run; it gates recurrence (Three-Condition Rule #1), sub-threshold surfacing (Outcomes), and the Progress Log annotation.Scan proposals/ for existing proposals (dedup, stale check, feedback loop). Parse metadata from YAML frontmatter if present (file starts with ---). Fall back to parsing bullet-point metadata (**Status:**, **Source:**, etc.) for pre-Observatory proposals. Also tail the last 100 lines of state/proposal-metrics.jsonl and count responded events by action (accept / defer / dismiss) — the dismissal ratio feeds the operator-value self-check below.
Resolution Check — check whether any accepted proposals can be marked resolved. Cap: check up to 5 per reflect cycle, round-robin.
a. Read state/reflection-state.json → last_resolution_check (last PROP-NNN checked, or null if first run).
b. Read all proposals with status: accepted. Sort by accepted_date ascending. Resume from the proposal after last_resolution_check, wrapping around. Take up to 5.
c. If the accepted list from step b is empty, skip to step f.
d. For each proposal: read its title and Evidence section to understand the original pattern.
Glob .claude-code-hermit/sessions/S-*-REPORT.md, sort descending, take the 3 most recent.
e. If the pattern is absent from all 3 checked sessions — apply the cadence-aware resolution rule:
Compute original cadence:
related_sessions list from frontmatter.S-NNN, read date: from sessions/S-NNN-REPORT.md frontmatter.original_cadence_days = max(date) - min(date) in whole days. Single session → 0.related_sessions is empty → treat as sparse (conservative fallback).Frequent pattern (original_cadence_days ≤ 14) — auto-resolve if ≥ 14 days have elapsed since accepted_date:
status: resolved, resolved_date: <now ISO>.resolved event:
node ${CLAUDE_PLUGIN_ROOT}/scripts/append-metrics.js .claude-code-hermit/state/proposal-metrics.jsonl '{"ts":"<now ISO>","type":"resolved","proposal_id":"PROP-NNN"}'
Sparse pattern (original_cadence_days > 14) — never auto-resolve. Surface for operator confirmation if elapsed ≥ 2 × original_cadence_days since accepted_date:
state/reflection-state.json → last_sparse_nudge.<PROP-NNN>. If present and fewer than 7 days have elapsed since that date, skip (prevents daily re-nudge noise).PROP-NNN appears resolved (pattern absent 3/3 recent sessions, original cadence Nd, Xd since accept). Run /claude-code-hermit:proposal-act resolve PROP-NNN to confirm.
"last_sparse_nudge": {"<PROP-NNN>": "<now ISO>"} in the State Update payload below. The update script merges under the top-level last_sparse_nudge key.If the pattern is absent but the elapsed guard is not yet met (frequent: < 14d, sparse: < 2×cadence), skip and revisit next cycle.
f. Note the last PROP-NNN checked (or null if batch was empty). Include as last_resolution_check in the final state write in the State Update section below — do NOT write reflection-state.json here.
Now reflect — ultrathink — using your memory and the context above:
responded event counts from step 3: a high dismiss ratio, proposals stacking in deferred, or compiled/brief outputs that go uncited in subsequent sessions are signals that some output is noise. If the signal is strong, treat it like the routine-silence check below and consider a Tier 1 micro-proposal to pare back the offending output.Only create a proposal if all three are true:
newborn: 1+ session acceptable for Tier 1 only — use Evidence Source: current-session and cite Sessions: current when the pattern is present only in the live SHELL.md (judge returns ACCEPT (current-session)). Tier 2/3 still require 2+ sessions.juvenile / adult: 2+ sessions (baseline — observed more than once, across sessions).If any of the three cannot be stated concretely, do not create the proposal. Sub-threshold observations (interesting but failing the rule) are recorded to project memory so they can graduate on later recurrence — see the Outcomes section.
If SHELL.md status is idle — think broader:
Should any recurring check be added to HEARTBEAT.md?
Is there a preference or constraint missing from OPERATOR.md?
Would a sub-agent improve a type of work that keeps coming up?
Would a skill formalize a workflow you keep repeating?
Is a manual request repeating on a schedule? (e.g., "operator asked for dependency check 3 of last 4 Mondays")
If so: create a proposal with Type: routine and a ## Config block containing the routine JSON:
## Config
{"id":"weekly-deps","schedule":"0 9 * * 1","skill":"claude-code-hermit:session-start --task 'dependency audit'","enabled":true}
When accepted via proposal-act, this JSON is parsed and added to config.json routines automatically.
Is a routine firing repeatedly with no visible downstream effect? Read the tail of state/routine-metrics.jsonl (last 200 lines), group fired events by routine_id, and cross-reference with the last 3 session reports (sessions/S-*-REPORT.md). If a routine has ≥5 fires in the last 14 days and no session report cites its routine_id or skill output as producing findings, decisions, or follow-ups — apply the Three-Condition Rule. If all three conditions hold:
enabled: false (disable) via a Type: routine proposal reusing the existing id. proposal-act upserts by id, so no delete path is needed.schedule if the routine is valuable but mis-timed.disable → Tier 1 (micro-approval). Re-time → Tier 1. Both are fully reversible. Operators can clean up disabled entries any time via /hermit-settings routines.## Config
{"id":"weekly-deps","schedule":"0 9 * * 1","skill":"claude-code-hermit:session-start --task 'dependency audit'","enabled":false}
Check whether any skill, agent, or hook is underperforming.
Skills:
Agents: read state/reflection-state.json cumulative counters (they accumulate since the since timestamp). Flag if reflection-judge shows judge_suppress dominating judge_accept (rough threshold: suppress count > 2× accept count, with at least 5 total verdicts since since) — the gate may be too strict and killing legitimate candidates. proposal-triage has no verdict counters today; treat it as a known gap and skip unless qualitative evidence (e.g., a recent DUPLICATE verdict that was actually novel) is visible in SHELL.md Findings.
Hooks: out of scope here — there is no hook execution telemetry. Document as a known gap if hook misbehavior is suspected; do not try to infer from side-effects.
Signal ladder (same for all three):
/claude-code-hermit:proposal-create with the evidence (subject to Three-Condition Rule)./claude-code-hermit:proposal-create with the evidence and include a ## Skill Improvement section (or ## Agent Improvement) listing the component name, observed failures, and suggested eval criteria. When the proposal is accepted via proposal-act, use /skill-creator eval and /skill-creator improve to implement the changes. If /skill-creator is not available, apply the changes to the component's definition file directly.For any candidate with Evidence Source: current-session, reflect must not add or rewrite evidence-bearing lines in ## Findings or ## Blockers of SHELL.md before reflection-judge runs. The judge validates against pre-existing session content; injecting the pattern text immediately before the judge reads it would make the system self-certifying.
Exempt (always allowed, any time): the mandatory ## Progress Log append (see § Progress Log Entry) and housekeeping notes that do not describe the candidate's pattern (e.g. skipped-scheduled-check lines, resolved-proposal notes).
If the pattern is only visible to reflect via inference (cost log, token counters, timing), the candidate is not eligible for Evidence Source: current-session in that run. Keep it sub-threshold until it recurs and can be cited from independent historical evidence (Evidence Source: archived-session).
Before acting on any proposal candidate, delegate to claude-code-hermit:reflection-judge.
Pass each candidate as:
Candidate: <title>
Tier: <1|2|3>
Evidence Source: archived-session | current-session | scheduled-check/<id> | operator-request
Evidence: <summary>
Sessions: <S-001, S-002, ...> (or "none")
Evidence Source: defaults to archived-session if omitted. Plugin-check candidates use Evidence Source: scheduled-check/<id> with Sessions: none. Newborn Tier-1 candidates from live SHELL.md evidence use Evidence Source: current-session with Sessions: current.
scheduled-check/<id> and operator-request share the same bypass policy at every gate (skip recurrence, enforce consequence + actionability). They are kept distinct on purpose: scheduled-check/<id> carries the check identifier for telemetry and debugging; operator-request marks human-initiated flows (e.g. baseline audits in session-start). Future routing (e.g. KAIROS) will read them as different provenance classes. Do not collapse them into one value.
no-sessions, note the candidate in SHELL.md Findings for future revisit. Otherwise drop silently.Only act on ACCEPT and DOWNGRADE verdicts.
After reflecting and validating with claude-code-hermit:reflection-judge, choose exactly one outcome per observation:
claude-code-hermit:proposal-triage first (see below), then queue micro-approval in state/micro-proposals.jsonclaude-code-hermit:proposal-triage first (see below), then call /claude-code-hermit:proposal-createSub-threshold observations (interesting but failing the Three-Condition Rule — typically single-occurrence) do not surface to the operator in steady state. Record them to project memory with a short pattern label and today's session_id so they can graduate via recurrence on a later reflect. Do not generate observations for their own sake, and do not surface them before they graduate.
Reflect-generated inferences (cost spikes, token-count shapes, timing patterns) never use bypass Evidence Sources (scheduled-check/* or operator-request). They remain sub-threshold observations recorded to project memory and graduate only by genuine recurrence across sessions, at which point they can be cited as Evidence Source: archived-session or current-session like any other session-grounded observation.
Phase-aware surfacing exception:
newborn: also log each sub-threshold observation inline to SHELL.md Findings as Noticed: <pattern> (single line, no ceremony). Gives the operator early signal that reflect is watching while recurrence data accumulates.juvenile: emit a weekly digest instead of per-observation lines. Read last_digest_at from state/reflection-state.json (top-level, may be absent). If absent or older than 7 days, write a single Noticed (digest): <N> observations — <top 3 pattern labels> line to SHELL.md Findings, and include "last_digest_at": "<now ISO>" in the State Update payload below so the update script persists it.adult: silent (baseline).Review past dismissed and deferred proposals. Avoid re-suggesting recently dismissed ideas. If significantly more evidence has accumulated since a dismissal, it may be worth revisiting.
Before creating a proposal or acting silently, classify every proposal candidate into a tier:
Tier 1 — reversible, routine, low-scope: queue micro-approval, do NOT create PROP-NNN. Example: "For 3 weeks I've added the same 5 hashtags manually. Proposing to automate that step."
Tier 2 — meaningful but non-critical: queue micro-approval, create PROP-NNN only after operator says yes. Example: "Morning brief is consistently ignored on weekdays before 9am. Proposing to shift it to 9:30am."
Tier 3 — safety-critical, irreversible, or cross-hermit scope: create PROP-NNN immediately via /claude-code-hermit:proposal-create, skip micro-approval entirely.
Example: "Operator's gate automation script has an error that could trigger physical actuators unexpectedly. Requires explicit review before any change."
Before queuing a micro-approval or calling proposal-create, call claude-code-hermit:proposal-triage. Pass Evidence Source: when known:
Title: <title>
Evidence Source: <value from the candidate, or omit to default to archived-session>
Evidence: <one-paragraph evidence summary>
CREATE — proceedDUPLICATE:<PROP-ID> — link to existing proposal in SHELL.md Findings instead, do not createSUPPRESS — drop silentlyEvery micro-proposal question must include: [observed pattern + duration] + [consequence] + [exact proposed change] + "Yes / No"
Do not queue vague questions like "Found a pattern. Want me to improve it?" — all three components must be present.
Dedup: do not re-append the same candidate if an entry with the same title/id already exists in pending.
MP-YYYYMMDD-N where N increments within the same day (0, 1, 2). Check existing micro-queued events in proposal-metrics.jsonl for today to determine N.state/micro-proposals.json. Append a new entry to pending with id: "MP-YYYYMMDD-N", tier: <1|2>, status: "pending", follow_up_count: 0, ts: "<now ISO>", question: "<full question text>". Write the file.micro-queued event to proposal-metrics.jsonl via append-metrics.js: {"ts":"<now ISO>","type":"micro-queued","micro_id":"MP-YYYYMMDD-N","tier":1,"question":"<full question text>"}MP-YYYYMMDD-N (tier <N>): <question> — Reply "MP-YYYYMMDD-N yes" or "MP-YYYYMMDD-N no" (bare yes/no accepted when only one entry is pending).After each reflection run, call update-reflection-state.js with the run's verdict counts:
node ${CLAUDE_PLUGIN_ROOT}/scripts/update-reflection-state.js \
.claude-code-hermit/state/reflection-state.json \
'{"last_resolution_check":"<last-PROP-NNN-or-null>","ran_with_candidates":<true|false>,"judge_accept":<N>,"judge_downgrade":<N>,"judge_suppress":<N>,"proposals_created":<N>,"micro_proposals_queued":<N>}'
Include "last_digest_at":"<now ISO>" in the payload only when a juvenile digest fired in this run (see Outcomes → phase-aware surfacing). Omit otherwise — the script preserves the prior value.
Include "last_sparse_nudge":{"<PROP-NNN>":"<now ISO>"} when a sparse-pattern nudge was emitted this run (step 4e). The script merges the provided map into the existing last_sparse_nudge top-level key. Omit if no sparse nudge was emitted — the script preserves the prior map.
The script handles: counter increments, last_reflection/last_run_at timestamps, missing-counters fallback, since preservation, last_digest_at passthrough, last_sparse_nudge merge, and atomic write. It always exits 0 — if the write fails it logs one line to stderr and continues. Counters are diagnostic, not audit-grade — a missed increment is acceptable.
On every reflect run, including empty ones, append one line to SHELL.md ## Progress Log:
[HH:MM] reflect (<phase>) — N candidates; verdicts: accept=A downgrade=D suppress=S; outcomes: <list or "none">
When any suppressions occurred, append a compact suffix:
[HH:MM] reflect (<phase>) — N candidates; verdicts: accept=A downgrade=D suppress=S; outcomes: <list or "none">; suppressed: [<slug>: <code>, ...]
Format per suppression: <candidate-title-slug>: <code> where <code> is the canonical code from the judge or triage verdict (no-evidence, no-sessions, weak-recurrence, weak-consequence, not-actionable). Cap at 3 entries with +N more overflow. Omit the suppressed: suffix entirely when suppress=0.
<phase> is one of newborn / juvenile / adult (from step 2b). If phase detection fell back to adult due to missing counters.since, annotate as adult silently — no operator-facing distinction.
This is the audit trail. The silent-by-default rule at the top governs operator pings only — the log line always goes in.
tools
Presence history & tracker-health report — current home/away state, reliability, recent arrival/departure transitions, and activity patterns for person/device_tracker entities. Use when the operator asks about presence history or when a presence-dependent automation (locks, alarm, vacuum, climate) misbehaves.
development
Evening house brief — end-of-day security check, device status, and energy snapshot. Runs as a daily routine at 22:30 or on demand.
tools
Browse and explain the hermit's Home Assistant automations — list by topic, filter by keyword with plain-language YAML explanations, or sort by last-fired. Read-only. Use when the operator asks "what automations do I have / what does this one do / which haven't fired."
tools
On-demand HA-voice brainstorm — reads entity inventory, automation/script listings, and operator intent to surface at most 2 capability-gap ideas, each gated by proposal-triage before becoming a PROP. Invoke when the operator asks "what automations am I missing?", "any coverage gaps?", or "brainstorm improvements". Never runs autonomously.