skills/groom/SKILL.md
Always-on backlog grooming. Tidy, brainstorm, interrogate, investigate, research, and simplify in a single loop. Tidy is not a mode — it happens every time. Strategic-layer work fans out parallel interrogation, design-critique, technical-review, and research lanes. Use when: "groom", "what should we build", "rethink this", "biggest opportunity", "backlog", "prioritize", "backlog session", "audit skills", "skill quality audit". Trigger: /groom, /groom audit, /backlog, /rethink, /moonshot, /scaffold.
Keep the backlog healthy, organized, tidy, and strategically aligned. One loop, always-on. You cannot groom a backlog without tidying it.
Grooming is the single operation that keeps backlog.d/ useful. Every
invocation runs the full loop:
/research for outside context and
reports which research surfaces succeeded, failed, or were unavailable.

Emphasis flags (--emphasis explore|rethink|moonshot|scaffold) weight the
loop toward a direction. They do not turn steps off. There is no tidy
subcommand; tidy is the price of admission.
You are the executive orchestrator. Keep synthesis, prioritization, and decision authority on the lead model. Delegate investigation and evidence gathering to focused subagents in parallel.
Delegation floor applies: probe the roster first; dispatch two or more
providers for substantive work; direct solo only for mechanical, emergency,
user-forbidden, or fewer-than-two-providers cases. See
harnesses/shared/AGENTS.md (Roster).
Local lane guidance: Use lanes for backlog drift, premise challenge, technical hotspots, product opportunity, ideal-form design, security/privacy, agent-readiness, simplification/deletion, and external context; the lead keeps final prioritization.
/groom audit is a read-only skill quality coverage report, not a normal
grooming run and not a hard gate. Run:
python3 skills/groom/scripts/audit-skills.py
The report walks skills/*/SKILL.md and scores four dimensions:
- name and description
- tests/, evals/, test scripts, or a verification section
- harnesses/shared/AGENTS.md

Order by severity and present the report as-is. Do not auto-fix skills, generate tests, or add a Dagger gate from audit findings.
A groom run is not an orientation report. It is complete only when the final artifact proves all of these:
/research fanout runs by default. The final
report names the state of Exa, xAI/Grok, Thinktank, and codebase research,
including failed, partial, or unavailable sources.

If any item is missing, say the groom is incomplete and keep working.
Phase-gated. Each phase completes before the next begins.
- project.md / CLAUDE.md / AGENTS.md for product lens.
- backlog.d/: every active ticket, by ID.
- .groom/retro/ if present: effort calibration, blocker patterns.
- .groom/review-scores.ndjson if present: review-quality trend. If
  scripts/review-score-trends.py exists, run it and include its output when
  the file has 5+ entries; below 5 entries, report the insufficient-data count.
- exemplars.md if present: existing reference implementations.

Do not block on missing artifacts. Note absence and proceed.
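The 5-entry threshold for .groom/review-scores.ndjson could be checked with a small helper like the one below. This is a minimal sketch rather than part of the skill: the function name `review_score_status` is hypothetical, and it assumes one JSON object per non-empty line. It only reports the decision; running scripts/review-score-trends.py is left to the caller.

```bash
# Hypothetical helper: decide whether the review-score log has enough
# entries to report a trend. Assumes one JSON object per non-empty line.
review_score_status() {
  local file="${1:-.groom/review-scores.ndjson}" count
  if [ ! -f "$file" ]; then
    echo "absent"                       # missing artifact: note it and proceed
    return 0
  fi
  # Count non-empty lines. grep -c prints 0 but exits 1 when nothing matches,
  # so the "|| true" keeps the substitution from reporting a failure.
  count="$(grep -c '[^[:space:]]' "$file" || true)"
  if [ "$count" -ge 5 ]; then
    echo "ready: $count entries"
  else
    echo "insufficient-data: $count entries"
  fi
}
```

Called with no argument, the helper falls back to the default path relative to the current directory.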
Gate: every shipped ticket archived, every stale in-progress flagged, duplicates called out.
Launch investigation bench, premise-challenge, CEO/user-value review,
ideal-form design, security/privacy review, agent-readiness review,
technical-review bench, simplification/deletion review, codebase hotspot scans,
and /research delegations in a single message so they run in parallel. A
groom run that ran one subagent has failed the fanout goal.
Gate: every dispatched subagent returned a structured report.
/shape's references/prd-ticket-quality.md: user, problem, why now, UX
enabled, deliverable type, technical design, ADR decision, alternatives,
executable/report oracle, evidence artifacts, and residual risk must be
explicit before implementation. Flag Status: ready tickets that still use
unresolved target language such as "preferably", "confirm later", or "pick
during implementation".

Do NOT delegate synthesis; it requires product judgment.
Ask each perspective: "what on this backlog should we just delete?" Every top-3 candidate for deletion gets surfaced to the user with rationale.
Deletions are proposed, never executed silently. See Refuse Conditions.
One theme at a time. For each:
- **Why:** justification tying back to a concrete perspective.
- /shape's references/prd-ticket-quality.md; otherwise emit it as a raw
  backlog idea, not Status: ready.

User decides per theme: write / edit / delete / skip. Silence is not consent.
The always-on tidy step. These steps are MANDATORY every run. Source the helper lib once at the start:
source "$(git rev-parse --show-toplevel)/scripts/lib/backlog.sh"
Find every Closes-backlog: / Ships-backlog: trailer that landed since
the last archive. The merge base depends on the repo's default branch
(main or master):
default="$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')"
default="${default:-main}"
backlog_ids_from_range "origin/${default}..${default}"
backlog.d/

For each ID returned above:
backlog_archive "$id"
backlog_archive is idempotent — re-archiving an already-archived ID
exits 0 silently. Stage the moves and commit:
chore(backlog): archive shipped tickets swept by /groom
Some tickets get marked Status: done / Status: shipped in frontmatter
without a trailer (legacy or hand-edited). Scan backlog.d/*.md and move
any such ticket to _done/ via backlog_archive using its numeric ID.
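The status sweep above might be scripted along these lines. This is a sketch under stated assumptions: the status sits on its own line as `Status: done` or `Status: shipped`, the ID is the numeric filename prefix, and `status_sweep_ids` is a hypothetical name. Archiving itself still goes through backlog_archive from the sourced helper lib.

```bash
# Hypothetical helper: print the numeric ID of every top-level ticket in a
# backlog directory whose Status line reads done or shipped.
status_sweep_ids() {
  local dir="${1:-backlog.d}" f base
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue            # glob matched nothing
    if grep -Eq '^Status:[[:space:]]*(done|shipped)[[:space:]]*$' "$f"; then
      base="$(basename "$f")"
      echo "${base%%-*}"               # 029-adaptive-backoff.md -> 029
    fi
  done
}

# Possible usage, once scripts/lib/backlog.sh has been sourced:
#   status_sweep_ids backlog.d | while read -r id; do backlog_archive "$id"; done
```

The glob only covers files directly under the directory, so tickets already moved to _done/ are not revisited.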
in-progress

For each ticket with Status: in-progress:
done if the closing trailer landed, ready
otherwise).

If two tickets describe the same work, flag for consolidation. Do not merge silently: surface the pair with a proposed consolidated shape and let the user ratify.
The interesting work. Dispatched in parallel for fresh-context judgment. A groom run that ran fewer than seven of these perspectives has failed the fanout goal even if two roster providers returned.
For the top 3 unshaped items (fuzzy goal, missing oracle):
.groom/BACKLOG.md.

For the top 1-2 shaped packets (clear goal + oracle + sequence):
Dispatch in parallel, single message, one ad-hoc critic per lens (read the
lens from harnesses/shared/references/lenses.md; no static agent file):
Commission each critic with the lens's "looks for" from the rubric, scoped to a
concrete slice of recent code (see Codebase investigation below). Ask each:
"what technical-debt ticket does the current backlog miss?" Surface every
emission as a proposed new ticket with the lens name in the **Why:**.
/research fanout

Invoke /research with a focused query for outside context even when the repo
looks familiar. Use Exa, xAI/Grok, Thinktank, and codebase research when those
surfaces are available. Pipe results into synthesis to pressure-test or enrich
proposed tickets. If a surface fails, times out, or is unavailable, keep the
failure in the final artifact instead of treating silence as no evidence.
Dispatch in parallel, single message:
git log --since=30.days.ago --name-only --pretty=format: \
| sort | uniq -c | sort -rn | head -20
Subagent prompt: "what simplification or consolidation opportunity is
the current backlog missing for these files?"

grep -rn -E 'TODO|FIXME|HACK|XXX' --include='*.ts' --include='*.py' \
--include='*.sh' --include='*.md' . 2>/dev/null \
| awk -F: '{print $1}' | sort | uniq -c | sort -rn | head -10
Subagent prompt: "read the TODOs in these files; which translate to
tickets the backlog lacks?"

Status: in-progress. Subagent
prompt: "read the ticket and the branch commits; what is it actually
stuck on? What ticket unblocks it?"

Every subagent returns a structured report:
## [Subagent Name] Report
### Top 3 Findings
1. [finding] — Evidence: file:line / commit / metric. Impact: high/med/low.
2. ...
3. ...
### Strategic Theme
[One sentence tying findings together.]
### Single Recommendation
[One concrete ticket to add, edit, or delete. Not a list.]
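The gate that every subagent returns a structured report could be spot-checked with something like the sketch below. It assumes each report is saved to a file and that the three `###` headings appear exactly as in the template; `report_missing_sections` is a hypothetical name, not an existing script.

```bash
# Hypothetical helper: list the required sections a subagent report lacks.
# Prints nothing and returns 0 when all three headings are present.
report_missing_sections() {
  local f="$1" section missing=0
  for section in "Top 3 Findings" "Strategic Theme" "Single Recommendation"; do
    # -F: fixed string, -x: must match the whole line
    if ! grep -Fxq "### $section" "$f"; then
      echo "missing: $section"
      missing=1
    fi
  done
  return "$missing"
}
```

A non-zero return can be used to bounce the report back to the subagent before synthesis starts.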
Ask every perspective already dispatched: "what on this backlog should we just delete?" Collect candidates. Present top 3 to the user with:
File naming: backlog.d/<nnn>-<kebab-slug>.md (e.g. 029-adaptive-backoff.md).
IDs are bare numeric strings (029, not BACKLOG-029).
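As a rough illustration of the naming rule, the check below accepts a numeric ID followed by a kebab-case slug. The three-digit width is inferred from the 029 example and is an assumption, as is the function name `ticket_id_from_path`.

```bash
# Hypothetical helper: print the bare numeric ID for a well-formed ticket
# path, or return 1 if the filename does not match <nnn>-<kebab-slug>.md.
# Assumes IDs are exactly three digits, as in the 029 example.
ticket_id_from_path() {
  local base
  base="$(basename "$1")"
  if printf '%s\n' "$base" | grep -Eq '^[0-9]{3}-[a-z0-9]+(-[a-z0-9]+)*\.md$'; then
    echo "${base%%-*}"                 # strip everything from the first hyphen
  else
    return 1
  fi
}
```

With this pattern, backlog.d/029-adaptive-backoff.md yields 029, while a name such as BACKLOG-029.md is rejected.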
# <Title as imperative sentence>
Priority: P0 | P1 | P2 | P3
Status: pending | ready | blocked | in-progress | done | shipped | abandoned
Estimate: S | M | L | XL
## Goal
<1 sentence — the outcome, not the mechanism.>
## Non-Goals
- <what this ticket will NOT do>
## Oracle
- [ ] <mechanically verifiable criterion — prefer executable commands>
- [ ] <"how will we know this is done?" — rough oracles are still oracles>
## Notes
<constraints, prior art, open questions, linked tickets>
Every active ticket MUST have Goal + Oracle. A ticket without an oracle
is not ready — /groom either fixes it or demotes it to the icebox.
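One way to enforce the Goal + Oracle rule mechanically is a check like the following sketch. It assumes the section headings are exactly `## Goal` and `## Oracle`, and that a usable oracle has at least one checkbox line under its heading; `ticket_is_ready` is a hypothetical name.

```bash
# Hypothetical helper: print "ok" if a ticket has a Goal section and an
# Oracle section with at least one checkbox; otherwise name what is missing.
ticket_is_ready() {
  local f="$1"
  grep -q '^## Goal[[:space:]]*$' "$f" || { echo "missing Goal"; return 1; }
  grep -q '^## Oracle[[:space:]]*$' "$f" || { echo "missing Oracle"; return 1; }
  # Look for a checkbox between "## Oracle" and the next "## " heading.
  awk '
    /^## Oracle[ \t]*$/          { in_oracle = 1; next }
    /^## /                       { in_oracle = 0 }
    in_oracle && /^- \[[ xX]\] / { found = 1 }
    END { exit !found }
  ' "$f" || { echo "empty Oracle"; return 1; }
  echo "ok"
}
```

The check is structural only. Whether an oracle is actually verifiable remains a judgment call for the groom run.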
When grooming Harness Kit itself, prefer items that create reusable
primitives, scaffolds, references, or policies; validate proving-ground
patterns meant to transfer outward; or remove debt that blocks downstream
adoption. See references/backlog-doctrine.md under "Harness Kit Product
Lens."
Closure flows through git trailers, not prose markers. Canonical keys
(recognized by scripts/lib/backlog.sh):
- Closes-backlog: <id> closes the ticket (archival intent).
- Ships-backlog: <id> is a synonym for Closes-backlog.
- Refs-backlog: <id> references without closing.

/ship owns trailer injection on the squash merge commit. /groom's
tidy step consumes those trailers to archive. For back-compat only: the
older prose markers Closes backlog:<id> / Ships backlog:<id> are
tolerated when scanned from old commits, but NEVER emitted by current
tooling.
Full trailer reference lives in skills/ship/SKILL.md under "Trailer
Conventions" — do not duplicate it here.
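The canonical closing keys can be matched with a strict pattern, sketched below. This is not the implementation in scripts/lib/backlog.sh, whose internals are not shown here. It assumes keys are case-sensitive, IDs are digits only, and trailing whitespace disqualifies a line; `closing_ids` is a hypothetical name.

```bash
# Hypothetical helper: read a commit message on stdin and print the IDs
# named by closing trailers. Refs-backlog is deliberately not matched.
closing_ids() {
  grep -E '^(Closes|Ships)-backlog: [0-9]+$' | sed 's/^.*: //'
}

# Possible usage against the latest commit:
#   git log -1 --format=%B | closing_ids
# A trailer can be appended to a message file without hand-typing it:
#   git interpret-trailers --trailer 'Closes-backlog: 029' < msg.txt
```

Under these assumptions a line such as `closes-backlog: 29` produces no output, which is why generating trailers with tooling is safer than typing them.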
- /flywheel at the start of each cycle to pick the next
  item. /flywheel reads /groom's emitted top-of-backlog and proceeds.
- /research: external context for unfamiliar domains.
- ousterhout, carmack, grug, beck).
- backlog.sh).
- backlog.d/_done/, new tickets in
  backlog.d/, edits to existing tickets, proposed deletions.

The operator sees, in order:
Terse. No marketing voice. The backlog diff is the artifact; the prose exists to justify it.
Stop and surface to the user instead of proceeding:
- /groom X request is a
  first-draft articulation, not a locked problem. Five-whys before
  theming.
- closes-backlog: 29 (wrong case /
  wrong key / trailing whitespace) is invisible to backlog.sh. Always
  emit via git interpret-trailers --trailer, never by hand.
- backlog.d/ is the source of truth.