plugins/flow-next/skills/flow-next-audit/SKILL.md
Audit `.flow/memory/` entries against the current codebase and decide Keep / Update / Consolidate / Replace / Delete per entry. Triggers on /flow-next:audit, "audit memory", "review memory", "refresh learnings", "sweep stale memory", "consolidate overlapping memory entries". Optional `mode:autofix` token in arguments runs without questions and marks ambiguous as stale. Optional scope hint after the mode token (concept, category, module, or path) narrows what gets audited.
Memory entries decay. A .flow/memory/bug/runtime-errors/ entry logged six months ago might reference a renamed file, a deleted function, or a codepath that no longer exists. Without periodic review, the store accumulates zombie entries and memory-scout surfaces outdated advice.
This skill IS the audit. The host agent (Claude Code / Codex / Droid) walks .flow/memory/, reads each entry, uses Read/Grep/Glob/git to verify references against the current codebase, applies engineering judgment, and decides per entry whether to Keep / Update / Consolidate / Replace / Delete. Optional autofix mode applies unambiguous actions and marks ambiguous as stale.
Decision entries (.flow/memory/knowledge/decisions/) and glossary terms (GLOSSARY.md files at the repo root and on the ancestor chain) are walked alongside the rest of memory. Decisions get a calibrated judging question: "does the constraint that motivated this choice still hold?" For decisions, Replace becomes a two-step supersession: write the successor, then mark the old entry decision_status: superseded (never git rm). Glossary terms are scanned for code usage. Zero-hit terms get a <!-- stale: ... --> HTML comment via the Edit tool, because no flowctl glossary mark-stale exists. _Avoid_ aliases that appear in code surface as alias-creep findings.
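The glossary scan can be pictured as a whole-word, case-insensitive search per term. The sketch below is a rough approximation under stated assumptions: `count_term_hits` is a hypothetical helper, it uses plain `grep -r` on a directory instead of restricting the search to git-tracked files, and it does not normalize whitespace. The skill itself does this work through the host agent's Grep tool, not a script.

```bash
# Hypothetical helper: count files under a directory that mention a term.
# -r recurse, -i case-insensitive, -w whole-word, -F fixed string, -l list files.
count_term_hits() {
  term="$1"
  dir="$2"
  grep -r -i -w -F -l -- "$term" "$dir" 2>/dev/null | wc -l | tr -d ' '
}

# Demo on a throwaway directory (file name and terms are illustrative).
demo=$(mktemp -d)
printf 'def load_epic(id):\n    return Epic(id)\n' > "$demo/models.py"

for term in "Epic" "Saga"; do
  hits=$(count_term_hits "$term" "$demo")
  if [ "$hits" -eq 0 ]; then
    echo "$term: zero hits, candidate for a stale marker"
  else
    echo "$term: $hits file(s), keep"
  fi
done
rm -rf "$demo"
```

The same helper could be run once per _Avoid_ alias; any non-zero count there would be an alias-creep finding instead of a reason to keep the term.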
There is no Python audit-engine, no codex/copilot subprocess dispatch, no deterministic scorer. The host agent is already an LLM and does the work directly. flowctl provides only thin persistence plumbing (memory mark-stale, memory mark-fresh, memory search --status) — landed by Task 2 of this spec.
Read workflow.md for the full phase-by-phase execution. Read phases.md for the 5-outcomes lookup with memory-schema-specific calibration.
CRITICAL: flowctl is BUNDLED — NOT installed globally. which flowctl will fail (expected). Define once; subsequent blocks (here and in workflow.md) use $FLOWCTL:
```bash
FLOWCTL="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
```
Inline skill (no context: fork) — AskUserQuestion must stay reachable across phases. Subagents can't call blocking question tools (Claude Code issues #12890, #34592). Phase 3 (Ask) and Phase 6 (Discoverability check) both require user choice in interactive mode. (sync-codex.sh rewrites this to a plain-text numbered prompt in the Codex mirror.)
Parse $ARGUMENTS for the literal token mode:autofix. If present, strip it from the arguments — the remainder is the scope hint.
```bash
RAW_ARGS="$ARGUMENTS"
MODE="interactive"
if [[ "$RAW_ARGS" == *"mode:autofix"* ]]; then
  MODE="autofix"
  # Strip token, collapse whitespace, trim.
  SCOPE_HINT=$(printf "%s" "$RAW_ARGS" | sed 's/mode:autofix//' | tr -s ' ' | sed 's/^ //;s/ $//')
else
  SCOPE_HINT="$RAW_ARGS"
fi
```
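To make the parsing behavior concrete, the same logic can be wrapped in a function and exercised with sample inputs. `parse_audit_args` is a hypothetical name used only for this illustration, the sample scope hints are made up, and the `case` pattern stands in for the `[[ ... == *...* ]]` test so the sketch also runs under a plain POSIX shell.

```bash
# Hypothetical wrapper around the parsing logic above, for illustration only.
parse_audit_args() {
  raw="$1"
  MODE="interactive"
  case "$raw" in
    *mode:autofix*)
      MODE="autofix"
      # Strip the token, collapse repeated spaces, trim both ends.
      SCOPE_HINT=$(printf "%s" "$raw" | sed 's/mode:autofix//' | tr -s ' ' | sed 's/^ //;s/ $//')
      ;;
    *)
      SCOPE_HINT="$raw"
      ;;
  esac
}

parse_audit_args "mode:autofix runtime-errors"
echo "$MODE|$SCOPE_HINT"   # autofix|runtime-errors

parse_audit_args "bug/runtime-errors"
echo "$MODE|$SCOPE_HINT"   # interactive|bug/runtime-errors

parse_audit_args "mode:autofix"
echo "$MODE|$SCOPE_HINT"   # autofix|
```

An empty scope hint after stripping the token means the audit is not narrowed.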
| Mode | When | Behavior |
|------|------|----------|
| Interactive (default) | User is at the terminal | Ask decisions on ambiguous cases via blocking-question tool; confirm batched actions; run discoverability check with consent |
| Autofix (mode:autofix in arguments) | Ralph or batch usage | No user questions. Apply Keep/Update/Consolidate/auto-Delete/Replace-with-sufficient-evidence directly. Mark ambiguous as stale. Print the full report. Discoverability surfaces as a recommendation, not an edit |
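One way to read the autofix row is as a dispatch on the classification the agent has already reached for an entry. The sketch below is illustrative only: `autofix_plan`, the lowercase classification labels, the entry ids, and the reason string are all hypothetical, and it prints the intended action without running `flowctl` or `git`. It assumes the agent has checked the preconditions (for example, that both code and problem domain are gone) before labelling an entry `delete`.

```bash
# Hypothetical dry-run planner: maps a classification to the autofix action.
# Prints what would happen; nothing is written or deleted.
autofix_plan() {
  id="$1"
  classification="$2"
  case "$classification" in
    keep|update|consolidate|delete|replace)
      echo "apply $classification: $id" ;;
    *)
      # Anything not classified with confidence is marked stale, not guessed at.
      echo "flowctl memory mark-stale $id --reason \"ambiguous in autofix\"" ;;
  esac
}

autofix_plan entry-a keep        # apply keep: entry-a
autofix_plan entry-b ambiguous   # flowctl memory mark-stale entry-b --reason "ambiguous in autofix"
```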
- Apply unambiguous actions directly: Keep, Update, Consolidate (git rm subsumed), auto-Delete (only when code AND problem domain both gone), Replace (only with sufficient evidence to write a trustworthy successor).
- Mark ambiguous entries stale with flowctl memory mark-stale <id> --reason "..." instead of guessing. Stale-marking writes are atomic and round-trip safe.

In autofix mode, skip user questions entirely and apply the rules above.
In interactive mode, follow these principles:
- Ask through AskUserQuestion (call ToolSearch with select:AskUserQuestion first if its schema isn't loaded). Fall back to numbered options in plain text only if the tool is unreachable or errors. Never silently skip the question.

The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
- Legacy flat files (.flow/memory/pitfalls.md, conventions.md, decisions.md at the memory root) are not audited. Skip with a warning that recommends /flow-next:memory-migrate first. Report includes the skipped count.
- Skip _audit/, _review/, or any other _* directory under .flow/memory/.
- Never git rm on superseded decision entries. Decision history stays on disk. Replace for knowledge/decisions/ entries means write a new entry and mark the old decision_status: superseded with superseded_by: <new-id>; never delete the old file.
- flowctl provides only thin persistence plumbing (memory mark-stale, memory mark-fresh, memory search --status). fn-38 task 2 ships only glossary {add,list,read,remove}; there is no flowctl glossary mark-stale, so use the Edit tool. Use Write tool + git for moves and deletes.
- Do not use context: fork; blocking-question tools must stay reachable.

Same pattern as /flow-next:plan and /flow-next:prospect: a non-blocking notice when .flow/meta.json setup_version lags the plugin version:
```bash
if [[ -f .flow/meta.json ]]; then
  SETUP_VER=$(jq -r '.setup_version // empty' .flow/meta.json 2>/dev/null)
  PLUGIN_JSON="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/.claude-plugin/plugin.json"
  # Fall back to "unknown" when the file is missing or has no version field.
  PLUGIN_VER=$(jq -r '.version // "unknown"' "$PLUGIN_JSON" 2>/dev/null || echo "unknown")
  # The notice goes to stderr and never blocks the run.
  if [[ -n "$SETUP_VER" && "$PLUGIN_VER" != "unknown" && "$SETUP_VER" != "$PLUGIN_VER" ]]; then
    echo "Plugin updated to v${PLUGIN_VER}. Run /flow-next:setup to refresh local scripts (current: v${SETUP_VER})." >&2
  fi
fi
```
Execute the phases in workflow.md in order:
- Walk .flow/memory/{bug,knowledge}/<category>/, group by module / category, count, choose interaction path (focused / batch / broad), skip legacy + _* directories with a counted warning. knowledge/decisions/ entries are picked up automatically by the same glob.
- 0.5 Glossary scan: enumerate GLOSSARY.md files via flowctl glossary list --json; per term, grep tracked code for the term and each _Avoid_ alias (case-insensitive whole-word, normalized whitespace). Zero hits + zero alias hits → mark stale via Edit tool (HTML comment after the term heading). Alias hits → surface as alias-creep finding for Phase 3 (interactive) or report (autofix). Skip husk files (count: 0) with a single advisory.
- Decisions: Replace is a supersession (write the successor, mark the old entry decision_status: superseded, superseded_by: <new-id>; never git rm the old).
- Execute actions. Consolidate: git rm subsumed. Replace: write new entry, git rm old (decisions: write new + edit old's frontmatter to mark superseded, never git rm). Delete: git rm (only when code AND problem domain both gone). Glossary stale: Edit comment after term heading. Ambiguous in autofix: flowctl memory mark-stale.
- Discoverability check (Phase 6): confirm that […] (or one @-including the other) mentions .flow/memory/ with schema basics (track / category / module / tags / status) and when to consult. Add a minimal line if missing; interactive asks consent, autofix surfaces as recommendation.

The full report is the deliverable: print it as markdown to stdout. Do not summarize internally and emit only a one-liner.
Report structure (see workflow.md §5 for full schema):
```text
Memory Audit Summary
====================
Scanned: N entries
Skipped legacy: M (run `/flow-next:memory-migrate` first to make these auditable)
Kept: X
Updated: Y
Consolidated: C
Replaced: Z
Deleted: W
Marked stale: S
Glossary
--------
Files scanned: F (H husks)
Terms scanned: T
Kept: K_g
Marked stale: S_g
Alias-creep flagged: A_g
```
Then per-entry detail (id, classification, evidence, action taken). For Consolidate: which entry was canonical, what unique content was merged, what was deleted. For Replace: what the old entry recommended vs what current code does, path to successor (decision Replace also notes the old entry now carries decision_status: superseded). For Marked stale: why ambiguous. For glossary terms: only stale + alias-creep cases get per-term lines (Keep is silent); husks get a one-line advisory each.
Autofix mode splits actions into Applied (writes succeeded) and Recommended (writes failed — e.g. permission denied). The structure is the same; only the bucket differs.
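The Applied / Recommended split can be sketched as a wrapper that attempts each write and records which bucket it lands in. This is a sketch under assumptions, not the skill's mechanism: `try_write`, the two accumulator variables, and the entry labels are hypothetical, and `touch` stands in for whatever write the agent would actually perform.

```bash
# Hypothetical bucketing helper: run a write, record where it ended up.
APPLIED=""
RECOMMENDED=""

try_write() {
  label="$1"
  shift
  if "$@" 2>/dev/null; then
    # Write succeeded: report under Applied.
    APPLIED="${APPLIED}${label}
"
  else
    # Write failed (e.g. permission denied): report under Recommended.
    RECOMMENDED="${RECOMMENDED}${label}
"
  fi
}

demo=$(mktemp -d)
try_write "update entry-a" touch "$demo/entry-a.md"              # succeeds
try_write "update entry-b" touch "$demo/missing-dir/entry-b.md"  # fails: no such directory

printf 'Applied:\n%s' "$APPLIED"
printf 'Recommended:\n%s' "$RECOMMENDED"
rm -rf "$demo"
```

Keeping the failed action in the report, not dropping it, lets a later interactive run or a human pick it up.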