plugins/flow-next/codex/skills/flow-next-audit/SKILL.md
Audit `.flow/memory/` entries against the current codebase and decide Keep / Update / Consolidate / Replace / Delete per entry. Triggers on /flow-next:audit, "audit memory", "review memory", "refresh learnings", "sweep stale memory", "consolidate overlapping memory entries". An optional `mode:autofix` token in the arguments runs without questions and marks ambiguous entries as stale. An optional scope hint after the mode token (concept, category, module, or path) narrows what gets audited.
Memory entries decay. A .flow/memory/bug/runtime-errors/ entry logged six months ago might reference a renamed file, a deleted function, or a codepath that no longer exists. Without periodic review, the store accumulates zombie entries and memory-scout surfaces outdated advice.
This skill IS the audit. The host agent (Claude Code / Codex / Droid) walks .flow/memory/, reads each entry, uses Read/Grep/Glob/git to verify references against the current codebase, applies engineering judgment, and decides per entry whether to Keep / Update / Consolidate / Replace / Delete. Optional autofix mode applies unambiguous actions and marks ambiguous as stale.
Decision entries (`.flow/memory/knowledge/decisions/`) and glossary terms (`GLOSSARY.md` files at the repo root and on the ancestor chain) are walked alongside the rest of memory. Decisions get a calibrated judging question: "does the constraint that motivated this choice still hold?" For decisions, Replace becomes a two-step supersession: write the successor, mark the old entry `decision_status: superseded`, and never `git rm` it. Glossary terms are scanned for code usage. Zero-hit terms get a `<!-- stale: ... -->` HTML comment via the Edit tool (no `flowctl glossary mark-stale` exists), and `_Avoid_` aliases that appear in code surface as alias-creep findings.
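The glossary usage check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's actual procedure: `count_term_hits` and `classify_term` are hypothetical helper names, the search walks a plain directory instead of filtering to tracked files, and whitespace normalization is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count lines under a directory that contain a term as a
# case-insensitive whole word. A real audit would restrict the search to
# tracked files; this sketch walks the directory for simplicity.
count_term_hits() {
  local dir="$1" term="$2"
  # -r recurse, -I skip binary files, -i ignore case, -w whole word,
  # -F fixed string (no regex), -- ends option parsing.
  { grep -rIiwF -- "$term" "$dir" || true; } | wc -l | tr -d ' '
}

# Hypothetical decision rule: alias hits are reported as alias creep; a term
# is stale only when neither the term nor any alias is found.
classify_term() {
  local dir="$1" term="$2"
  shift 2
  local alias alias_hits=0
  for alias in "$@"; do
    alias_hits=$(( alias_hits + $(count_term_hits "$dir" "$alias") ))
  done
  if (( alias_hits > 0 )); then
    echo "alias-creep"
  elif (( $(count_term_hits "$dir" "$term") == 0 )); then
    echo "stale"
  else
    echo "keep"
  fi
}
```

Called as `classify_term src "memory scout" "mem scout"`, the first argument is the directory to search, the second is the term, and any further arguments are its `_Avoid_` aliases. The directory and term names here are placeholders.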
There is no Python audit-engine, no codex/copilot subprocess dispatch, no deterministic scorer. The host agent is already an LLM and does the work directly. flowctl provides only thin persistence plumbing (memory mark-stale, memory mark-fresh, memory search --status) — landed by Task 2 of this spec.
Read workflow.md for the full phase-by-phase execution. Read phases.md for the 5-outcomes lookup with memory-schema-specific calibration.
CRITICAL: flowctl is BUNDLED — NOT installed globally. which flowctl will fail (expected). Define once; subsequent blocks (here and in workflow.md) use $FLOWCTL:
FLOWCTL="$HOME/.codex/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
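The fallback above assigns `.flow/bin/flowctl` without checking that it is executable. A slightly more defensive variant, sketched here with a hypothetical `resolve_flowctl` helper that is not part of the skill, fails loudly when neither location works:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first executable candidate path, or report
# the failure on stderr and return non-zero.
resolve_flowctl() {
  local candidate
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "flowctl not found in: $*" >&2
  return 1
}

# Usage with the two locations named above (left commented out so the sketch
# has no side effects when sourced):
# FLOWCTL=$(resolve_flowctl "$HOME/.codex/scripts/flowctl" ".flow/bin/flowctl") || exit 1
```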
Ask the user via plain text. Render the options below as a numbered list 1. … N., followed by a final option N+1. Other — type your own answer. Print the question, then the numbered list, then stop and wait for the user's next message before continuing. Parse the reply as: a bare number 1–N+1 → that option; the literal text of an option label → that option; free text after Other → custom answer.
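As an illustration of the prompt contract above, here is a minimal sketch. `render_prompt` and `parse_reply` are hypothetical names, the `option:<k>` / `custom:<text>` output format is an assumption made for the example, and treating an out-of-range number as an error (so the caller can ask again) is also an assumption the source does not specify.

```bash
#!/usr/bin/env bash
# Print the question, the options as 1..N, then the final "Other" option.
render_prompt() {
  local question="$1"
  shift
  local i=1 label
  printf '%s\n' "$question"
  for label in "$@"; do
    printf '%d. %s\n' "$i" "$label"
    i=$((i + 1))
  done
  printf '%d. Other (type your own answer)\n' "$i"
}

# Map a reply to "option:<k>" or "custom:<text>".
# Usage: parse_reply "<reply>" "<label 1>" ... "<label N>"
parse_reply() {
  local reply="$1"
  shift
  local n=$# i=1 label
  if [[ "$reply" =~ ^[0-9]+$ ]]; then
    local num=$((10#$reply))   # force base 10 so "08" is not read as octal
    if (( num >= 1 && num <= n )); then echo "option:$num"; return 0; fi
    # N+1 selects Other; the custom text arrives in the next message.
    if (( num == n + 1 )); then echo "custom:"; return 0; fi
    return 1                   # assumption: out-of-range numbers are rejected
  fi
  for label in "$@"; do
    # The literal text of an option label selects that option.
    if [[ "$reply" == "$label" ]]; then echo "option:$i"; return 0; fi
    i=$((i + 1))
  done
  echo "custom:$reply"         # anything else is treated as a custom answer
}
```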
Inline skill (no context: fork) — plain-text numbered prompt must stay reachable across phases. Subagents can't call plain-text numbered prompts (Claude Code issues #12890, #34592). Phase 3 (Ask) and Phase 6 (Discoverability check) both require user choice in interactive mode.
Parse $ARGUMENTS for the literal token mode:autofix. If present, strip it from the arguments — the remainder is the scope hint.
RAW_ARGS="$ARGUMENTS"
MODE="interactive"
if [[ "$RAW_ARGS" == *"mode:autofix"* ]]; then
  MODE="autofix"
  # Strip token, collapse whitespace, trim.
  SCOPE_HINT=$(printf "%s" "$RAW_ARGS" | sed 's/mode:autofix//' | tr -s ' ' | sed 's/^ //;s/ $//')
else
  SCOPE_HINT="$RAW_ARGS"
fi
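The substring test above also matches a longer token such as `mode:autofixed`. A token-wise variant, sketched here as a hypothetical `parse_audit_args` function that is not part of the skill, only switches modes on an exact `mode:autofix` token and leaves everything else in the scope hint:

```bash
#!/usr/bin/env bash
# Hypothetical variant: split the raw arguments on whitespace and treat only
# an exact "mode:autofix" token as the mode switch. Sets MODE and SCOPE_HINT.
parse_audit_args() {
  MODE="interactive"
  SCOPE_HINT=""
  local -a words=()
  local word
  # read -a splits on whitespace without glob expansion; it reads one line.
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    if [[ "$word" == "mode:autofix" ]]; then
      MODE="autofix"
    else
      # Re-join the remaining tokens with single spaces.
      SCOPE_HINT="${SCOPE_HINT:+$SCOPE_HINT }$word"
    fi
  done
}
```

Splitting into tokens also collapses and trims whitespace, so no separate `tr` / `sed` step is needed.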
| Mode | When | Behavior |
|------|------|----------|
| Interactive (default) | User is at the terminal | Ask decisions on ambiguous cases via plain-text numbered prompt; confirm batched actions; run discoverability check with consent |
| Autofix (mode:autofix in arguments) | Ralph or batch usage | No user questions. Apply Keep/Update/Consolidate/auto-Delete/Replace-with-sufficient-evidence directly. Mark ambiguous as stale. Print the full report. Discoverability surfaces as a recommendation, not an edit |
- Apply unambiguous outcomes directly: Keep, Update, Consolidate (`git rm` subsumed), auto-Delete (only when code AND problem domain both gone), Replace (only with sufficient evidence to write a trustworthy successor).
- Mark ambiguous entries with `flowctl memory mark-stale <id> --reason "..."` instead of guessing. Stale-marking writes are atomic and round-trip safe.

In autofix mode, skip user questions entirely and apply the rules above.
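The autofix rules can be read as a small decision function. The sketch below is illustrative only: `autofix_action`, its yes/no flags, and the `apply:<outcome>` / `mark-stale` output strings are assumptions made for the example. In the skill itself the host agent makes these judgments.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a classification plus evidence flags to the action
# autofix mode takes. Prints "apply:<outcome>" or "mark-stale".
autofix_action() {
  local outcome="$1"
  case "$outcome" in
    Keep|Update|Consolidate)
      echo "apply:$outcome"
      ;;
    Delete)
      # $2 = code gone (yes/no), $3 = problem domain gone (yes/no).
      # Both must hold before an entry is deleted automatically.
      if [[ "${2:-no}" == "yes" && "${3:-no}" == "yes" ]]; then
        echo "apply:Delete"
      else
        echo "mark-stale"
      fi
      ;;
    Replace)
      # $2 = enough evidence to write a trustworthy successor (yes/no).
      if [[ "${2:-no}" == "yes" ]]; then
        echo "apply:Replace"
      else
        echo "mark-stale"
      fi
      ;;
    *)
      # Anything ambiguous is marked stale instead of guessed at.
      echo "mark-stale"
      ;;
  esac
}
```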
In interactive mode, follow these principles:
- Ask decisions on ambiguous cases via the plain-text numbered prompt. Never silently skip the question.

The goal is automated maintenance with human oversight on judgment calls, not a question for every finding.
- Do not audit legacy files (`.flow/memory/pitfalls.md`, `conventions.md`, `decisions.md` at the memory root). Skip with a warning that recommends `/flow-next:memory-migrate` first. Report includes the skipped count.
- Do not audit `_audit/`, `_review/`, or any other `_*` directory under `.flow/memory/`.
- Never `git rm` superseded decision entries. Decision history stays on disk. Replace for `knowledge/decisions/` entries means write a new entry and mark the old `decision_status: superseded` with `superseded_by: <new-id>`; never delete the old file.
- Use flowctl only for thin persistence plumbing (`memory mark-stale`, `memory mark-fresh`, `memory search --status`). fn-38 task 2 ships only `glossary {add,list,read,remove}`; there is no `flowctl glossary mark-stale`, so use the Edit tool. Use the Write tool + git for moves and deletes.
- No `context: fork`: the plain-text numbered prompt must stay reachable.

Same pattern as `/flow-next:plan` and `/flow-next:prospect`: a non-blocking notice when `.flow/meta.json` `setup_version` lags the plugin version:
if [[ -f .flow/meta.json ]]; then
  SETUP_VER=$(jq -r '.setup_version // empty' .flow/meta.json 2>/dev/null)
  PLUGIN_JSON="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT:-$HOME/.codex}}/.codex-plugin/plugin.json"
  PLUGIN_VER=$(jq -r '.version' "$PLUGIN_JSON" 2>/dev/null || echo "unknown")
  if [[ -n "$SETUP_VER" && "$PLUGIN_VER" != "unknown" && "$SETUP_VER" != "$PLUGIN_VER" ]]; then
    echo "Plugin updated to v${PLUGIN_VER}. Run /flow-next:setup to refresh local scripts (current: v${SETUP_VER})." >&2
  fi
fi
Execute the phases in workflow.md in order:
- Walk `.flow/memory/{bug,knowledge}/<category>/`, group by module / category, count, choose interaction path (focused / batch / broad), skip legacy + `_*` directories with a counted warning. `knowledge/decisions/` entries are picked up automatically by the same glob.
- 0.5 Glossary scan: enumerate `GLOSSARY.md` files via `flowctl glossary list --json`; per term, grep tracked code for the term and each `_Avoid_` alias (case-insensitive whole-word, normalized whitespace); zero hits + zero alias hits → mark stale via Edit tool (HTML comment after the term heading); alias hits → surface as alias-creep finding for Phase 3 (interactive) or report (autofix); skip husk files (`count: 0`) with a single advisory.
- … `decision_status: superseded`, `superseded_by: <new-id>`; never `git rm` the old).
- … `git rm` subsumed. Replace: write new entry, `git rm` old (decisions: write new + edit old's frontmatter to mark superseded, never `git rm`). Delete: `git rm` (only when code AND problem domain both gone). Glossary stale: Edit comment after term heading. Ambiguous in autofix: `flowctl memory mark-stale`.
- … `@`-including the other) mentions `.flow/memory/` with schema basics (track / category / module / tags / status) and when to consult. Add a minimal line if missing; interactive asks consent, autofix surfaces as recommendation.

The full report is the deliverable. Print it as markdown to stdout. Do not summarize internally and emit only a one-liner.
Report structure (see workflow.md §5 for full schema):
Memory Audit Summary
====================
Scanned: N entries
Skipped legacy: M (run `/flow-next:memory-migrate` first to make these auditable)
Kept: X
Updated: Y
Consolidated: C
Replaced: Z
Deleted: W
Marked stale: S
Glossary
--------
Files scanned: F (H husks)
Terms scanned: T
Kept: K_g
Marked stale: S_g
Alias-creep flagged: A_g
Then per-entry detail (id, classification, evidence, action taken). For Consolidate: which entry was canonical, what unique content was merged, what was deleted. For Replace: what the old entry recommended vs what current code does, path to successor (decision Replace also notes the old entry now carries decision_status: superseded). For Marked stale: why ambiguous. For glossary terms: only stale + alias-creep cases get per-term lines (Keep is silent); husks get a one-line advisory each.
Autofix mode splits actions into Applied (writes succeeded) and Recommended (writes failed — e.g. permission denied). The structure is the same; only the bucket differs.
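One way to picture the Applied / Recommended split is a wrapper that runs each write and files its description by exit status. This is a sketch under assumptions: `record_action`, `print_buckets`, and the two arrays are hypothetical names, and the real report schema lives in workflow.md.

```bash
#!/usr/bin/env bash
APPLIED=()
RECOMMENDED=()

# Hypothetical helper: run one write command and file its description under
# Applied when it succeeds, or Recommended when it fails.
# Usage: record_action "<description>" <command> [args...]
record_action() {
  local description="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    APPLIED+=("$description")
  else
    RECOMMENDED+=("$description")
  fi
}

# Print both buckets with the same structure; only the heading differs.
print_buckets() {
  local item
  echo "Applied:"
  for item in "${APPLIED[@]}"; do echo "- $item"; done
  echo "Recommended:"
  for item in "${RECOMMENDED[@]}"; do echo "- $item"; done
}
```

For example, `record_action "Deleted <id>" git rm "<path>"` would land in Recommended if the removal failed, for instance on a permission error. The id and path are placeholders.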