/flow-next:audit — agent-native memory staleness review

Memory entries decay. A .flow/memory/bug/runtime-errors/ entry logged six months ago might reference a renamed file, a deleted function, or a codepath that no longer exists. Without periodic review, the store accumulates zombie entries and memory-scout surfaces outdated advice.

This skill IS the audit. The host agent (Claude Code / Codex / Droid) walks .flow/memory/, reads each entry, uses Read/Grep/Glob/git to verify its references against the current codebase, applies engineering judgment, and decides per entry whether to Keep / Update / Consolidate / Replace / Delete. Optional autofix mode applies unambiguous actions and marks ambiguous entries as stale.

Decision entries (.flow/memory/knowledge/decisions/) and glossary terms (GLOSSARY.md files at the repo root and along the ancestor chain) are walked alongside the rest of memory. Decisions get a calibrated judging question ("does the constraint that motivated this choice still hold?"), and Replace becomes a two-step supersession: write the successor, then mark the old entry decision_status: superseded; never git rm it. Glossary terms are scanned for code usage: zero-hit terms get an HTML comment added via the Edit tool (no flowctl glossary mark-stale exists), and _Avoid_ aliases that appear in code surface as alias-creep findings.
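A minimal sketch of the per-term hit count, assuming git grep over tracked files is an acceptable stand-in for the agent's Grep tool. The function name term_hits, and the exclusion of GLOSSARY.md files and .flow/ from the search, are illustrative assumptions rather than flowctl behavior. It matches case-insensitively on whole words and lets any whitespace run in a multi-word term match any whitespace in code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of flowctl): count lines in tracked files that
# mention a glossary term or alias. Case-insensitive, whole-word, and tolerant
# of any whitespace run between the words of a multi-word term.
term_hits() {
  local term="$1" pattern
  # Escape ERE metacharacters so the term is matched literally.
  pattern=$(printf '%s' "$term" | sed 's/[][\.*^$+?(){}|]/\\&/g')
  # Collapse whitespace, trim the ends, then let each gap match [[:space:]]+.
  pattern=$(printf '%s' "$pattern" | tr -s '[:space:]' ' ' | sed 's/^ //;s/ $//;s/ /[[:space:]]+/g')
  # -c prints "path:count" per file; sum the last field (no match prints 0).
  # Assumption: GLOSSARY.md files and .flow/ are excluded so definitions don't count as usage.
  git grep -I -i -w -E -c -e "$pattern" -- . ':!*GLOSSARY.md' ':!.flow' 2>/dev/null |
    awk -F: '{ total += $NF } END { print total + 0 }'
}
```

Run it once for the term and once per _Avoid_ alias: zero for both suggests a stale term, while a nonzero alias count is an alias-creep finding. Swapping -c for -n yields the file:line locations the report lists.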

There is no Python audit-engine, no codex/copilot subprocess dispatch, no deterministic scorer. The host agent is already an LLM and does the work directly. flowctl provides only thin persistence plumbing (memory mark-stale, memory mark-fresh, memory search --status) — landed by Task 2 of this spec.

Read workflow.md for the full phase-by-phase execution. Read phases.md for the 5-outcomes lookup with memory-schema-specific calibration.

Preamble

CRITICAL: flowctl is BUNDLED — NOT installed globally. which flowctl will fail (expected). Define once; subsequent blocks (here and in workflow.md) use $FLOWCTL:

FLOWCTL="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/scripts/flowctl"
[ -x "$FLOWCTL" ] || FLOWCTL=".flow/bin/flowctl"
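If you would rather fail early with a clear message than let a later call hit a missing binary, the two-line lookup above can be wrapped in a function. resolve_flowctl and its error text (including the /flow-next:setup hint) are illustrative assumptions; the lookup order is the same as above.

```shell
# Hypothetical wrapper around the lookup above: print the first executable
# flowctl found, or report what was checked and return non-zero.
resolve_flowctl() {
  local bundled="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/scripts/flowctl"
  if [ -x "$bundled" ]; then
    printf '%s\n' "$bundled"
  elif [ -x .flow/bin/flowctl ]; then
    printf '%s\n' ".flow/bin/flowctl"
  else
    echo "flowctl not found (checked $bundled and .flow/bin/flowctl); try /flow-next:setup" >&2
    return 1
  fi
}
```

Usage would be FLOWCTL=$(resolve_flowctl) || exit 1, after which every later block can rely on $FLOWCTL being executable.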

Inline skill (no context: fork) — AskUserQuestion must stay reachable across phases. Subagents can't call blocking question tools (Claude Code issues #12890, #34592). Phase 3 (Ask) and Phase 6 (Discoverability check) both require user choice in interactive mode. (sync-codex.sh rewrites this to a plain-text numbered prompt in the Codex mirror.)

Mode Detection

Parse $ARGUMENTS for the literal token mode:autofix. If present, strip it from the arguments — the remainder is the scope hint.

RAW_ARGS="$ARGUMENTS"
MODE="interactive"
if [[ "$RAW_ARGS" == *"mode:autofix"* ]]; then
  MODE="autofix"
  # Strip token, collapse whitespace, trim.
  SCOPE_HINT=$(printf "%s" "$RAW_ARGS" | sed 's/mode:autofix//' | tr -s ' ' | sed 's/^ //;s/ $//')
else
  SCOPE_HINT="$RAW_ARGS"
fi

| Mode | When | Behavior |
|------|------|----------|
| Interactive (default) | User is at the terminal | Ask about ambiguous cases via the blocking-question tool; confirm batched actions; run the discoverability check with consent |
| Autofix (mode:autofix in arguments) | Ralph or batch usage | No user questions. Apply Keep / Update / Consolidate / auto-Delete / Replace (with sufficient evidence) directly. Mark ambiguous entries as stale. Print the full report. Discoverability surfaces as a recommendation, not an edit |
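To check how the Mode Detection logic treats a few argument shapes, it can be wrapped in a function. parse_mode is a hypothetical name, and the [[ ]] test is rewritten as a POSIX case so the sketch also runs under sh. Only the first mode:autofix occurrence is stripped.

```shell
# Same parsing as the Mode Detection block, wrapped for testing.
# Sets MODE and SCOPE_HINT in the calling shell.
parse_mode() {
  local raw="$1"
  MODE="interactive"
  SCOPE_HINT="$raw"
  case "$raw" in
    *mode:autofix*)
      MODE="autofix"
      # Strip the first token occurrence, collapse space runs, trim one space per end.
      SCOPE_HINT=$(printf "%s" "$raw" | sed 's/mode:autofix//' | tr -s ' ' | sed 's/^ //;s/ $//')
      ;;
  esac
}

parse_mode "mode:autofix bug/runtime-errors"
echo "$MODE $SCOPE_HINT"   # autofix bug/runtime-errors
```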

Autofix mode rules

No user questions. Never call the blocking-question tool.
Process all entries in scope. No scope-narrowing question. If no scope hint was provided, process every categorized entry.
Attempt all safe actions. Keep (no-op), Update (write tool), Consolidate (merge + git rm subsumed), auto-Delete (only when code AND problem domain both gone), Replace (only with sufficient evidence to write a trustworthy successor).
Mark ambiguous as stale. When classification is genuinely ambiguous (Update vs Replace vs Consolidate vs Delete) or Replace evidence is insufficient, run flowctl memory mark-stale <id> --reason "..." instead of guessing. Stale-marking writes are atomic and round-trip safe.
Conservative confidence. Borderline cases get marked stale; never deleted on autofix.
Always print the full report. The report is the sole deliverable — there is no user to ask follow-ups.
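The autofix rules reduce to a small decision table. A sketch, assuming the agent has already boiled its investigation down to a classification plus three yes/no judgments (code gone, problem domain gone, enough evidence for a trustworthy successor); autofix_action and those inputs are hypothetical.

```shell
# Hypothetical decision table for autofix mode. Prints the action to take.
# Args: classification [code_gone] [domain_gone] [replace_evidence], each yes/no.
autofix_action() {
  local cls="$1" code_gone="${2:-no}" domain_gone="${3:-no}" evidence="${4:-no}"
  case "$cls" in
    keep|update|consolidate)
      echo "$cls" ;;
    delete)
      # auto-Delete only when BOTH the code and the problem domain are gone.
      if [ "$code_gone" = yes ] && [ "$domain_gone" = yes ]; then echo delete; else echo mark-stale; fi ;;
    replace)
      # Replace only with enough evidence to write a trustworthy successor.
      if [ "$evidence" = yes ]; then echo replace; else echo mark-stale; fi ;;
    *)
      # Ambiguous or unrecognized classification: never guess.
      echo mark-stale ;;
  esac
}
```

When the result is mark-stale, the follow-up is "$FLOWCTL" memory mark-stale <id> --reason "...", with the reason stating what made the case ambiguous.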

Interaction Principles (interactive mode only)

In autofix mode, skip user questions entirely and apply the rules above.

In interactive mode, follow these principles:

Ask one question at a time via AskUserQuestion (call ToolSearch with select:AskUserQuestion first if its schema isn't loaded). Fall back to numbered options in plain text only if the tool is unreachable or errors. Never silently skip the question.
Prefer multiple choice when natural options exist.
Lead with the recommended option and a one-sentence rationale.
Do not ask the user to make decisions before evidence is gathered — Phase 1 investigates first, Phase 3 asks.
Group obvious Keeps and obvious Updates together for batched confirmation. Present Consolidate / Replace / Delete one at a time.

The goal is automated maintenance with human oversight on judgment calls — not a question for every finding.

Forbidden

Auditing legacy flat files (.flow/memory/pitfalls.md, conventions.md, decisions.md at the memory root). Skip with a warning that recommends /flow-next:memory-migrate first. Report includes the skipped count.
Auditing under _audit/, _review/, or any other _* directory under .flow/memory/.
Deleting silently. Delete is reserved for unambiguous cases (code gone AND problem domain gone). Default to Replace or Consolidate when there's still value to preserve.
git rm on superseded decision entries. Decision history stays on disk. Replace for knowledge/decisions/ entries means write a new entry and mark the old decision_status: superseded with superseded_by: <new-id> — never delete the old file.
Deleting glossary terms. When a term has zero code hits, mark stale via Edit-tool HTML comment. Removing the term entry is the operator's call, surfaced in the report.
Inventing flowctl subcommands beyond what fn-34 task 2 ships (memory mark-stale, memory mark-fresh, memory search --status). fn-38 task 2 ships only glossary {add,list,read,remove} — there is no flowctl glossary mark-stale; use Edit tool. Use Write tool + git for moves and deletes.
Mass-renaming code from a glossary alias-creep finding. The audit reports file:line locations and stops there; code rename is the operator's call.
Auto-committing without user awareness in interactive mode. Phase 5 detects git context and asks. Autofix uses fixed defaults: branch-and-PR on main, commit on a feature branch, stage only audit-modified files.
Setting context: fork — blocking-question tools must stay reachable.
Running parallel replacement subagents. Investigation subagents can run in parallel for 3+ independent entries; replacement subagents run sequentially to protect orchestrator context.
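The first two skip rules in this list can be sketched with find. This assumes entries are Markdown files sitting directly under <track>/<category>/ (the .md extension and the fixed depth are assumptions); list_auditable_entries and count_legacy_files are hypothetical helpers, not flowctl subcommands.

```shell
# Hypothetical helper: list auditable entries relative to the memory root.
# Only bug/ and knowledge/ are walked, so root-level legacy files and root _*
# directories never match; the ! -path test also drops nested _* directories.
list_auditable_entries() {
  local root="${1:-.flow/memory}"
  (
    cd "$root" 2>/dev/null || exit 0
    find bug knowledge -mindepth 2 -maxdepth 2 -type f -name '*.md' ! -path '*/_*' 2>/dev/null
  ) | sort
}

# Hypothetical helper: count legacy flat files for the "Skipped legacy: M" line.
count_legacy_files() {
  local root="${1:-.flow/memory}" n=0 f
  for f in pitfalls.md conventions.md decisions.md; do
    if [ -f "$root/$f" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}
```

knowledge/decisions/ entries come out of the same walk, which matches the note that decisions need no separate glob.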

Pre-check: local setup version

Same pattern as /flow-next:plan and /flow-next:prospect — non-blocking notice when .flow/meta.json setup_version lags the plugin version:

if [[ -f .flow/meta.json ]]; then
  SETUP_VER=$(jq -r '.setup_version // empty' .flow/meta.json 2>/dev/null)
  PLUGIN_JSON="${DROID_PLUGIN_ROOT:-${CLAUDE_PLUGIN_ROOT}}/.claude-plugin/plugin.json"
  PLUGIN_VER=$(jq -r '.version // "unknown"' "$PLUGIN_JSON" 2>/dev/null || echo "unknown")
  if [[ -n "$SETUP_VER" && "$PLUGIN_VER" != "unknown" && "$SETUP_VER" != "$PLUGIN_VER" ]]; then
    echo "Plugin updated to v${PLUGIN_VER}. Run /flow-next:setup to refresh local scripts (current: v${SETUP_VER})." >&2
  fi
fi

Workflow

Execute the phases in workflow.md in order:

0. Discover & Triage: walk .flow/memory/{bug,knowledge}/<category>/, group entries by module / category, count them, and choose an interaction path (focused / batch / broad). Skip legacy files and _* directories with a counted warning. knowledge/decisions/ entries are picked up automatically by the same glob.
0.5. Glossary scan: enumerate GLOSSARY.md files via flowctl glossary list --json. For each term, grep tracked code for the term and each _Avoid_ alias (case-insensitive, whole-word, normalized whitespace). Zero term hits plus zero alias hits → mark stale via the Edit tool (HTML comment after the term heading). Alias hits → surface as an alias-creep finding for Phase 3 (interactive) or the report (autofix). Skip husk files (count: 0) with a single advisory.
1. Investigate: per entry, read frontmatter + body, verify referenced files / symbols / modules against current code via Read / Grep / Glob, and check git log in the area. Form a Keep / Update / Consolidate / Replace / Delete recommendation with 2-4 evidence bullets and a confidence level. For 3+ independent entries, dispatch parallel investigation subagents (read-only). Decision entries use the calibrated judging question, "does the constraint still hold?" (see phases.md §Decision-entry calibration).
1.75. Cross-doc analysis: compare entries sharing a module / category for overlap (problem, solution, root cause, files), supersession (a newer canonical entry covers an older, narrower precursor), and contradictions.
2. Classify: apply the phases.md decision criteria. For Replace, verify the evidence is sufficient to write a trustworthy successor; mark stale otherwise. For decision entries, Replace = supersede (write a new entry; mark the old one decision_status: superseded, superseded_by: <new-id>; never git rm the old one).
3. Ask: interactive only; autofix skips this phase. Group obvious Keeps + Updates → confirm as a batch. Present Consolidate / Replace / non-auto-Delete individually. Surface glossary alias-creep findings per alias. Lead with the recommendation. One question at a time.
4. Execute: Keep → no edit. Update → the agent edits frontmatter / body via the Write tool, preserving unknown fields. Consolidate → merge unique content into the canonical entry, git rm the subsumed ones. Replace → write the new entry, git rm the old one (decisions: write the new entry and edit the old entry's frontmatter to mark it superseded; never git rm). Delete → git rm (only when code AND problem domain are both gone). Glossary stale → Edit-tool comment after the term heading. Ambiguous in autofix → flowctl memory mark-stale.
5. Report + Commit: print Kept / Updated / Consolidated / Replaced / Deleted / Marked-stale / Skipped counts plus per-entry detail and a Glossary section (Kept / Marked stale / Alias-creep / Husks). Detect git context (current branch, dirty tree). Interactive: ask for commit options. Autofix: branch-and-PR on main, commit on a feature branch, stage only audit-modified files.
6. Discoverability check: verify that the substantive CLAUDE.md / AGENTS.md (the one not just @-including the other) mentions .flow/memory/ with schema basics (track / category / module / tags / status) and when to consult it. Add a minimal line if missing; interactive mode asks consent, autofix surfaces it as a recommendation.
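The git-context check in the Report + Commit phase (Phase 5) can be sketched with plain git commands. This assumes master counts as a trunk alongside main (the text names only main) and that the caller passes the exact paths the audit wrote; all three helper names are hypothetical.

```shell
# Hypothetical helpers for Phase 5 git context.

# Print the autofix commit strategy for the current checkout.
autofix_commit_strategy() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 || { echo no-git; return 0; }
  local branch
  # symbolic-ref works on an unborn branch and fails on a detached HEAD.
  branch=$(git symbolic-ref --short -q HEAD) || { echo detached; return 0; }
  case "$branch" in
    main|master) echo branch-and-pr ;;   # assumption: master treated like main
    *) echo commit-on-branch ;;
  esac
}

# Succeed when the working tree has staged, unstaged, or untracked changes.
tree_is_dirty() {
  [ -n "$(git status --porcelain)" ]
}

# Stage only the files the audit touched (git rm deletions are already staged).
stage_audit_files() {
  [ "$#" -gt 0 ] || return 0
  git add -- "$@"
}
```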

Output rules

The full report is the deliverable: print it as markdown to stdout. Do not summarize it internally and emit only a one-liner.

Report structure (see workflow.md §5 for full schema):

Memory Audit Summary
====================
Scanned: N entries
Skipped legacy: M (run `/flow-next:memory-migrate` first to make these auditable)

Kept: X
Updated: Y
Consolidated: C
Replaced: Z
Deleted: W
Marked stale: S

Glossary
--------
Files scanned: F (H husks)
Terms scanned: T
Kept: K_g
Marked stale: S_g
Alias-creep flagged: A_g

Then per-entry detail (id, classification, evidence, action taken). For Consolidate: which entry was canonical, what unique content was merged, what was deleted. For Replace: what the old entry recommended vs what current code does, path to successor (decision Replace also notes the old entry now carries decision_status: superseded). For Marked stale: why ambiguous. For glossary terms: only stale + alias-creep cases get per-term lines (Keep is silent); husks get a one-line advisory each.

Autofix mode splits actions into Applied (writes succeeded) and Recommended (writes failed — e.g. permission denied). The structure is the same; only the bucket differs.
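If the counts are tallied in shell variables, the top of the summary block above can be printed with a heredoc. A sketch of that block only, using a hypothetical print_audit_summary that takes the counts in report order; the Glossary block would follow the same pattern.

```shell
# Hypothetical printer for the top of the report.
# Args, in report order: scanned legacy kept updated consolidated replaced deleted stale
print_audit_summary() {
  cat <<EOF
Memory Audit Summary
====================
Scanned: $1 entries
Skipped legacy: $2 (run \`/flow-next:memory-migrate\` first to make these auditable)

Kept: $3
Updated: $4
Consolidated: $5
Replaced: $6
Deleted: $7
Marked stale: $8
EOF
}
```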

gmickel/flow-next-audit