- name:
- flowai-maintenance
- description:
- Use when the user asks for a project health audit, maintenance sweep, or multi-category lead-engineer scan followed by interactive issue-by-issue resolution with user approval. Do NOT trigger on routine lint/test runs, single-file cleanups, or standard "check project" requests.
Task: Project Maintenance & Health Audit
Overview
Execute a rigorous multi-category maintenance sweep, then walk the user through each finding interactively. The process has two distinct phases:
- Scan Phase: Run all checks silently, collecting findings into an internal list. No fixes during this phase.
- Resolution Phase: Present the summary, then iterate through each finding — show details, propose a fix, ask the user for a decision, apply if approved.
Context
<context>
This command is the "Garbage Collector" and "Building Inspector" for the project. It ensures the codebase remains maintainable, documented, and aligned with architectural standards.
Categories checked:
- Structure: Files in wrong places.
- Consistency: Docs vs. Code truth.
- Hygiene: Dead code, unused imports, weak tests.
- Complexity: "God objects" and massive functions.
- Debt: Accumulated TODOs.
- Language: Inconsistent terminology.
- Doc Coverage: Missing explanations in code.
- Instruction Coherence: Contradictions and ambiguities across project instructions.
- Tooling Relevance: Skills, agents, rules, and hooks that don't match the project.
- Documentation Health: Broken GFM cross-links, stale
[x] FRs, orphan FRs, SRS↔SDS contradictions, documents/index.md drift.
- Architectural Integrity: Dependency direction, import cycles, layer leakage, reverse dependencies.
- Conceptual Duplication: Parallel implementations of one decision; untyped path beside a typed sibling; diverging schema clones.
- API Contract Review: Capability-vs-implementation mismatch; sentinel-vs-missing conflation; defaults-toward-bug; dead enum values; type-level vs runtime invariant divergence.
- Cross-Implementation Symmetry: Capability / error-class / reserved-set / warning-latch parity across N implementations of one interface.
- Defensive-Programming Smell: Silent swallows of consumer-callback errors; wholesale failure swallowing; fallback-on-zero; error-as-decision coupling.
- Invariant ↔ Test Pairing: Documented architectural invariants without matching tests; stub-only contract tests where the real-binary path is gated/absent.
- Public-Surface Quality: Synonym duplication; free-function-and-method duplicates; barrel re-exports of internal-only symbols; reserved lists mixing positionals and flags.
</context>
Rules & Constraints
<rules>
1. **Precision**: Use specific thresholds (e.g., File > 500 lines).
2. **Constructive**: Every issue must have a proposed fix.
3. **Holistic**: Scan `documents/`, `.cursor/`, and source code directories.
4. **Mandatory**: Use a task management tool (e.g., `todo_write`, `todowrite`) to track progress through the phases.
5. **Language Agnostic**: Adapt checks (imports, syntax, test patterns) to the primary language of the project (TS, JS, Py, Go, etc.).
6. **No premature fixes**: Do NOT apply any changes during the Scan Phase. Only collect findings.
7. **User decides**: Every fix requires explicit user approval. Never apply fixes silently.
</rules>
Question Format (FR-UNIVERSAL.QA-FORMAT)
For the per-finding Apply / Skip / Edit verdict in Step 13:
- The question MUST be a numbered list item (
1., 2., …) — not a heading, bold-only line, or paragraph.
- The post-summary "how to proceed" prompt in Step 12 is exempt: it follows a long rich-content findings list and falls under the "rich-content alternatives" exemption in
FR-UNIVERSAL.QA-FORMAT.
Instructions
<step_by_step>
SCAN PHASE
Collect all findings into an internal list. Each finding has: category, file/symbol, problem description, proposed fix, severity (critical/warning).
-
Initialize & Plan
- Use a task management tool (e.g.,
todo_write, todowrite) to create a plan covering all scan categories below.
- Identify project's primary language and source directories.
-
Category 1: Structural Integrity
- File placement: Check that all source files reside in expected directories per project conventions (e.g.,
src/, lib/, scripts/). Flag files at wrong levels.
- Dead directories: Identify empty or orphaned directories with no purpose.
- Naming conventions: Verify file and directory names follow project conventions (case, separators).
- Config files: Ensure project config files (
deno.json, package.json, etc.) are at expected locations.
-
Category 2: Code Hygiene & Dependencies
- Dead Code: Identify exported/public symbols in source directories that
are never imported/called elsewhere.
- Unused Imports: Scan source files for imports/includes that are not
used in the file body.
- Test Quality: Read test files (e.g.,
*.test.*, *_test.*,
test_*.py). Flag tests that:
- Have no assertions.
- Use trivial assertions (e.g.,
expect(true).toBe(true), assert True).
- Are commented out.
- Invariant ↔ test pairing (lightweight): Extract SHOULD/MUST clauses from
AGENTS.md, CLAUDE.md, documents/design.md. Grep test descriptors (it, Deno.test, def test_, func Test) for matching coverage; flag invariants with zero matching test names. (Deeper analysis — stub-only tests, hand-curated lists without cross-reference tests — lives in Category 15.)
-
Category 3: Complexity & Hotspots
- Project-context normalization: Read project vision in
AGENTS.md / CLAUDE.md and pick the LOC bucket BEFORE flagging files. Buckets: thin wrapper / facade (declared "wrapper", "adapter", "bindings", "SDK") → 300 lines; service / framework / tool (default) → 500 lines; monolithic app / migration target → 800 lines. State the bucket and quote the source phrase in the finding.
- Files: Flag any source file exceeding the bucket's threshold.
- Functions: Scan for functions/methods exceeding 50 lines.
- God Objects: Classes/modules with mixed concerns (logic + UI + DB in one file).
- Mixed-concerns detection (beyond LOC): Flag a single file holding 2+ unrelated top-level concerns (e.g., parsing + lifecycle + formatting + protocol detection). Signals: cross-domain export verbs, heterogeneous imports, section comments naming distinct phases. Trips even when file is under the LOC threshold.
-
Category 4: Technical Debt Aggregation
- Scan: Search for
TODO, FIXME, HACK, XXX tags in the codebase.
- Group: Organize by file/module.
- Analysis: Flag any that look critical or like "temporary" fixes that
became permanent.
-
Category 5: Consistency (Docs vs. Code)
- Terminology: Extract key terms from
README.md and documents/. Check
if code uses different synonyms (e.g., "User" in docs vs "Customer" in
code).
- Drift: Pick 3 major claims from
documents/*.md (e.g., "The system
handles X asynchronously"). Verify if the code actually does that.
-
Category 6: Code Documentation Coverage
- Rule: Every file, class, method, and exported function MUST have
documentation (JSDoc, Docstring, Rustdoc, etc.).
- Check:
- Responsibility: Does the comment explain what it does?
- Nuances: For complex logic (cyclomatic complexity > 5 or > 20 lines),
are there examples or edge case warnings?
- Scan: primary source directories.
- Report: List undocumented symbols.
-
Category 7: Instruction Coherence
- Scope: Read all instruction files that guide agent/developer behavior:
CLAUDE.md (root and nested), AGENTS.md files, documents/requirements.md,
documents/design.md, and any rules/conventions files.
- Contradictions: Identify mutually exclusive rules across or within files
(e.g., "use tabs" in one section vs. "use 2 spaces" in another; "never mock"
vs. "mock freely").
- Ambiguities: Flag vague or open-ended instructions that could be
interpreted in conflicting ways by different agents or sessions.
- Redundancy: Note duplicated rules across files that may diverge over time.
- Scope conflicts: Check that nested instruction files (
subdir/CLAUDE.md)
don't silently override root-level rules without explicit justification.
- Coherence verdict: For each issue, state which files/sections conflict and
propose a resolution (keep one, merge, or clarify).
-
Category 8: Tooling Relevance
- Scope: Inventory all installed skills (
.claude/skills/, .cursor/skills/),
agents/subagents (.claude/agents/, .cursor/agents/), hooks (.claude/hooks/,
.cursor/hooks/, .husky/), and rules files.
- Stack match: Compare each item against the project's declared tooling stack
(from
AGENTS.md or CLAUDE.md) and actual source files. Flag items designed
for a different tech stack (e.g., Django skill in a TypeScript project, Python
linting hook in a Deno project).
- Domain match: Flag agents/skills targeting a domain absent from the project
(e.g., Kubernetes deployer agent in a project with no K8s manifests or Dockerfiles).
- Stale tooling: Identify skills/agents/hooks that reference tools, commands,
or frameworks not present in the project (e.g., hook calling
flake8 when no
Python files exist).
- Verdict: For each mismatch, state what the item expects vs. what the project
actually uses, and propose a fix (remove, replace with stack-appropriate
alternative, or add justification).
-
Category 9: Documentation Health (FR-DOC-LINT)
Audit the project's documentation system for broken or stale cross-references. This category is DISTINCT from Category 6 "Documentation Coverage" — coverage is about JSDoc / comments per code symbol; health is about cross-link integrity, FR-status freshness, and SRS↔SDS alignment.
In the Resolution Phase summary (step 10), findings from THIS category MUST appear under their OWN dedicated header. The header text MUST literally contain the English token Documentation Health (case-insensitive, may be followed by a translation in parentheses if the rest of the report is in another language — e.g. Documentation Health (Здоровье документации)). Do NOT translate the header outright; do NOT fold these findings into Documentation Coverage, Consistency, or any other existing category — the dedicated header is what makes the audit's doc-system focus visible to FR-DOC-LINT consumers.
- Broken GFM cross-links: scan project markdown (
documents/*.md, README.md, AGENTS.md) and source-code comments for links of the form [text](path.md#anchor). Flag any link where (a) the target file does not exist or (b) the anchor does not match a heading's GFM auto-slug in the target file.
- Stale
[x] FRs: read documents/requirements.md. For each ### FR-<ID> block whose **Status:** is [x], verify the **Acceptance:** reference resolves — test path / benchmark id / command exists. Flag mismatches.
- Orphan FRs: for each
[x] FR in SRS, search source code for any GFM-link reference of the form [FR-<ID>](.../requirements.md#…). Flag FRs with zero references in code.
- SRS ↔ SDS contradictions: skim
documents/requirements.md and documents/design.md for paired statements about the same component or behavior with mutually exclusive constraints (e.g., SRS says required, SDS says removed). Flag concrete pairs.
documents/index.md drift: if documents/index.md exists, compare its FR rows against documents/requirements.md — flag rows whose status, summary, or anchor disagree with the SRS, and SRS FRs missing a row.
- Verdict: each finding must reference the exact file (and line if applicable) and propose a concrete fix.
Categories 10–16 (architectural review) — full sub-check details, patterns, thresholds, and verdict shape live in references/architectural-categories.md. READ THAT FILE BEFORE running these checks. Each finding follows the same - [N] <site>: <problem>. (Fix: <fix>) shape as Cats 1–9.
- Category 10: Architectural Integrity — cyclic imports through barrels (TDZ trap); layer leakage; reverse dependencies. Read declared layering in
AGENTS.md / CLAUDE.md / SDS first.
- Category 11: Conceptual Duplication — parallel implementations of one decision; untyped path beside typed sibling; diverging schema clones.
- Category 12: API Contract Review — capability-vs-impl mismatch; sentinel-vs-missing conflation; default-toward-bug; dead enum values; type-vs-runtime invariant divergence.
- Category 13: Cross-Implementation Symmetry — capability / error-class / reserved-set / warning-latch parity across N implementations of one interface.
- Category 14: Defensive-Programming Smell — silent consumer-callback swallows; wholesale failure swallowing; fallback-on-zero (
|| vs ??); error-as-decision coupling.
- Category 15: Invariant ↔ Test Pairing — deeper than Cat 2's spot-check. Architectural invariants without tests; stub-only contract tests; hand-curated lists without cross-reference tests.
- Category 16: Public-Surface Quality — synonym duplication; free-fn-and-method duplicates; barrel re-exports of internal-only symbols; reserved-flag lists mixing positionals and flags; overlapping public/private boundary.
RESOLUTION PHASE
-
Present Summary
- Output the full findings list, grouped by category. Use plain-text category labels (not markdown
# headings). Skip any category with no findings.
- Category labels — use the 16 categories below. Even when the rest of the report is rendered in the user's language, the category label MUST be a clear DEDICATED tag for that category, not a translation that overlaps with another category's label. Acceptable: the English label verbatim, OR a translation that uniquely names the category. Forbidden: translating two distinct categories to the same word, OR omitting the new doc-system category entirely. If in doubt, fall back to the literal English label.
Structural Integrity
Code Hygiene
Complexity & Hotspots
Technical Debt
Consistency (Docs vs Code) — code/doc DRIFT (README claims X but code does Y).
Documentation Coverage — JSDoc/comments per code symbol.
Instruction Coherence
Tooling Relevance
Documentation Health (FR-DOC-LINT) — REQUIRED whenever step 10 produced any finding. Distinct from #5 and #6: this group covers DOC-TO-DOC integrity (broken GFM cross-links, stale [x] FRs whose acceptance reference is missing, orphan FRs with no source-code link, SRS↔SDS contradictions, documents/index.md drift). NEVER fold these findings into Consistency (Docs vs Code) or Documentation Coverage — they are different concerns and FR-DOC-LINT consumers look specifically for the dedicated Documentation Health group.
Architectural Integrity
Conceptual Duplication
API Contract Review
Cross-Implementation Symmetry
Defensive-Programming Smell
Invariant ↔ Test Pairing
Public-Surface Quality
- Each issue line follows the shape:
- [N] <file/symbol>: <problem>. (Fix: <proposed fix>)
- Number every finding sequentially across all categories (e.g., [1], [2], ..., [N]).
- At the end of the summary, show the total count per category and overall.
- Brief example (covers Categories 1, 2, 9):
Structural Integrity
- [1] src/oldfile.ts: in root, should be in src/utils/. (Fix: move)
Code Hygiene
- [2] utils.ts: unused export `myFunc`. (Fix: delete)
Documentation Health
- [3] FR-AUTH marked [x] in SRS, no source-code reference. (Fix: add `[FR-AUTH](...)` link or revert to [ ])
Total: 3 findings (Structural Integrity: 1, Code Hygiene: 1, Documentation Health: 1).
- One representative finding per category (including Cats 10–16: architectural integrity, conceptual duplication, API contract review, cross-implementation symmetry, defensive-programming smell, invariant↔test pairing, public-surface quality) lives in references/example-findings.md. Mirror that shape for new categories.
-
Ask User How to Proceed
- After the summary, ask the user which findings to resolve (this prompt is exempt from FR-UNIVERSAL.QA-FORMAT — see scope). Accept these reply modes:
- specific numbers (e.g.,
1, 3, 4) — resolve only the selected findings
- category name (e.g.,
Hygiene) — resolve all findings in that category
all — walk through every finding one by one
agent's choice — pick the most impactful subset yourself, emit a one-line justification, and proceed without re-asking
done — stop, no fixes needed
-
Interactive Resolution Loop
- For each finding the user chose to resolve (in order):
- Show the finding details: file, problem, and proposed fix.
- Ask the user (as a numbered question per FR-UNIVERSAL.QA-FORMAT):
- Apply — execute the proposed fix (edit file, move file, delete code, etc.).
- Skip — move to the next finding.
- Edit — user provides an alternative fix; apply that instead.
- After applying a fix, briefly confirm what was done.
- Move to the next finding.
- After all selected findings are processed, show a brief summary of actions taken (N applied, M skipped, K edited).
</step_by_step>
Verification
<verification>
[ ] Checked structural integrity (file placement, naming, configs).
[ ] Scanned for dead code and unused imports.
[ ] Checked file/function length limits (500/50 lines).
[ ] Aggregated all TODO/FIXME tags.
[ ] Verified documentation terminology vs code usage.
[ ] Checked for missing code documentation (File/Class/Method).
[ ] Checked instruction coherence across CLAUDE.md, AGENTS.md, and docs (contradictions, ambiguities, redundancy).
[ ] Checked tooling relevance (skills, agents, hooks vs. project stack and domain).
[ ] Checked Documentation Health (broken GFM links, stale [x] FRs, orphan FRs, SRS↔SDS contradictions, documents/index.md drift) — findings grouped under the dedicated `Documentation Health` header in the summary.
[ ] Checked Architectural Integrity (cycles, layer leakage, reverse deps) against declared layering.
[ ] Checked Conceptual Duplication (parallel decision tables, untyped/typed asymmetry, schema clones).
[ ] Checked API Contract Review (capability-vs-impl, sentinel-vs-missing, default-toward-bug, dead enums, type-vs-runtime divergence).
[ ] Checked Cross-Implementation Symmetry (capability / error-class / reserved-set / warning-latch parity).
[ ] Checked Defensive-Programming Smell (callback swallows, wholesale swallows, fallback-on-zero, error-as-decision).
[ ] Checked Invariant ↔ Test Pairing (SHOULD/MUST clauses → test descriptors; stub-only contract tests).
[ ] Checked Public-Surface Quality (synonyms, free-fn-and-method dup, barrel re-exports of internals, reserved lists mixing positionals).
[ ] Presented numbered summary of all findings, grouped by category.
[ ] Asked the user how to proceed with resolution.
[ ] Resolved selected findings interactively (apply/skip/edit per finding).
[ ] Showed final resolution summary (applied/skipped/edited counts).
</verification>