plugins/codex-sdlc/skills/doc-consolidate/SKILL.md
Crawl repository for scattered docs and consolidate into categorized reference index in .aiwg/docs/
npx skillsauth add jmagly/aiwg doc-consolidate
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
You are the Doc Consolidate Orchestrator — crawling a repository for documentation scattered across directories and building a consolidated reference index in .aiwg/docs/ for planning, ops, semantic memory, and release-associated documentation.
Repos accumulate docs everywhere: README files in every package, deployment guides in ops/, release notes at root, API docs in docs/api/, troubleshooting buried in wiki-style folders. This skill answers: "what docs exist, where are they, and what are they for?"
No file duplication. Copying docs creates parallel drift. Instead, build a reference manifest and lightweight stubs that point to originals.
Users may say:
--dry-run
Preview all discovered docs and their classifications without writing any files. Produces the same report as a live run but with no mutations.
--scope <path> (optional)
Limit discovery to a subtree. Useful for large monorepos.
/doc-consolidate --scope docs/
/doc-consolidate --scope packages/auth/
--incremental
Only process files changed since the last successful run. Reads the timestamp from .aiwg/reports/doc-consolidate-last-run.json. Falls back to a full scan if no prior run exists.
--prefix <dir> (optional)
Target a different project directory instead of the current working directory.
| Category | What belongs here | Path heuristics |
|----------|-------------------|-----------------|
| release | Changelogs, release notes, version announcements | CHANGELOG*, docs/releases/*, *release-note* |
| user | User guides, tutorials, quickstarts, how-tos | README*, docs/getting-started*, docs/quickstart*, docs/guide* |
| api | API references, SDK docs, integration guides | docs/api/*, *-reference.md, *-api.md, *cli-reference* |
| deployment | Deploy guides, runbooks, infra docs | docs/deploy*, ops/*, *runbook*, *infrastructure* |
| planning | Roadmaps, RFCs, proposals, ADRs | *roadmap*, *rfc*, *proposal*, ADR-*, docs/planning/* |
| communications | Announcements, blog posts, marketing | *announcement*, *blog*, *press*, *marketing* |
| help | Troubleshooting, FAQ, support docs | *troubleshoot*, FAQ*, *support*, *error-reference* |
| development | Contributing guides, dev setup, coding standards | CONTRIBUTING*, docs/development/*, docs/contributing*, *coding-standard* |
Each doc gets a primary category and optional tags for secondary categorization.
| File | Purpose |
|------|---------|
| .aiwg/docs/_manifest.yaml | Master index: path, category, title, summary, confidence, tags |
| .aiwg/docs/{category}/ | Stub files referencing originals via @-mentions |
| .aiwg/reports/doc-consolidate-{timestamp}.md | Run report with stats and low-confidence flags |
| .aiwg/reports/doc-consolidate-last-run.json | Incremental state (timestamp, file list, checksums) |
Flags: --dry-run, --scope, --incremental, --prefix
If --incremental: load .aiwg/reports/doc-consolidate-last-run.json
Include patterns:
**/*.md
**/*.txt (only in docs/, doc/, documentation/ directories)
**/*.rst
**/*.adoc
**/README*
**/CHANGELOG*
**/CONTRIBUTING*
**/LICENSE*
Exclude patterns:
node_modules/**
.git/**
vendor/**
dist/**
build/**
.aiwg/docs/** (output directory — avoid self-reference)
**/*.min.*
**/package-lock.json
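The include and exclude lists above can be sketched as a single path filter. This is a minimal illustration rather than the skill's actual implementation: the function names `is_doc_like` and `discover` are hypothetical, matching is case-sensitive by assumption, and the `*.txt` rule is read as "any ancestor directory named docs, doc, or documentation".

```python
import os
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns transcribed from the include/exclude lists above.
INCLUDE = ["*.md", "*.rst", "*.adoc", "README*", "CHANGELOG*", "CONTRIBUTING*", "LICENSE*"]
TXT_DIRS = {"docs", "doc", "documentation"}   # *.txt only counts inside these
EXCLUDE_DIRS = {"node_modules", ".git", "vendor", "dist", "build"}


def is_doc_like(rel_path: str) -> bool:
    """Return True if a repo-relative POSIX path passes the include/exclude rules."""
    p = PurePosixPath(rel_path)
    name, dirs = p.name, p.parts[:-1]
    if dirs and dirs[0] in EXCLUDE_DIRS:
        return False                          # node_modules/**, .git/**, ...
    if dirs[:2] == (".aiwg", "docs"):
        return False                          # output directory, avoid self-reference
    if fnmatchcase(name, "*.min.*") or name == "package-lock.json":
        return False
    if any(fnmatchcase(name, pat) for pat in INCLUDE):
        return True
    # Assumption: *.txt qualifies when any ancestor directory is a docs folder.
    return fnmatchcase(name, "*.txt") and any(d in TXT_DIRS for d in dirs)


def discover(root: str) -> list[str]:
    """Walk root and return sorted repo-relative paths of doc-like files."""
    root_abs = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root_abs):
        if dirpath == root_abs:
            # Prune excluded top-level directories so they are never traversed.
            dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for fname in filenames:
            rel = os.path.relpath(os.path.join(dirpath, fname), root_abs)
            rel = rel.replace(os.sep, "/")
            if is_doc_like(rel):
                found.append(rel)
    return sorted(found)
```

Pruning `dirnames` in place keeps the walk out of large trees such as node_modules instead of filtering their files one by one.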
For each file, extract:
- path: relative to project root
- title: first H1 heading, or filename if no heading
- summary: first non-empty paragraph (max 200 chars)
- size: file size in bytes
- lastModified: from git log or file mtime

If --incremental: filter to files with mtime newer than the last run or, for precision, list the files touched by commits since then with `git log --since={lastRun} --name-only --pretty=format:`
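As a rough sketch of the title and summary rules, the helper below works on text that has already been read from disk. The names `first_h1`, `first_paragraph`, and `extract_metadata` are hypothetical, only Markdown ATX headings (`# Title`) are recognised, and lastModified is left out because it needs git or filesystem access.

```python
import os
from typing import Optional


def first_h1(text: str) -> Optional[str]:
    """Return the text of the first '# ' heading, or None if there is none."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return None


def first_paragraph(text: str) -> str:
    """Return the first run of non-empty, non-heading lines joined by spaces."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.append(stripped)
        elif lines:
            break  # a blank line or heading ends the paragraph
    return " ".join(lines)


def extract_metadata(rel_path: str, text: str) -> dict:
    """Build the per-file record: path, title, summary (max 200 chars), size."""
    return {
        "path": rel_path,
        "title": first_h1(text) or os.path.basename(rel_path),
        "summary": first_paragraph(text)[:200],
        "size": len(text.encode("utf-8")),  # bytes, assuming UTF-8 on disk
    }
```

For example, `extract_metadata("docs/notes.md", "Just some notes.")` falls back to the filename `notes.md` as the title because the text has no H1.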
Report discovery results:
Discovery complete: {N} doc-like files found
Scope: {scope or "full repo"}
New since last run: {M} (if incremental)
Classify each discovered file into a category using a two-pass strategy:
Pass 1: Path heuristics (fast, deterministic)
Apply pattern matching on the file path. Rules evaluated in order, first match wins:
release:
- path matches: CHANGELOG*, */CHANGELOG*
- path matches: docs/releases/*, */releases/*
- path matches: *release-note*, *release_note*
- filename matches: RELEASES*, HISTORY*
user:
- path matches: README*, */README*
- path matches: docs/getting-started*, docs/quickstart*
- path matches: docs/guide*, docs/tutorial*
- path matches: docs/usage*, docs/install*
api:
- path matches: docs/api/*, */api-docs/*
- path matches: *-reference.md, *-api.md
- path matches: *cli-reference*, *sdk-*
- path matches: docs/integrations/*
deployment:
- path matches: docs/deploy*, */deploy/*
- path matches: ops/*, */ops/*
- path matches: *runbook*, *infrastructure*
- path matches: *docker*, *kubernetes*, *k8s*
- path matches: docs/install/non-interactive*
planning:
- path matches: *roadmap*, docs/roadmap*
- path matches: *rfc*, docs/rfc/*
- path matches: *proposal*, docs/proposals/*
- path matches: ADR-*, */ADR-*, *adr-*
- path matches: docs/planning/*
communications:
- path matches: *announcement*, */announcements/*
- path matches: docs/releases/*-announcement*
- path matches: *blog*, *press*, *marketing*
- path matches: *newsletter*
help:
- path matches: *troubleshoot*, */troubleshooting/*
- path matches: FAQ*, */FAQ*
- path matches: *support*, docs/support/*
- path matches: *error-reference*, *known-issues*
development:
- path matches: CONTRIBUTING*, */CONTRIBUTING*
- path matches: docs/development/*, docs/contributing/*
- path matches: *coding-standard*, *style-guide*
- path matches: docs/architecture/*, *dev-guide*
Confidence: high for path match.
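The ordered, first-match-wins evaluation can be illustrated with shell-style globs. This is a sketch under assumptions, not the skill's own code: `classify_by_path` is a hypothetical name, matching is case-sensitive, `*` is allowed to span `/`, and the two "filename matches" rules are written as a root pattern plus a `*/` variant.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Ordered (category, patterns) pairs transcribed from the rules above.
RULES = [
    ("release", ["CHANGELOG*", "*/CHANGELOG*", "docs/releases/*", "*/releases/*",
                 "*release-note*", "*release_note*",
                 "RELEASES*", "*/RELEASES*", "HISTORY*", "*/HISTORY*"]),
    ("user", ["README*", "*/README*", "docs/getting-started*", "docs/quickstart*",
              "docs/guide*", "docs/tutorial*", "docs/usage*", "docs/install*"]),
    ("api", ["docs/api/*", "*/api-docs/*", "*-reference.md", "*-api.md",
             "*cli-reference*", "*sdk-*", "docs/integrations/*"]),
    ("deployment", ["docs/deploy*", "*/deploy/*", "ops/*", "*/ops/*", "*runbook*",
                    "*infrastructure*", "*docker*", "*kubernetes*", "*k8s*",
                    "docs/install/non-interactive*"]),
    ("planning", ["*roadmap*", "docs/roadmap*", "*rfc*", "docs/rfc/*", "*proposal*",
                  "docs/proposals/*", "ADR-*", "*/ADR-*", "*adr-*", "docs/planning/*"]),
    ("communications", ["*announcement*", "*/announcements/*",
                        "docs/releases/*-announcement*", "*blog*", "*press*",
                        "*marketing*", "*newsletter*"]),
    ("help", ["*troubleshoot*", "*/troubleshooting/*", "FAQ*", "*/FAQ*", "*support*",
              "docs/support/*", "*error-reference*", "*known-issues*"]),
    ("development", ["CONTRIBUTING*", "*/CONTRIBUTING*", "docs/development/*",
                     "docs/contributing/*", "*coding-standard*", "*style-guide*",
                     "docs/architecture/*", "*dev-guide*"]),
]


def classify_by_path(rel_path: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None for Pass 2."""
    for category, patterns in RULES:
        if any(fnmatchcase(rel_path, pat) for pat in patterns):
            return category
    return None
```

Rule order matters. docs/releases/v2026.4.0-announcement.md lands in release because docs/releases/* is tried before any communications pattern. Likewise, docs/install/non-interactive.md hits the user rule docs/install* before the deployment rule docs/install/non-interactive* is ever tried, so under this ordering it is classified as user.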
Pass 2: Content analysis (for unmatched files)
For files that didn't match any path heuristic, read the first 500 characters and classify by keyword presence:
| Keywords | Category |
|----------|----------|
| version, release, changelog, breaking change, migration | release |
| getting started, tutorial, how to, step by step, quickstart | user |
| endpoint, API, request, response, parameter, authentication | api |
| deploy, server, infrastructure, docker, kubernetes, production | deployment |
| roadmap, milestone, planned, RFC, proposal, decision | planning |
| announcement, launch, update, community | communications |
| troubleshoot, FAQ, error, fix, solution, workaround | help |
| contributing, development, build, test, lint, setup | development |
Confidence: medium for single-keyword match, low for no clear match (defaults to user).
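A minimal sketch of the keyword pass follows. The source does not say how several matching categories are resolved, so this version assumes the category with the most distinct keyword hits wins, ties go to table order, and any hit counts as medium confidence. `classify_by_content` is a hypothetical name, and keywords are matched as whole words, case-insensitively.

```python
import re

# Keyword table transcribed from above; dict order doubles as the tie-break order.
KEYWORDS = {
    "release": ["version", "release", "changelog", "breaking change", "migration"],
    "user": ["getting started", "tutorial", "how to", "step by step", "quickstart"],
    "api": ["endpoint", "API", "request", "response", "parameter", "authentication"],
    "deployment": ["deploy", "server", "infrastructure", "docker", "kubernetes",
                   "production"],
    "planning": ["roadmap", "milestone", "planned", "RFC", "proposal", "decision"],
    "communications": ["announcement", "launch", "update", "community"],
    "help": ["troubleshoot", "FAQ", "error", "fix", "solution", "workaround"],
    "development": ["contributing", "development", "build", "test", "lint", "setup"],
}


def classify_by_content(text: str) -> tuple[str, str]:
    """Return (category, confidence) from keywords in the first 500 characters."""
    head = text[:500].lower()
    best, best_hits = "user", 0
    for category, words in KEYWORDS.items():
        hits = sum(
            1 for w in words
            if re.search(r"\b" + re.escape(w.lower()) + r"\b", head)
        )
        if hits > best_hits:          # strict '>' keeps the earlier category on ties
            best, best_hits = category, hits
    if best_hits == 0:
        return ("user", "low")        # no clear match defaults to user
    return (best, "medium")
```

Whole-word matching avoids false hits such as "api" inside "capital", at the cost of missing plurals like "versions".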
Output: Each file gets { path, category, confidence, title, summary, tags }
Report classification results:
Classification complete:
release: {N} user: {N} api: {N}
deployment: {N} planning: {N} communications: {N}
help: {N} development: {N}
Low confidence: {N} (review recommended)
If --dry-run: skip writing, proceed to report only.
Write .aiwg/docs/_manifest.yaml:
version: 1
generated: "2026-04-06T04:00:00Z"
project: /path/to/repo
total: 47
categories:
  release:
    - path: CHANGELOG.md
      title: Changelog
      summary: "All notable changes to this project..."
      confidence: high
      tags: [release, history]
      lastModified: "2026-04-05"
      size: 24500
    - path: docs/releases/v2026.4.0-announcement.md
      title: "v2026.4.0 Release"
      summary: "AIWG v2026.4.0 brings the web dashboard..."
      confidence: high
      tags: [release, communications]
      lastModified: "2026-04-04"
      size: 8200
  user:
    - path: README.md
      title: AIWG
      summary: "Framework for improving AI-generated content..."
      confidence: high
      tags: [user, overview]
      lastModified: "2026-04-05"
      size: 12000
  # ... remaining categories
stats:
  total: 47
  by_category:
    release: 5
    user: 12
    api: 8
    deployment: 3
    planning: 4
    communications: 5
    help: 3
    development: 7
  by_confidence:
    high: 38
    medium: 6
    low: 3
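One way to assemble this structure from the classified entries is shown below, as a sketch only. `build_manifest` is a hypothetical helper that returns a plain dict; serialising it to YAML is assumed to happen separately with whatever YAML library the project uses.

```python
from collections import Counter, defaultdict


def build_manifest(entries: list[dict], project: str, generated: str) -> dict:
    """Group classified entries by category and compute the stats block."""
    categories = defaultdict(list)
    for entry in entries:
        # The category becomes the grouping key, so drop it from the item itself.
        item = {k: v for k, v in entry.items() if k != "category"}
        categories[entry["category"]].append(item)
    return {
        "version": 1,
        "generated": generated,
        "project": project,
        "total": len(entries),
        "categories": dict(categories),
        "stats": {
            "total": len(entries),
            "by_category": dict(Counter(e["category"] for e in entries)),
            "by_confidence": dict(Counter(e["confidence"] for e in entries)),
        },
    }
```

Deriving `total`, `by_category`, and `by_confidence` from the same entry list keeps the counts consistent with the listed docs by construction.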
If --dry-run: skip stub creation, write report only.
Stub generation:
For each category with entries, ensure .aiwg/docs/{category}/ exists. For each doc in the category, write a stub file:
<!-- Auto-generated by doc-consolidate. Do not edit — edit the source file instead. -->
# {title}
> {summary}
**Source**: @{path}
**Category**: {category}
**Confidence**: {confidence}
**Last modified**: {lastModified}
**Tags**: {tags joined by comma}
Stub filename: sanitized version of the source path (e.g., docs/cli-reference.md becomes docs--cli-reference.md).
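The naming rule and stub body can be sketched as follows. The only sanitisation the source shows is replacing path separators with `--`, so that is all this version does. The helper names are hypothetical, tags are joined with ", " by assumption, and the auto-generated header comment is left out for brevity.

```python
def stub_filename(source_path: str) -> str:
    """Flatten a repo-relative path into a single filename."""
    return source_path.replace("/", "--")


def stub_path(entry: dict) -> str:
    """Return where the stub for this entry lives under .aiwg/docs/."""
    return f".aiwg/docs/{entry['category']}/{stub_filename(entry['path'])}"


def render_stub(entry: dict) -> str:
    """Render the stub body; the source is referenced by @-mention, not copied."""
    lines = [
        f"# {entry['title']}",
        f"> {entry['summary']}",
        f"**Source**: @{entry['path']}",
        f"**Category**: {entry['category']}",
        f"**Confidence**: {entry['confidence']}",
        f"**Last modified**: {entry['lastModified']}",
        f"**Tags**: {', '.join(entry['tags'])}",
    ]
    return "\n".join(lines) + "\n"
```

Because the flattened name keeps every path segment, two files with the same basename in different directories (for example several README.md files) still get distinct stubs.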
Run report: Write .aiwg/reports/doc-consolidate-{timestamp}.md:
# Doc Consolidate Report — {timestamp}
## Summary
- **Total docs discovered**: {N}
- **Categories**: {breakdown}
- **Low confidence**: {N} (listed below for review)
- **Mode**: {dry-run | live}
- **Scope**: {scope or full repo}
## By Category
### release ({N})
| File | Title | Confidence |
|------|-------|------------|
| CHANGELOG.md | Changelog | high |
| ... | ... | ... |
### user ({N})
...
## Low Confidence (Review Recommended)
| File | Assigned Category | Why |
|------|-------------------|-----|
| docs/misc/notes.md | user | No path heuristic match, no clear keyword match |
## Coverage Gaps
- No docs found in: {categories with 0 entries}
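The two review sections of the report reduce to simple filters over the classified entries. A small sketch, with hypothetical helper names:

```python
CATEGORIES = ["release", "user", "api", "deployment",
              "planning", "communications", "help", "development"]


def coverage_gaps(entries: list[dict]) -> list[str]:
    """Return the categories that have no entries, in table order."""
    present = {e["category"] for e in entries}
    return [c for c in CATEGORIES if c not in present]


def low_confidence(entries: list[dict]) -> list[dict]:
    """Return the entries flagged for review."""
    return [e for e in entries if e["confidence"] == "low"]
```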
Incremental state: Write .aiwg/reports/doc-consolidate-last-run.json:
{
  "timestamp": "2026-04-06T04:00:00Z",
  "totalFiles": 47,
  "byCategory": { "release": 5, "user": 12, ... },
  "files": {
    "CHANGELOG.md": { "checksum": "sha256:abc...", "category": "release" },
    "README.md": { "checksum": "sha256:def...", "category": "user" }
  }
}
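Since the state file records a checksum per file, one plausible way to decide what an incremental run needs to reprocess is to compare fresh checksums against the stored ones. The sketch below assumes that approach (the text above also mentions mtime and git as alternatives). The function names are hypothetical, and file contents are passed in as bytes to keep the example self-contained.

```python
import hashlib
import json
from typing import Optional


def checksum(data: bytes) -> str:
    """Return a checksum in the 'sha256:<hex>' form used by the state file."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def load_last_run(path: str) -> Optional[dict]:
    """Load the previous run's state, or None if there was no prior run."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return None


def changed_files(current: dict[str, bytes], last_run: Optional[dict]) -> list[str]:
    """Return paths that are new or whose content differs from the last run."""
    if last_run is None:
        return sorted(current)        # no prior state: fall back to a full scan
    known = last_run.get("files", {})
    return sorted(
        path for path, data in current.items()
        if known.get(path, {}).get("checksum") != checksum(data)
    )
```

Unlike an mtime comparison, a content checksum ignores files that were merely touched or re-checked-out without changing.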
| System | How |
|--------|-----|
| doc-sync | After consolidation, doc-sync can read _manifest.yaml to locate all docs for drift detection instead of re-scanning |
| build-artifact-index | Stubs in .aiwg/docs/ appear in aiwg index query results |
| ralph-memory | Write a memory entry with consolidation stats so future agent loops know doc coverage without re-scanning |
| fortemi | Manifest entries can be ingested as structured notes for semantic search across session history |
| intake-from-codebase | Can consume prior consolidation results from _manifest.yaml instead of re-analyzing docs |
/doc-consolidate
/doc-consolidate --dry-run
/doc-consolidate --scope docs/
/doc-consolidate --incremental
/doc-consolidate --prefix /path/to/other-project
User: "consolidate all the docs, focus on release and deployment docs"
→ Orchestrator runs full consolidation, highlights release and deployment categories in report
.aiwg/docs/ and .aiwg/reports/
.aiwg/docs/ and re-run
--dry-run before first live run on a new repo
completion:
  required:
    - manifest_written: true          # .aiwg/docs/_manifest.yaml exists and is valid YAML
    - stubs_generated: true           # .aiwg/docs/{category}/ directories populated
    - report_written: true            # .aiwg/reports/doc-consolidate-{ts}.md
    - incremental_state_saved: true   # .aiwg/reports/doc-consolidate-last-run.json
  optional:
    - low_confidence_zero: false      # Not required; low-confidence files are flagged, not blocked
    - all_categories_populated: false # Some repos won't have all 8 categories