plugins/codebase-documentor-for-aws/skills/document-service/SKILL.md
This skill should be used when the user asks to "analyze this codebase", "document this service", "generate technical docs", "I inherited this code", "help me understand this system", "create docs for this project", "what does this system look like", "onboard me to this codebase", "this codebase has no docs", "visualize the architecture from code", or any explicit request to produce structured documentation or architecture diagrams from an existing codebase. Specifically optimized for AWS workloads (CDK, CloudFormation, Terraform) with source-of-truth citations. Do NOT activate for code reviews, single-function explanations, generating new code, or general coding tasks.
Analyze codebases to produce structured technical documentation and architecture diagrams with source-of-truth citations. Every finding links back to the exact file and line it was derived from. Optimized for AWS workloads but works with any codebase.
- Cite every finding with `file:line` citations. See citation-format.md. Verify citations precisely: re-read the cited file and confirm the line number is within ±3 lines. Anchor each citation with function or variable names.
- Mark uncertainty explicitly: [UNKNOWN] for items not inferable from code, [RISK] for unhandled failure modes, [INFERRED] for educated guesses, [RATIONALE UNKNOWN] for unexplained architecture choices. Omitting markers undermines trust.

The workflow runs autonomously from Step 2 onward. Step 1 is the only interactive step.
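The ±3-line citation check can be sketched as follows. This sketch assumes a citation is written as plain `path:line`, relative to the target directory. The authoritative format lives in citation-format.md, which is not reproduced here. The function name `verify_citation` and the substring match on the anchor are illustrative assumptions.

```python
import re
from pathlib import Path

# Assumed citation shape: "relative/path/to/file.ext:LINE" (1-indexed line).
CITATION_RE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+)$")


def verify_citation(citation: str, anchor: str, root: Path, tolerance: int = 3) -> bool:
    """Return True if `anchor` (a function or variable name) appears
    within +/- `tolerance` lines of the cited line."""
    match = CITATION_RE.match(citation)
    if match is None:
        return False  # not in the assumed path:line shape
    path = root / match["path"]
    if not path.is_file():
        return False  # cited file does not exist
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    cited = int(match["line"])
    # Clamp the search window to the file's real line range.
    start = max(cited - tolerance, 1)
    end = min(cited + tolerance, len(lines))
    return any(anchor in lines[n - 1] for n in range(start, end + 1))
```

For example, `verify_citation("src/handler.py:42", "def handle_order", Path("."))` (hypothetical file and function) passes only if `def handle_order` appears somewhere on lines 39 to 45.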
Gather from the user:
If existing docs are provided, read them first to establish baseline context. If the target directory and context are already known (e.g., provided via automation or a pre-configured prompt), skip the interactive step and proceed directly to Step 2.
Check whether CODEBASE_ANALYSIS.md already exists at the output path. If so, ask the user: "Overwrite or write to a different filename?" Resolve this before proceeding — the rest of the workflow runs autonomously.
… `.gitignore`.

Check git branches (`git branch -a`) for strategic context (e.g., a `dev/rust` branch signals a language migration in progress). Note active branches in the Architecture Overview.

Produce a hierarchical outline mapping each documentation section to specific source files:
## Documentation Outline
1. Architecture Overview → [entry points, IaC stack files] — explain WHY, not just WHAT
2. [Module A: detected name] → [source files for module A]
3. [Module B: detected name] → [source files for module B]
4. Shared Utilities → [shared/common source files]
5. Request Lifecycle → [trace end-to-end flows through the system]
6. Domain Logic Deep-Dive → [core services at implementation level: algorithms, parameters, edge cases]
7. Startup and Initialization → [boot sequence, model loading, cache warmup, dependency checks]
8. API Contracts → [route definitions, OpenAPI specs]
9. Data Models → [schema files, ORM models]
10. Deployment → [IaC files, Dockerfiles]
11. Configuration → [config files, .env.example, prompt templates, YAML configs, secrets refs]
12. Monitoring and Observability → [log groups, metrics, tracing, alarms, dashboards]
13. Security → [auth, encryption, IAM, network isolation]
14. Local Development → [how to run/test locally, CPU fallback, dev environment setup]
15. Discrepancies → (cross-reference README/metadata vs actual code)
16. Failure Modes → (cross-cutting — include detection + recovery)
17. Timeout and Dependency Chain → (map cascading timeouts across layers)
Follow the section structure in technical-doc-template.md but adapt to the actual codebase — add sections for significant modules, skip sections that don't apply. Aim for balance: each section should map to a meaningful subset of files. If a module maps to more than ~30 files, consider splitting it into sub-sections.
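One way to represent the outline and apply the ~30-file guideline is sketched below. Each section holds the source files it maps to, and an oversized section is split by parent directory. Splitting by directory is an assumption for illustration; the real split should follow technical-doc-template.md and the codebase's actual module boundaries. The sketch also does not recurse, so a single huge directory stays one sub-section.

```python
from dataclasses import dataclass, field

MAX_FILES_PER_SECTION = 30  # the "~30 files" guideline


@dataclass
class OutlineSection:
    title: str
    files: list = field(default_factory=list)


def split_large_sections(outline, limit=MAX_FILES_PER_SECTION):
    """Return a new outline where any section mapping to more than `limit`
    files is split into one sub-section per parent directory."""
    result = []
    for section in outline:
        if len(section.files) <= limit:
            result.append(section)
            continue
        groups = {}
        for path in section.files:
            parent = path.rsplit("/", 1)[0] if "/" in path else "."
            groups.setdefault(parent, []).append(path)
        for parent in sorted(groups):
            result.append(OutlineSection(f"{section.title} ({parent})", groups[parent]))
    return result
```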
Do NOT pause for user review. Proceed immediately to analysis.
Two core analysis paths:
Path A: application code. For each outline section, read mapped source files and extract:
Consult framework-patterns.md for framework-specific extraction patterns.
Path B: infrastructure as code. When IaC files are detected (CDK, CloudFormation, Terraform, Serverless Framework):
Use awsiac to confirm resource interpretations and awsknowledge for service descriptions.

When no IaC is found, infer infrastructure from application code (SDK clients, connection strings, environment variables) and mark components as [INFERRED].
Note on CDK projects: In CDK codebases, the IaC IS application code (TypeScript/Python constructs). Process CDK files in a single pass covering both Path A and Path B rather than treating them as separate analyses. Extract both the resource definitions (Path B) and the application logic interleaved with them (Lambda bundling, environment wiring, IAM grants — Path A) simultaneously.
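IaC detection by marker files can be sketched roughly as follows. The markers used here are common conventions assumed for illustration:

- `cdk.json`
- `*.tf`
- `serverless.yml`
- templates containing `AWSTemplateFormatVersion` or `AWS::` resource types

The skill itself follows discovery-patterns.md, which is the real reference. An empty result corresponds to the "no IaC found" case above.

```python
from pathlib import Path

SKIP_DIRS = {"node_modules", "cdk.out", ".git", ".terraform"}


def _looks_like_cloudformation(path: Path) -> bool:
    try:
        text = path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return False
    return "AWSTemplateFormatVersion" in text or ("Resources" in text and "AWS::" in text)


def detect_iac(root: Path) -> set:
    """Return the IaC tools whose marker files appear under `root`.
    An empty set means infrastructure must be inferred from application code."""
    found = set()
    for path in root.rglob("*"):
        rel_parts = path.relative_to(root).parts
        # Skip dependencies and synthesized output (e.g. cdk.out templates).
        if not path.is_file() or SKIP_DIRS.intersection(rel_parts):
            continue
        if path.name == "cdk.json":
            found.add("CDK")
        elif path.suffix == ".tf":
            found.add("Terraform")
        elif path.name in ("serverless.yml", "serverless.yaml"):
            found.add("Serverless Framework")
        elif path.suffix in (".yaml", ".yml", ".json") and _looks_like_cloudformation(path):
            found.add("CloudFormation")
    return found
```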
For each outline section:
… [UNUSED] potential dead code.

Process cross-cutting sections (Failure Modes, Configuration, Security, Discrepancies) last, drawing on accumulated knowledge.
Discrepancy detection: After analyzing the codebase, re-read the README, CLAUDE.md, package.json description, and any project metadata. Flag every claim that does not match the actual code — features referenced but not implemented, resource types that differ, architecture components that don't exist. For legacy codebases, this "trust but verify" pass is the single most valuable output.
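The sketch below is a narrow illustration of the "trust but verify" pass. It compares the AWS service names mentioned in a README with those that appear in IaC source text. It rests on two simplifying assumptions:

- The service keyword list is a hypothetical subset.
- Matching is whole-word and case-insensitive.

A real pass also checks features, resource types, and components by reading the code.

```python
import re

# Hypothetical keyword subset; extend for the services a project actually uses.
SERVICES = ("DynamoDB", "S3", "SQS", "SNS", "Lambda", "RDS", "ECS", "EKS", "Kinesis")


def mentioned_services(text: str) -> set:
    """Services named as whole words anywhere in `text` (case-insensitive)."""
    return {s for s in SERVICES if re.search(rf"\b{re.escape(s)}\b", text, re.IGNORECASE)}


def service_discrepancies(readme: str, iac_text: str) -> dict:
    claimed = mentioned_services(readme)
    actual = mentioned_services(iac_text)
    return {
        "claimed_not_in_iac": claimed - actual,  # README promises it, IaC lacks it
        "in_iac_not_claimed": actual - claimed,  # deployed but undocumented
    }
```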
Actionable failure modes: For each failure mode, include the detection method (CloudWatch metric, log pattern, symptom) and recovery steps (actual commands), not just a description. The reader is an on-call engineer at 3am.
Do not attempt a single-pass skim. For each module or service, use iterative deepening:
For codebases with multiple top-level modules, deep nesting, or hundreds of source files:
Use a `.codebase-documentor-progress.md` task board to track progress through sections, enabling resumability if interrupted. This works on any platform (Claude Code, Cursor, Codex, or any other coding assistant).

See recursive-analysis.md for detailed instructions on both approaches.
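A resumable task board can be sketched minimally as a plain Markdown checklist, with one `- [ ]` or `- [x]` line per outline section. The exact board format is defined in recursive-analysis.md; the layout and function names here are hypothetical.

```python
from pathlib import Path

BOARD_NAME = ".codebase-documentor-progress.md"
TODO, DONE = "- [ ] ", "- [x] "


def write_board(root: Path, sections, done) -> None:
    """(Re)write the board, checking off every section title in `done`."""
    lines = ["# Codebase Documentor Progress", ""]
    lines += [(DONE if title in done else TODO) + title for title in sections]
    (root / BOARD_NAME).write_text("\n".join(lines) + "\n", encoding="utf-8")


def pending_sections(root: Path):
    """Section titles still unchecked, or None when no board exists (fresh run)."""
    board = root / BOARD_NAME
    if not board.is_file():
        return None
    return [
        line[len(TODO):]
        for line in board.read_text(encoding="utf-8").splitlines()
        if line.startswith(TODO)
    ]
```

On restart, a non-None result from `pending_sections` means analysis can resume at the first unchecked section instead of starting over.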
Two types of diagrams serve different purposes:
Sequence/flow diagrams — inline Mermaid. For request lifecycle traces and data pipeline flows identified in Step 4, generate Mermaid sequenceDiagram or flowchart blocks inline in the relevant CODEBASE_ANALYSIS.md sections. Mermaid is the community standard for simple flow diagrams and renders natively on GitHub. Keep these focused — one diagram per major request path or data flow.
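A traced request path can be turned into a Mermaid sequenceDiagram mechanically. The sketch below assumes the trace is already available as ordered (caller, callee, message) hops. Participant aliases (P0, P1, and so on) keep names containing spaces, such as "API Gateway", valid in Mermaid. Messages containing Mermaid-special characters are not escaped here.

```python
def to_sequence_diagram(hops):
    """Render ordered (caller, callee, message) hops as Mermaid sequenceDiagram text."""
    aliases = {}
    for caller, callee, _ in hops:
        for name in (caller, callee):
            aliases.setdefault(name, f"P{len(aliases)}")
    lines = ["sequenceDiagram"]
    # Declare participants in first-seen order so the diagram reads left to right.
    lines += [f"    participant {alias} as {name}" for name, alias in aliases.items()]
    lines += [f"    {aliases[c]}->>{aliases[t]}: {msg}" for c, t, msg in hops]
    return "\n".join(lines)
```

For instance, the hypothetical hops `[("Client", "API Gateway", "POST /orders"), ("API Gateway", "Lambda", "Invoke"), ("Lambda", "DynamoDB", "PutItem")]` produce four participant declarations followed by three `->>` arrows.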
Architecture diagram — always attempt the aws-architecture-diagram skill first. For the system-level architecture diagram (services, infrastructure, boundaries): invoke the aws-architecture-diagram skill (part of the deploy-on-aws plugin) with "analyze [target-directory]" to trigger Mode A. It produces a validated draw.io diagram (docs/*.drawio) with official AWS4 icons and professional styling. Only if the skill is genuinely unavailable (not installed, invocation fails), fall back to a Mermaid flowchart TD architecture overview directly in the Architecture Overview section. Include all major services, data stores, external dependencies, and infrastructure boundaries (VPC/subnets as subgraphs when IaC is present).
After diagram generation, try to export to PNG for embedding in the report. Run `drawio -x -f png -b 10 -o docs/<name>.drawio.png docs/<name>.drawio`. If `drawio` is not on PATH, skip the PNG export; the report will link to the `.drawio` file directly instead of embedding an image.
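The export-or-link decision can be sketched as below. The sketch calls the draw.io desktop CLI with the export flags given in the preceding paragraph. It treats a missing executable or a failed export as "no PNG". The helper names and the alt text are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def export_png(drawio_file: Path) -> Optional[Path]:
    """Export docs/<name>.drawio to docs/<name>.drawio.png; None if not possible."""
    exe = shutil.which("drawio")
    if exe is None:
        return None  # drawio not on PATH: skip the export
    png = drawio_file.with_name(drawio_file.name + ".png")
    try:
        subprocess.run(
            [exe, "-x", "-f", "png", "-b", "10", "-o", str(png), str(drawio_file)],
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return None  # export failed: fall back to linking the source
    return png if png.is_file() else None


def diagram_markdown(drawio_file: Path, png: Optional[Path]) -> str:
    """Embed the PNG with a link to the editable source, or just link the source."""
    src = drawio_file.as_posix()
    link = f"[`{src}`](./{src})"
    if png is None:
        return f"Architecture diagram: {link}"
    return f"![Architecture diagram](./{png.as_posix()})\n\n> Editable source: {link}"
```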
Cross-reference the diagram against the Architecture Overview text. Update documentation or diagram if they diverge.
Assemble all sections into CODEBASE_ANALYSIS.md following technical-doc-template.md
Embed the architecture diagram as an image with a link to the editable source:
![Architecture diagram](./docs/<name>.drawio.png)
> Editable source: [`docs/<name>.drawio`](./docs/<name>.drawio)
If PNG export was not possible, link to the .drawio file directly. Mermaid flow diagrams go inline in relevant sections.
When the codebase reveals clear business capabilities (API contracts, domain models, data flows, SLA configs), include a Business Context section at the end of CODEBASE_ANALYSIS.md following business-context.md. Skip only for pure libraries or infrastructure-only code. Do NOT include speculative content — but a README describing the product IS sufficient business context.
Tag items not inferable from code with [UNKNOWN]
Write CODEBASE_ANALYSIS.md to the target directory
Remove .codebase-documentor-progress.md if it was created during analysis
Present summary: components documented, APIs found, unknowns tagged, citations included
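The marker and citation counts for the final summary could come from a simple scan of the finished report. This is a heuristic sketch. The citation pattern (a path with a file extension followed by `:line`) is an assumption, and it can also match strings like `host.com:8080`. Counting components and APIs still requires the structured results of the analysis.

```python
import re

MARKERS = ("[UNKNOWN]", "[RISK]", "[INFERRED]", "[RATIONALE UNKNOWN]", "[UNUSED]")
# Assumed citation shape: path/with/file.ext:LINE
CITATION_RE = re.compile(r"[\w./-]+\.\w+:\d+")


def summarize_report(report: str) -> dict:
    """Count uncertainty markers and file:line citations in the report text."""
    summary = {marker: report.count(marker) for marker in MARKERS}
    summary["citations"] = len(CITATION_RE.findall(report))
    return summary
```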
| File | Purpose |
| ---------------------- | ------------------------------------------------------------------------------ |
| CODEBASE_ANALYSIS.md | Single output — technical docs, business context, citations, and flow diagrams |
| docs/*.drawio | Architecture diagram source (editable in draw.io) |
| docs/*.drawio.png | Architecture diagram image (embedded in report, if CLI export available) |
| Setting              | Default                                                                  | Override        |
| -------------------- | ------------------------------------------------------------------------ | --------------- |
| Primary output       | CODEBASE_ANALYSIS.md                                                     | -               |
| Flow diagrams        | Mermaid inline (sequenceDiagram / flowchart)                             | "skip diagrams" |
| Architecture diagram | draw.io via aws-architecture-diagram skill (Mermaid fallback if missing) | "skip diagrams" |
| IaC reading          | Read-only (never modify)                                                 | -               |
| AWS enrichment       | Enabled when AWS services detected                                       | "skip AWS"      |
| Scope                | User-specified directory                                                 | -               |
See error-scenarios.md for handling of empty directories, missing entry points, missing IaC, existing output files, and MCP server failures.
awsknowledge MCP server: consult when AWS services are detected. Use it for enrichment (adding official service descriptions and documentation links to CODEBASE_ANALYSIS.md) and for validation (confirming the analysis interpretation is correct). When the codebase is self-explanatory, validation is more valuable than enrichment; do not add MCP content just because the server is available.
Example queries: search for "Amazon ECS on EC2 GPU instances" to confirm GPU support patterns, or read the official service page for an unfamiliar AWS service to get a one-line description.
awsiac MCP server: consult when CDK or CloudFormation files are detected. Use it primarily for validation: confirm that the interpretation of a construct or resource type matches its actual behavior. It is particularly useful for complex constructs with non-obvious defaults.
Example queries: confirm properties of ecs.FargateService vs ecs.Ec2Service or verify CloudFormation resource relationships. Terraform files are still analyzed by the skill itself (see discovery-patterns.md IaC Detection), just without this MCP server's schema validation.