skills/code-analysis/component-boundary-identifier/SKILL.md
Identifies natural component boundaries inside a monolith by clustering the dependency graph, finding the cuts with minimum coupling. Use when planning to modularize or extract microservices, when deciding what can be deployed independently, or when the user asks where the seams in this codebase are.
A good component boundary has high cohesion inside, low coupling across. Finding boundaries is a graph clustering problem: build the dependency graph, find cuts that minimize cross-cut edges.
Nodes are units (functions, classes, or files — pick a granularity). Edges are dependencies:
| Edge type               | Weight hint | Why                                              |
| ----------------------- | ----------- | ------------------------------------------------ |
| Direct call (A calls B) | High        | Runtime coupling: A breaks if B's API changes    |
| Import / include        | Medium      | Compile-time coupling                            |
| Shared data type        | Medium      | Schema coupling: both break if the type changes  |
| Shared database table   | High        | Data coupling: hardest to split                  |
| Shared config key       | Low         | Easy to duplicate                                |
| Co-change (git log)     | Medium      | Empirical: these have changed together           |
Weight edges by coupling strength. A function call once at startup is weaker than one in a hot loop.
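As a minimal sketch of the graph-building step, the snippet below collapses typed dependency edges into one weighted, undirected graph. The numeric weights (3 / 2 / 1 for High / Medium / Low), the edge-type keys, and the sample file names are assumptions chosen for illustration; the source only gives qualitative hints.

```python
from collections import defaultdict

# Hypothetical numeric values for the High / Medium / Low weight hints.
EDGE_WEIGHTS = {
    "call": 3.0,           # direct call
    "import": 2.0,         # import / include
    "shared_type": 2.0,    # shared data type
    "shared_table": 3.0,   # shared database table
    "shared_config": 1.0,  # shared config key
    "co_change": 2.0,      # changed together in git history
}


def build_graph(edges):
    """Collapse (src, dst, edge_type) triples into a weighted undirected graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for src, dst, edge_type in edges:
        if src == dst:
            continue  # self-dependencies say nothing about boundaries
        weight = EDGE_WEIGHTS[edge_type]
        graph[src][dst] += weight
        graph[dst][src] += weight
    return graph


# Made-up sample edges; a real run would extract these from the codebase.
edges = [
    ("orders/tasks.py", "inventory/stock.py", "call"),
    ("orders/tasks.py", "inventory/stock.py", "import"),
    ("orders/views.py", "accounts/models.py", "shared_type"),
]
graph = build_graph(edges)
print(graph["orders/tasks.py"]["inventory/stock.py"])  # 5.0 (call + import)
```

Summing weights per node pair means two units joined by several kinds of dependency end up more tightly bound than units joined by one, which is usually the desired bias for clustering.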
| Method                | How                                                                              | Good for                          |
| --------------------- | -------------------------------------------------------------------------------- | --------------------------------- |
| Louvain / modularity  | Maximize (edges inside clusters) − (expected random edges)                       | General-purpose, no target K      |
| Spectral clustering   | Eigenvectors of the graph Laplacian; cut at the gap                              | When K is roughly known           |
| Min-cut between seeds | Pick two modules you know should separate; find the cheapest cut                 | Extracting one thing specifically |
| Directory-as-prior    | Start from existing folder structure; measure if it's actually a good clustering | Validating current layout         |
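To make the min-cut-between-seeds row concrete, here is a small sketch that finds the cheapest cut between two seed files with a textbook Edmonds-Karp max-flow. The function name `min_cut`, the edge weights, and the file names are illustrative assumptions; a graph library's max-flow routine would do the same job on a real codebase.

```python
from collections import defaultdict, deque


def min_cut(edges, source, sink):
    """Return (cut_weight, source_side, cut_edges) separating source from sink.

    edges: list of undirected (u, v, weight) triples.
    """
    if source == sink:
        raise ValueError("seeds must be two different nodes")
    capacity = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        capacity[u][v] += w  # an undirected edge carries capacity both ways
        capacity[v][u] += w

    def reachable():
        """BFS over edges with remaining capacity; maps node -> parent."""
        parents = {source: None}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt, cap in capacity[node].items():
                if cap > 1e-12 and nxt not in parents:
                    parents[nxt] = node
                    queue.append(nxt)
        return parents

    flow = 0.0
    parents = reachable()
    while sink in parents:
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        node = sink
        while parents[node] is not None:
            bottleneck = min(bottleneck, capacity[parents[node]][node])
            node = parents[node]
        node = sink
        while parents[node] is not None:
            prev = parents[node]
            capacity[prev][node] -= bottleneck
            capacity[node][prev] += bottleneck
            node = prev
        flow += bottleneck
        parents = reachable()

    source_side = set(parents)  # still reachable once the flow is saturated
    cut_edges = [e for e in edges if (e[0] in source_side) != (e[1] in source_side)]
    return flow, source_side, cut_edges


# Made-up weighted edges between two would-be components.
edges = [
    ("orders/views.py", "orders/models.py", 3.0),
    ("orders/tasks.py", "orders/models.py", 3.0),
    ("orders/tasks.py", "inventory/stock.py", 1.0),
    ("inventory/stock.py", "inventory/models.py", 3.0),
]
weight, side, cut = min_cut(edges, "orders/models.py", "inventory/models.py")
print(weight, cut)  # 1.0 [('orders/tasks.py', 'inventory/stock.py', 1.0)]
```

The edges returned in `cut` are exactly the dependencies that would have to be replaced by an interface if the two seeds were split.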
Start with directory-as-prior. The existing layout might already be right. Measure modularity of the current folder structure — if it's high, the work is done. If it's low, the folders are lying.
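One way to measure the modularity of the current folder structure is Newman's modularity score, computed here by hand for a partition given by top-level directory. This is a sketch: the helper names and the toy graph are assumptions, and what counts as "high" or "low" is left to judgment.

```python
from collections import defaultdict


def top_level_dir(path):
    """Map 'orders/views.py' to 'orders'; files at the root map to ''."""
    return path.split("/", 1)[0] if "/" in path else ""


def modularity(edges, community_of):
    """Newman modularity of a partition over weighted undirected edges.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community_of: function mapping a node to its community label.
    """
    total = 0.0
    internal = defaultdict(float)  # edge weight fully inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        cu, cv = community_of(u), community_of(v)
        total += w
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    if total == 0:
        return 0.0
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy graph: two tightly knit directories joined by a single edge.
edges = [
    ("a/1.py", "a/2.py", 1.0), ("a/2.py", "a/3.py", 1.0), ("a/1.py", "a/3.py", 1.0),
    ("b/1.py", "b/2.py", 1.0), ("b/2.py", "b/3.py", 1.0), ("b/1.py", "b/3.py", 1.0),
    ("a/3.py", "b/1.py", 1.0),
]
q = modularity(edges, top_level_dir)
print(round(q, 3))  # 0.357
```

Putting every file in one community scores 0, so a directory layout that scores near 0 (or below) is doing no better than no structure at all.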
For a proposed boundary around cluster C, a good cut has cohesion > 0.5 and coupling < 0.2. (These are rules of thumb; the right thresholds vary by domain.)
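The text here does not preserve the exact formulas for cohesion and coupling, so the sketch below assumes one plausible pair of definitions: cohesion as internal edge density (internal edges over possible node pairs in C) and coupling as the share of edges touching C that cross the boundary. Other definitions are equally defensible, and these will not necessarily reproduce the scores in the example tables.

```python
def boundary_metrics(edges, cluster):
    """Return (cohesion, coupling) for `cluster` under the assumed definitions.

    edges: iterable of undirected (u, v) pairs, each pair listed once.
    """
    cluster = set(cluster)
    internal = cross = 0
    for u, v in edges:
        ends_inside = (u in cluster) + (v in cluster)
        if ends_inside == 2:
            internal += 1
        elif ends_inside == 1:
            cross += 1
    n = len(cluster)
    possible = n * (n - 1) // 2  # number of node pairs inside the cluster
    cohesion = internal / possible if possible else 0.0
    touching = internal + cross
    coupling = cross / touching if touching else 0.0
    return cohesion, coupling


def is_good_cut(cohesion, coupling):
    # Thresholds are the rules of thumb quoted in the text.
    return cohesion > 0.5 and coupling < 0.2


# Toy graph: four closely connected nodes with one edge leaving the cluster.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
         ("d", "x"), ("x", "y")]
cohesion, coupling = boundary_metrics(edges, {"a", "b", "c", "d"})
print(round(cohesion, 3), round(coupling, 3), is_good_cut(cohesion, coupling))
# 0.833 0.167 True
```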
Graph: 340 files, 1200 import edges, 89 shared-model edges.
Directory-as-prior:
| Directory | Cohesion | Coupling | Verdict |
| ------------ | -------- | -------- | ---------------------------------------- |
| accounts/ | 0.71 | 0.08 | Clean boundary. Extract as-is. |
| orders/ | 0.64 | 0.31 | Leaky — what's crossing? |
| reports/ | 0.22 | 0.45 | Not a real component. Directory is a lie. |
| utils/ | 0.05 | 0.68 | Expected — utils is a grab bag, not a component |
Drill into orders/ coupling:
| Cross-edge | Count | Type |
| ------------------------------------- | ----- | ------------ |
| orders/views.py → accounts/models.User | 14 | Shared model |
| orders/tasks.py → inventory/stock.py | 8 | Direct call |
| orders/models.py → payments/models.Payment | 5 | FK relation |
The User dependency is fine — every service needs auth. The inventory coupling is the problem: orders shouldn't be calling inventory synchronously.
Proposed cut: orders is a component. Its interface is: receives User (from auth), emits OrderPlaced event (consumed by inventory). The 8 direct stock.py calls become event publications.
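The shift from direct calls to event publication might look like the sketch below. `EventBus`, `reserve_stock`, `place_order`, and the fields on `OrderPlaced` are hypothetical names; only the `OrderPlaced` event itself comes from the proposal above. The in-process bus is a stand-in for a real message broker and still runs handlers synchronously, so it shows the shape of the interface rather than the runtime decoupling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event emitted by orders; the fields are illustrative assumptions."""
    order_id: int
    sku: str
    quantity: int


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        for handler in self._handlers[type(event)]:
            handler(event)


# Inventory side: reacts to events and owns its own data.
stock_levels = {"widget": 10}


def reserve_stock(event):
    stock_levels[event.sku] -= event.quantity


# Orders side: publishes an event instead of importing inventory code.
def place_order(bus, order_id, sku, quantity):
    bus.publish(OrderPlaced(order_id, sku, quantity))


bus = EventBus()
bus.subscribe(OrderPlaced, reserve_stock)
place_order(bus, order_id=1, sku="widget", quantity=3)
print(stock_levels)  # {'widget': 7}
```

After this change, orders depends only on the event type and the bus, so the orders → inventory cross-edge disappears from the dependency graph.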
The hardest coupling to break is shared tables. If orders and inventory both write to stock_levels, you can't cleanly separate them — whoever owns the table owns the other's data.
Options:
- inventory owns the table. orders calls inventory's API, never touches the table.

Flag shared-table edges in the output: they're where the extraction actually hurts.
Extract most stable first (lowest instability — depended on, doesn't depend on much). Those become foundation services. Extract leaf features last — they depend on everything.
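A sketch of the ordering rule, assuming "instability" means Robert C. Martin's metric I = Ce / (Ca + Ce), where Ce counts outgoing component dependencies and Ca counts incoming ones. That reading matches "depended on, doesn't depend on much", but the source does not spell out the formula. The dependency map is made up for illustration.

```python
def instability(component_deps):
    """Map each component to I = Ce / (Ca + Ce).

    component_deps: dict of component -> set of components it depends on.
    """
    components = set(component_deps)
    for deps in component_deps.values():
        components |= deps
    scores = {}
    for comp in components:
        efferent = len(component_deps.get(comp, set()) - {comp})  # outgoing (Ce)
        afferent = sum(                                           # incoming (Ca)
            1 for other, deps in component_deps.items()
            if other != comp and comp in deps
        )
        total = afferent + efferent
        scores[comp] = efferent / total if total else 0.0
    return scores


def extraction_order(component_deps):
    """Most stable (lowest instability) first; ties broken by name."""
    scores = instability(component_deps)
    return sorted(scores, key=lambda comp: (scores[comp], comp))


# Hypothetical component-level dependencies.
deps = {
    "accounts": set(),
    "inventory": {"accounts"},
    "orders": {"accounts", "inventory"},
    "reports": {"accounts", "orders", "inventory"},
}
print(extraction_order(deps))
# ['accounts', 'inventory', 'orders', 'reports']
```

Here accounts scores 0.0 (nothing outgoing, three dependents) and reports scores 1.0 (depends on everything, nothing depends on it), so accounts is extracted first and reports last.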
reports/ having low cohesion is common: it's where miscellaneous features go to die.

## Dependency graph
Nodes: <N> (granularity: <file | class | function>)
Edges: <M> (<breakdown by type>)
## Current structure evaluation
| Directory | Cohesion | Coupling | Keep / Restructure |
| --------- | -------- | -------- | ------------------ |
## Proposed components
### <Component name>
Contents: <files/modules>
Cohesion: <score>
Interface (incoming): <what others call into this component>
Dependencies (outgoing): <what this component needs>
Shared data: <tables/types crossing the boundary — THE HARD PART>
## Extraction order
1. <component> — instability <score>, extract first
...
## Blockers
<shared tables, circular component deps, god objects that touch everything>