skills/python-arch-review/SKILL.md
Architecture review for Python 3 projects enforcing TDD (Red->Green->Refactor->Quality Check), YAGNI principles, and code quality gates. Use when (1) writing new Python code, (2) reviewing existing Python code, (3) refactoring Python modules, (4) adding tests to Python projects, (5) checking code quality metrics, (6) running quality gates before merging, or (7) improving Python architecture with type safety and clean design. Integrates Ruff, mypy, radon, bandit, pip-audit, and pytest-cov.
npx skillsauth add michaelalber/ai-toolkit python-arch-reviewInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler
"Make it work, make it right, make it fast -- in that order." -- Kent Beck
Python's expressiveness without discipline produces codebases that rot. This skill enforces a TDD-first workflow where every change begins with a failing test and ends with a clean quality gate. The cycle is non-negotiable: Red, Green, Refactor, Quality Check. Every iteration. No exceptions for production code.
YAGNI is the governing constraint. The simplest code that passes the tests is the right code. Abstractions are earned through repetition (Rule of Three) — when three concrete implementations share a pattern, then extract. Until that point, duplication is cheaper than the wrong abstraction.
Quality gates are automated enforcement of standards that humans forget under deadline pressure. Ruff catches style drift. mypy catches type errors tests miss. Bandit catches security mistakes code review overlooks. Radon catches complexity creep before it becomes unmaintainable.
What this skill IS: A disciplined TDD process for writing, testing, and reviewing Python code. Automated quality gates enforcing measurable standards. A framework for applying YAGNI and clean architecture to real Python projects.
What this skill is NOT: A high-level architecture critique (use architecture-review). Optional for production code. Language-agnostic — every rule is specific to Python 3.10+.
| # | Principle | Description | Applied As |
|---|-----------|-------------|------------|
| 1 | TDD Discipline | Every code change begins with a failing test. The test defines the requirement; the code satisfies it. Reversing this order produces code that tests confirm rather than tests that drive design. | Before any implementation, write a failing test. Confirm it fails. Only then implement. |
| 2 | YAGNI as Default | Do not build what you do not need. Rule of Three governs when abstraction is justified: when three concrete cases share a pattern, extract. Not before. | Challenge every abstraction: "Do I have three concrete uses for this? If not, inline it." |
| 3 | Type Safety as Documentation | Type hints are executable documentation that mypy verifies on every commit. A complete signature tells the reader what it accepts, returns, and what can be None — without reading the body. | All public functions must have complete type annotations. No # type: ignore without an explanatory comment. |
| 4 | Clean Architecture Boundaries | Dependencies flow inward. Domain logic does not import infrastructure. Use typing.Protocol for boundaries. No circular imports. | Enforce: domain has zero imports from infrastructure. |
| 5 | Dependency Management | Every dependency is a liability — security vulnerabilities, breaking changes, license conflicts. Pin versions. Audit with pip-audit. Prefer the standard library. | Run pip-audit in the quality gate. Question every new dependency: "Does stdlib solve this?" |
| 6 | Testing Quality over Quantity | 100% coverage with meaningless assertions is worse than 80% with behavioral tests. AAA. One assertion per test. Descriptive names. Cover edge cases. | Enforce test_should_<expected>_when_<condition> naming. Review tests for meaningful assertions. |
| 7 | Security as Habit | Parameterized queries, no dynamic code execution with user input, subprocess with shell=False, secrets in environment variables, secrets module for tokens. | Run bandit in every quality gate. Follow references/security-checklist.md. |
| 8 | Incremental Improvement | Large refactors fail. Each TDD cycle improves one thing. Each refactor phase cleans one smell. Over dozens of cycles the codebase transforms. | Limit refactor scope to what current tests cover. Add tests before refactoring untested code. |
| 9 | Documentation Through Tests | test_should_raise_value_error_when_amount_is_negative tells more about a function's contract than any docstring. Tests are documentation that cannot go stale. | Write test names that describe the contract. If you cannot name it clearly, the requirement is not understood. |
| 10 | Tooling as Enforcement | Humans forget. Tools do not. Ruff for style, mypy for types, radon for complexity, bandit for security, pytest-cov for coverage. | Run scripts/quality_check.py <path> before every commit. No exceptions for production code. |
| Query | When to Call |
|-------|--------------|
| search_knowledge("TDD red green refactor cycle") | During RED phase — confirm test-first discipline and AAA structure |
| search_knowledge("YAGNI rule of three abstraction") | During REFACTOR phase — validate when abstraction is justified |
| search_knowledge("python type hints mypy strict") | During QUALITY CHECK — verify mypy configuration patterns |
| search_knowledge("python cyclomatic complexity radon") | During QUALITY CHECK — confirm complexity threshold enforcement |
| search_knowledge("OWASP python security bandit") | During QUALITY CHECK — verify security scanning patterns |
| search_knowledge("clean architecture dependency direction") | During architecture boundary reviews |
| search_knowledge("pytest fixtures mocking boundaries") | When writing tests — confirm mocking strategy at system boundaries |
Search before writing tests, before refactoring structural patterns, and before configuring quality gates. Cite the source path in session notes.
Every code change follows four phases: Red → Green → Refactor → Quality Check. Sequential. No skipping. No reordering.
Write a test that describes the behavior you want. Run it. It must fail. The failure confirms the test is actually testing something new.
Actions:
test_should_<expected>_when_<condition>Completion Criteria: One failing test exists with a clear expected error, following the naming convention and AAA structure.
def test_should_return_sum_when_two_positive_numbers():
# Arrange
calc = Calculator()
# Act
result = calc.add(2, 3)
# Assert
assert result == 5
Write the absolute minimum code to make the failing test pass. Not elegant. Not complete. The minimum. Resist adding behavior no test demands.
Actions:
Completion Criteria: New test passes. All existing tests pass. No code beyond what the tests require.
The tests are green. Improve structure without changing behavior. Extract methods. Remove duplication. Improve names. If tests stay green, behavior is preserved.
Actions:
Completion Criteria: All tests pass. No method over CC 10, no class over 200 lines. Names are self-documenting. No new behavior introduced.
Run the complete quality gate. Not optional. Every cycle ends here.
scripts/quality_check.py <path>
Gate Thresholds:
| Gate | Tool | Threshold | Reference |
|------|------|-----------|-----------|
| Lint + Format | Ruff | Zero errors | references/ruff-config.md |
| Type Check | mypy --strict | Zero errors | references/mypy-config.md |
| Cyclomatic Complexity | radon | Methods < 10, Classes < 20 | quality_check.py --max-complexity |
| Maintainability Index | radon | 70+ | quality_check.py --min-maintainability |
| Test Coverage | pytest-cov | 80% business logic, 95% security-critical | quality_check.py --min-coverage |
| Security Scan | bandit | Zero high/medium findings | references/security-checklist.md |
See references/review-checklist.md for the full code review checklist.
Completion Criteria: scripts/quality_check.py <path> exits 0. No new warnings. Coverage meets threshold. Code is ready to commit.
<python-review-state>
mode: [red | green | refactor | quality_check | review]
project_path: [path to project root]
phase: [current TDD phase]
tests_passing: [true | false | unknown]
coverage_pct: [number or unknown]
ruff_clean: [true | false | unknown]
mypy_clean: [true | false | unknown]
security_clean: [true | false | unknown]
last_action: [what was just done]
next_action: [what should happen next]
</python-review-state>
Example:
<python-review-state>
mode: red
project_path: /home/dev/myproject
phase: RED -- writing failing test
tests_passing: true (existing tests)
coverage_pct: 82
ruff_clean: true
mypy_clean: true
security_clean: true
last_action: Identified next behavior -- user registration validation
next_action: Write test_should_raise_validation_error_when_email_is_invalid
</python-review-state>
## Python Architecture Review Session
**Project**: [name] | **Path**: [path] | **Scope**: [new feature / refactor / bug fix / test improvement]
**Quality Status:** Tests [passing/failing/none] | Coverage [N%] | Ruff [clean/errors] | mypy [clean/errors] | Security [clean/findings]
**Starting Phase**: [RED — identifying first behavior to test]
[state block]
Always write the test first. Never generate implementation code without a failing test. If you find yourself writing implementation without a failing test, stop, delete the implementation, and write the test. The test defines what "correct" means before you write the code.
Never skip the quality check. Small changes accumulate. One skipped gate leads to another. Type errors slip in, complexity creeps, security issues hide. Run scripts/quality_check.py after every TDD cycle — no exceptions for production code. If the gate fails, fix before committing.
YAGNI governs all abstractions. One concrete case needs one concrete implementation. When the third concrete case appears, then extract the abstraction. Premature abstraction — a factory when one validator is needed — is more expensive than duplication.
Keep functions small and focused. A function should do one thing. If a block within a function needs a comment, that block is its own function with a descriptive name. Cyclomatic complexity above 10 is a gate failure. Extract until every function has a single, clear purpose.
Prefer composition over inheritance. Deep inheritance hierarchies are the leading cause of fragile Python code. Use typing.Protocol for interface contracts. Reserve inheritance for genuine is-a relationships with shared behavior. Flat composition beats 4-level class hierarchies — each piece is independently testable.
Type every public interface. Every public function, method, and class attribute must have complete type annotations. mypy --strict enforces this. Optional must be explicit — never rely on implicit None defaults.
| Anti-Pattern | Why It Fails | What to Do Instead |
|--------------|-------------|-------------------|
| Testing After Coding | Tests written after the fact mirror the code's structure and miss cases the developer did not think of. They give false confidence without driving design. | Follow the Red phase strictly. Write the test FIRST. Watch it fail. Then implement. |
| Ignoring Type Hints | Type errors are the most common source of production bugs in dynamic languages. Without types, mypy cannot help and callers must read the entire function body to understand its contract. | Run mypy --strict. Annotate every public function. See references/mypy-config.md. |
| Massive Test Fixtures | Tests become coupled to the fixture. When it changes, dozens of unrelated tests break. Setup becomes so complex it needs its own tests. | Factory functions that create minimal objects. Each test builds only what it needs. |
| Mocking Everything | Tests that mock everything test the mocking framework, not the code. They pass when the implementation is wrong and break on every refactoring. | Mock at system boundaries (I/O, external services, time). Test business logic with real objects. |
| God Classes | 500+ line classes with 20+ methods violate SRP, resist testing in isolation, cause merge conflicts, and resist refactoring because everything depends on them. | Extract cohesive groups of methods into focused classes. Compose them. Each class has one reason to change. |
| Premature Optimization | Optimization without measurement is guessing. The bottleneck is almost never where you think. Premature optimization adds complexity, bugs, and maintenance cost for gains that may not matter. | Write clear code first. Profile under realistic load. Optimize the measured bottleneck. Verify it helped. |
| Bare Except Clauses | Catching everything hides bugs. KeyboardInterrupt, SystemExit, and GeneratorExit should not be caught. except Exception hides TypeError and ValueError that indicate bugs. | Catch specific exceptions. Log unexpected exceptions with full context. |
| Ignoring the Quality Gate | One skipped gate leads to another. Within weeks the codebase has type errors, complexity violations, and security findings expensive to fix in bulk. | Run scripts/quality_check.py after every TDD cycle. No exceptions for production code. |
| Copy-Paste Test Duplication | Test files grow to thousands of lines. When behavior changes, every copy must be updated. Missed copies produce tests that pass but verify stale behavior. | Use pytest.mark.parametrize. Extract shared setup into named helper functions. |
| Circular Imports | Indicate tangled responsibilities. Import-time vs. runtime resolution creates subtle bugs. TYPE_CHECKING guards hide the problem but do not fix the design. | Extract the shared dependency into a third module. Invert the dependency using typing.Protocol. |
Indicators: A previously passing test now fails; unrelated tests break; confusing error messages.
Recovery:
pytest path/to/test.py::test_name -v. If it passes alone, the failure is test interaction (shared state, fixture pollution).git stash, confirm tests pass clean, re-apply, and bisectIndicators: scripts/quality_check.py exits non-zero; mypy errors after method extraction; complexity increased.
Recovery:
ruff check --fix . then ruff format .Indicators: 0% coverage, no type annotations, 50+ line functions, no documentation.
Recovery:
Indicators: Bandit reports high/medium severity; pip-audit finds vulnerable dependencies.
Recovery:
scripts/quality_check.py — confirm resolution and no new findingsarchitecture-review — When structural problems appear (circular dependencies, unclear boundaries, module coupling), use architecture-review for Socratic design critique. This skill enforces code-level quality; architecture-review examines design-level quality.tdd-cycle — For language-agnostic TDD coaching or when the TDD process itself needs pedagogical support. This skill adds Python-specific tooling and quality gates on top.security-review-trainer — When bandit findings reveal recurring patterns, use security-review-trainer to build the habit of writing secure code by default.dependency-mapper — When circular imports appear, use dependency-mapper to generate a concrete dependency graph and compare against intended clean architecture boundaries.Never skip for production code. Acceptable exceptions: spike/prototype branches (spike/*), documentation-only changes, CI/CD config changes. Even in these cases, run the gate before merging back to main.
development
Federal / government security overlay applied ON TOP OF a base language security review (dotnet/python/php/rust/react). Language-agnostic: adds NIST SP 800-53 control mapping, FIPS 140-2/3 cryptographic compliance (with a per-language crypto table), CUI handling, EO 14028 supply-chain requirements, and DOE Order 205.1B, and emits POA&M-ready findings with FIPS 199 impact levels. Use for federal/DOE/DOD/national-laboratory systems. Triggers on "federal security review", "NIST compliance", "NIST 800-53", "FISMA", "CUI", "FIPS audit", "DOE security", "POA&M", "ATO review". Do NOT use alone — run the matching <lang>-security-review FIRST; this overlay maps and extends it.
tools
OWASP-based security review of React / TypeScript front-end applications. Detects the framework (Vite/CRA/Next), entry points, and data flows, scans against the OWASP Top 10 (2025) mapped to React client-side patterns (XSS via raw HTML, URL/protocol injection, secrets in the bundle, insecure token storage, dependency CVEs, missing CSP, open redirects), and produces a manager-friendly executive summary plus a graded technical findings table. Use to audit React code for vulnerabilities. Triggers on "react security review", "frontend security audit", "audit react for vulnerabilities", "owasp react", "react xss", "react security posture", "npm audit review". For federal / gov / DOE / NIST / FIPS / CUI context, run security-review-federal after this base review. Do NOT use to grade architecture/structure — use react-architecture-checklist.
tools
Analyzes legacy React codebases and produces actionable modernization plans. Primary migration paths include class components to function components + hooks, Create React App to Vite, React 16/17 to 18 to 19, JavaScript to TypeScript, Enzyme to React Testing Library, legacy Redux to Redux Toolkit / Zustand / Context, and deprecated lifecycle/API removal. Does NOT perform the migration — assesses, quantifies risk, and plans. Triggers on phrases like "modernize react", "class to hooks", "upgrade react", "migrate CRA to vite", "react legacy migration", "react 17 to 18", "react js to typescript", "react technical debt", "enzyme to RTL".
development
Scaffolds feature-based React / TypeScript architecture using feature folders, presentational + container components, custom hooks, a typed data layer, and structural CQRS (query hooks vs mutation hooks). React analog of dotnet-vertical-slice and python-feature-slice — no DI framework; uses props/context for dependency injection and a query cache for server state. Use when creating feature-based React projects, adding React features, organizing components by feature rather than by technical type, or scaffolding a feature's data layer. Triggers on phrases like "scaffold react feature", "create react slice", "react feature folder", "react vertical slice", "add react feature", "react feature architecture", "organize react by feature".