skills/aws-infrastructure/SKILL.md
Use this skill when designing, building, reviewing, or troubleshooting AWS infrastructure. Triggers when the user works with SST, CDK, Terraform, CloudFormation, or any AWS infrastructure-as-code tool and wants to make correct architectural decisions. Covers: VPC and networking design, compute selection (Lambda vs ECS vs EKS vs EC2), database selection (DynamoDB vs RDS vs Aurora vs ElastiCache), serverless architecture patterns, scaling strategies, cost optimization, security hardening, IAM, monitoring, CI/CD pipelines, multi-account strategy, and operational excellence. Also triggers when the user asks 'should I use X or Y on AWS,' 'how do I scale this,' 'how do I secure this,' 'is this the right architecture,' or 'how do I set this up in SST.' Do NOT use for application-level code logic unrelated to infrastructure.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add kylejryan/better-code aws-infrastructure
Good infrastructure makes the right thing easy and the wrong thing hard. It scales without manual intervention, secures by default without extra effort, costs proportionally to usage, and recovers from failures automatically. The goal is infrastructure that the team forgets about — because it just works.
Three forces govern every infrastructure decision:
| Decision | Default Choice | Escalate When |
|----------|----------------|---------------|
| Compute | Lambda (ARM64) | > 15 min execution, persistent connections, GPU |
| Long-running compute | ECS Fargate | Service mesh, multi-tenant → EKS |
| Database (key-value) | DynamoDB on-demand | Complex queries, joins → Aurora Serverless v2 |
| Async messaging | SQS | Fan-out → SNS+SQS, routing → EventBridge |
| API | API Gateway HTTP API | Request validation, API keys → REST API |
| Networking | No VPC (serverless) | RDS, ElastiCache, internal services → VPC |
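As a sketch of how the two compute rows of the table combine, the TypeScript below encodes the default-then-escalate logic: stay on Lambda unless an escalation trigger applies, then default to Fargate and escalate to EKS for service mesh or multi-tenant needs. The `Workload` shape, `selectCompute`, and the returned labels are hypothetical. The GPU branch goes one step beyond the table: because Fargate does not offer GPUs, it assumes such work runs on EC2-backed capacity.

```typescript
// Hypothetical workload profile; field names are illustrative, not part of any AWS SDK.
interface Workload {
  maxExecutionMinutes: number;    // longest single request, job, or task
  persistentConnections: boolean; // e.g. process-held WebSockets or long-lived TCP streams
  needsGpu: boolean;
  needsServiceMesh: boolean;
  multiTenant: boolean;
}

type Compute = "Lambda (ARM64)" | "ECS Fargate" | "EKS" | "EC2";

// A Lambda invocation can run for at most 15 minutes.
const LAMBDA_MAX_MINUTES = 15;

function selectCompute(w: Workload): Compute {
  // Default: Lambda, unless one of the table's escalation triggers applies.
  const fitsLambda =
    w.maxExecutionMinutes <= LAMBDA_MAX_MINUTES &&
    !w.persistentConnections &&
    !w.needsGpu;
  if (fitsLambda) return "Lambda (ARM64)";

  // Assumption beyond the table: Fargate has no GPUs, so GPU work needs
  // EC2-backed capacity (plain EC2, or ECS/EKS on GPU instances).
  if (w.needsGpu) return "EC2";

  // Long-running default is Fargate; mesh or multi-tenant needs escalate to EKS.
  if (w.needsServiceMesh || w.multiTenant) return "EKS";
  return "ECS Fargate";
}

// Usage: a 40-minute nightly batch job with no other special needs.
const nightlyReport: Workload = {
  maxExecutionMinutes: 40,
  persistentConnections: false,
  needsGpu: false,
  needsServiceMesh: false,
  multiTenant: false,
};
console.log(selectCompute(nightlyReport)); // prints: ECS Fargate
```

Checking Lambda's constraints first mirrors the table's ordering: the serverless default holds unless a specific trigger forces escalation.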
| Priority | Category | Impact | Prefix |
|----------|----------|--------|--------|
| 1 | Architecture Decisions | CRITICAL | arch- |
| 2 | Serverless Patterns | CRITICAL | serverless- |
| 3 | Security | CRITICAL | sec- |
| 4 | Scaling Strategies | HIGH | scaling- |
| 5 | Networking | HIGH | net- |
| 6 | Cost Optimization | HIGH | cost- |
| 7 | Observability | HIGH | obs- |
| 8 | CI/CD and Deployment | HIGH | cicd- |
| 9 | Multi-Account Strategy | MEDIUM-HIGH | multi- |
| 10 | Operational Patterns | MEDIUM-HIGH | ops- |
| 11 | Infrastructure Checklist | MEDIUM | checklist- |
Read individual reference files for detailed explanations, decision trees, and code examples:
references/arch-compute-selection.md
references/arch-database-selection.md
references/sec-iam-least-privilege.md
references/serverless-api-pattern.md
references/scaling-lambda.md
references/net-vpc-design.md
references/cost-big-levers.md
references/_sections.md
Each reference file contains:
After designing or reviewing infrastructure, verify:
removal: "retain" on production stacks
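In SST v3, `removal` is set in the `app()` function of `sst.config.ts`: `"retain"` keeps S3 buckets and DynamoDB tables when a stage is removed, `"remove"` deletes everything, and `"retain-all"` keeps every resource. The helper below is a hypothetical sketch of the usual stage check, assuming the production stage is literally named `production`. In CDK, the closest equivalent is setting `RemovalPolicy.RETAIN` on individual stateful resources.

```typescript
// Removal modes accepted by SST v3's `removal` app setting.
type RemovalMode = "remove" | "retain" | "retain-all";

// Hypothetical helper: keep stateful resources on production, allow full
// teardown on every other stage. "production" is an assumed stage name.
function removalFor(stage: string | undefined): RemovalMode {
  return stage === "production" ? "retain" : "remove";
}

// In sst.config.ts the same check usually appears inline inside app(input):
//   removal: input?.stage === "production" ? "retain" : "remove",
console.log(removalFor("production")); // prints: retain
console.log(removalFor("dev"));        // prints: remove
```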