skills/terraform-iac-expert/SKILL.md
--- license: Apache-2.0 name: terraform-iac-expert version: 1.0.0 category: DevOps & Infrastructure tags: - terraform - iac - infrastructure - aws - gcp - azure - opentofu --- # Terraform IaC Expert ## Overview Expert in Infrastructure as Code using Terraform and OpenTofu. Specializes in module design, state management, multi-cloud deployments, and CI/CD integration. Handles complex infrastructure patterns including multi-environment setups, remote state backends, and secure sec
npx skillsauth add curiositech/windags-skills terraform-iac-expertInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Expert in Infrastructure as Code using Terraform and OpenTofu. Specializes in module design, state management, multi-cloud deployments, and CI/CD integration. Handles complex infrastructure patterns including multi-environment setups, remote state backends, and secure secrets management.
Resource Change Assessment:
├── Breaking changes to core resources (VPC, networks)?
│ ├── Yes → Full rebuild with blue-green deployment
│ └── No → Continue to state migration assessment
├── State file corruption or drift > 20% resources?
│ ├── Yes → Import strategy: selective rebuild of drifted resources
│ └── No → Standard state migration
├── Cost of downtime > $10k/hour?
│ ├── Yes → Blue-green with gradual migration
│ └── No → Direct migration with maintenance window
└── Team expertise level?
├── Junior → Use Terraform Cloud migration tools
└── Senior → Manual state manipulation acceptable
Resource Complexity:
├── Will this pattern be used > 2 times?
│ ├── Yes → Create module
│ └── No → Continue evaluation
├── Configuration > 50 lines?
│ ├── Yes → Module for maintainability
│ └── No → Inline acceptable
├── Needs input validation?
│ ├── Yes → Module with validation blocks
│ └── No → Can remain inline
└── Cross-team sharing required?
├── Yes → Published module with versioning
└── No → Local module sufficient
Team Size & Structure:
├── < 5 engineers, single team?
│ ├── Yes → Workspace-based environments
│ └── No → Continue evaluation
├── Multiple teams, shared infrastructure?
│ ├── Yes → Directory-based with remote state sharing
│ └── No → Hybrid approach
├── Need environment-specific backends?
│ ├── Yes → Directory-based mandatory
│ └── No → Workspaces viable
└── Complex promotion workflows?
├── Yes → Terragrunt with directory structure
└── No → Native Terraform sufficient
Symptoms: "Error acquiring the state lock", terraform refresh shows massive drift, apply fails with "resource already exists" Root Cause: Concurrent operations without proper locking or force-unlock used incorrectly Fix:
terraform force-unlock LOCK_IDterraform refreshterraform import resource.name existing_idterraform state rm resource.phantomSymptoms: "provider registry.terraform.io/hashicorp/aws: required_version" errors, resources show perpetual changes in plan Root Cause: Team members using different provider versions or Terraform versions Fix:
version = "~> 5.0"terraform state replace-providerSymptoms: Module has >20 input variables, complex nested object variables, frequent "unsupported attribute" errors Root Cause: Over-abstracting modules trying to handle every use case Fix:
Symptoms: "Cycle" errors during plan, "depends_on contains dependencies that cannot be determined before apply" Root Cause: Circular dependencies between state files or environments Fix:
Symptoms: Sensitive values visible in terraform.tfstate, state file contains passwords/keys in plaintext Root Cause: Not marking outputs as sensitive, storing secrets in variables instead of data sources Fix:
sensitive = truedata "aws_secretsmanager_secret"Scenario: Migrating manually-created production AWS infrastructure (VPC, EKS cluster, RDS) to Terraform management.
Step 1 - Assessment and Planning
# Discover existing resources
aws ec2 describe-vpcs --filters "Name=tag:Environment,Values=prod"
aws eks describe-cluster --name prod-cluster
aws rds describe-db-instances --db-instance-identifier prod-db
Expert catches: Check for dependencies between resources, identify non-standard configurations that need special handling. Novice misses: Not documenting existing resource relationships before starting.
Step 2 - State Structure Decision Following decision tree → Multiple teams + shared infrastructure = Directory-based approach
terraform/
├── shared/ # VPC, networking
├── data/ # RDS, ElastiCache
├── compute/ # EKS cluster
└── applications/ # App-specific resources
Step 3 - Import Strategy
# Start with shared infrastructure (VPC first)
cd terraform/shared
terraform import aws_vpc.main vpc-12345678
terraform import aws_subnet.private[0] subnet-abcd1234
terraform import aws_subnet.private[1] subnet-efgh5678
# Generate configuration to match
terraform show -json | jq '.values.root_module.resources[] | select(.type=="aws_vpc")'
Expert catches: Import in dependency order (VPC → subnets → route tables → gateways). Novice misses: Importing resources in wrong order causing dependency conflicts.
Step 4 - Validation and Safety Gates
# Verify no changes after import
terraform plan # Should show "No changes"
# Test in non-prod first
cd ../terraform/shared-staging
terraform plan -var-file=staging.tfvars
sensitive = true~> constraints (not >=)This skill should NOT be used for:
kubernetes-orchestrator or docker-specialist insteadgithub-actions-pipeline-builder insteadkubernetes-orchestrator insteaddatabase-architect insteadsite-reliability-engineer insteadDelegate to other skills when:
site-reliability-engineerkubernetes-orchestratoraws-solutions-architectsecurity-specialisttools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.