skills/terraform-skill/SKILL.md
Use when writing, reviewing, or debugging Terraform/OpenTofu modules, tests, CI, scans, or state ops - diagnoses failure mode (identity churn, secrets, blast radius, CI drift, state corruption) with version-aware guards.
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
npx skillsauth add antonbabenko/terraform-skill terraform-skill
Diagnose-first guidance for Terraform and OpenTofu. Core file is a workflow; depth lives in references loaded on demand.
Every Terraform/OpenTofu response must include:
- Runtime (terraform or tofu), exact version, providers, state backend, execution path (local/CI/Cloud/Atlantis), environment criticality. State assumptions explicitly if the user did not provide them.
- Validation commands (fmt -check, validate, plan -out, policy check) tailored to runtime and risk tier.

Never recommend direct production apply without a reviewed plan artifact and approval.
Never run terraform destroy (targeted or full) without first running terraform plan -destroy and showing the user every resource that will be deleted — including implicit dependents pulled in via locals or for_each. Get explicit confirmation before proceeding. Never use -auto-approve on destroy.
moved, import), CI changes, policy rules.

| Failure category | Symptoms | Primary references |
|------------------|----------|--------------------|
| Identity churn | Resource addresses shift after refactor, count index churn, missing moved blocks | Code Patterns: count vs for_each, Code Patterns: moved blocks, Code Patterns: LLM mistakes |
| Secret exposure | Secrets in defaults, state, logs, CI artifacts | Security & Compliance, Code Patterns: write-only, State Management |
| Blast radius | Oversized stacks, shared prod/non-prod state, unsafe applies | State Management, Module Patterns |
| Destroy cascade | Targeted destroy deletes more than expected; locals referencing a targeted resource make all for_each consumers implicit dependents | Response Contract: plan-destroy first; State Management: Safe Destroy |
| CI drift | Local plan ≠ CI plan, apply without reviewed artifact, unpinned versions | CI/CD Workflows, Code Patterns: versions |
| Compliance gaps | Missing policy stage, no approval model, no evidence retention | Security & Compliance, CI/CD Workflows |
| Testing blind spots | Plan-only validation of computed values, set-type indexing, mock/real confusion | Testing Frameworks |
| State corruption / recovery | Stuck lock, backend migration, drift reconciliation | State Management |
| Provider upgrade risk | Breaking-change provider bump, unpinned modules | Code Patterns: versions, Module Patterns |
| Provider lifecycle | Removing a provider with resources still in state, orphaned resources, removed block usage | State Management: Provider Removal |
| Bootstrap / orchestration misuse | null_resource + local-exec for bootstrap, remote-exec for setup scripts, provisioner stdout leaking secrets in CI logs | Code Patterns: Provisioners as Last Resort |
| Navigation / safe-rename blind spots | Cannot locate symbol defs/refs semantically, value-symbol rename done as blind text replace, grep-only refactor missing refs, hallucinated rg shim | Code Intelligence |
| Cross-cloud / provider mapping | "What's the Azure/GCP equivalent of X", picking a backend/auth model per cloud | State Management: Cross-cloud equivalents |
Activate when: creating or reviewing Terraform/OpenTofu configurations or modules, setting up or debugging tests, structuring multi-environment deployments, implementing IaC CI/CD, choosing module patterns or state organization, configuring or migrating remote state backends.
Don't use for: basic HCL syntax questions Claude already knows, provider API reference (link to docs), cloud-platform questions unrelated to Terraform/OpenTofu.
| Type | When to Use | Scope |
|------|-------------|-------|
| Resource module | Single logical group of connected resources | VPC + subnets, SG + rules |
| Infrastructure module | Collection of resource modules for a purpose | Multiple resource modules in one region/account |
| Composition | Complete infrastructure | Spans multiple regions/accounts |
Flow: resource → resource module → infrastructure module → composition.
environments/ # prod/ staging/ dev/ — per-env configurations
modules/ # networking/ compute/ data/ — reusable modules
examples/ # minimal/ complete/ — docs + integration fixtures
Separate environments from modules. Use examples/ as both documentation and test fixtures. Keep modules small and single-responsibility.
See Module Patterns for architecture principles, naming conventions, variable/output contracts.
- Descriptive resource names (aws_instance.web_server, not aws_instance.main)
- Use the name this for genuine singleton resources only
- Context-prefixed variable names (vpc_cidr_block, not cidr)
- Standard files: main.tf, variables.tf, outputs.tf, versions.tf

See Module Patterns: Variable Naming and Code Patterns: Block Ordering for examples.
Resource blocks: count/for_each first → arguments → tags → depends_on → lifecycle.
Variable blocks: description → type → default → validation → nullable → sensitive.
See Code Patterns: Block Ordering & Structure for the full rules and examples.
| Situation | Approach | Tools | Cost |
|-----------|----------|-------|------|
| Quick syntax check | Static analysis | validate, fmt | Free |
| Pre-commit validation | Static + lint | validate, tflint, trivy, checkov | Free |
| Terraform 1.6+, simple logic | Native test framework | terraform test | Free-Low |
| Pre-1.6, or Go expertise | Integration testing | Terratest | Low-Med |
| Security/compliance focus | Policy as code | OPA, Sentinel | Free |
| Cost-sensitive workflow | Mock providers (1.7+) | Native tests + mocks | Free |
| Multi-cloud, complex | Full integration | Terratest + real infra | Med-High |
Before writing test code: validate resource schemas via Terraform MCP so assertions target real attributes.
- command = plan: fast, for input-derived values only
- command = apply: required for computed values (ARNs, generated names) and set-type nested blocks
- Set-type blocks cannot be indexed with [0]: use for expressions or materialize via command = apply

See Testing Frameworks for static-analysis pipelines, native-test patterns, Terratest integration, mock providers, and the full LLM-mistake checklist.
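As a hedged sketch of the plan-versus-apply guidance above, a native test file might look like the following. The module under test is hypothetical: it is assumed to declare var.bucket_name, aws_s3_bucket.this, and aws_s3_bucket_server_side_encryption_configuration.this. None of those names come from this skill.

```hcl
# tests/module_test.tftest.hcl
# Assumption: the module under test declares var.bucket_name, aws_s3_bucket.this,
# and aws_s3_bucket_server_side_encryption_configuration.this (hypothetical names).

# Mock the provider (1.7+) so the run needs no credentials and creates nothing.
mock_provider "aws" {}

variables {
  bucket_name = "example-logs"
}

run "name_comes_from_input" {
  command = plan # the bucket name is input-derived, so it is known at plan time

  assert {
    condition     = aws_s3_bucket.this.bucket == var.bucket_name
    error_message = "Bucket name should match var.bucket_name."
  }
}

run "encryption_rule_uses_kms" {
  command = plan

  assert {
    # rule is a set-type block, so rule[0] is not valid; iterate with a for expression.
    # apply_server_side_encryption_by_default is assumed to be a single-item list block.
    condition = contains(
      [
        for r in aws_s3_bucket_server_side_encryption_configuration.this.rule :
        r.apply_server_side_encryption_by_default[0].sse_algorithm
      ],
      "aws:kms"
    )
    error_message = "Expected at least one encryption rule using aws:kms."
  }
}
```

Assertions on computed values such as ARNs would need command = apply (or mocked defaults), since they are unknown during a plan.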
| Scenario | Use | Why |
|----------|-----|-----|
| Boolean condition (create / don't) | count = condition ? 1 : 0 | Optional singleton toggle |
| Items may be reordered or removed | for_each = toset(list) | Stable resource addresses |
| Reference by key | for_each = map | Named access |
| Multiple named resources | for_each | Better identity stability |
Never use list index as long-lived identity — removing a middle element reshuffles every address after it. For the decision matrix, safe migration playbook, moved block patterns, and known-at-plan failure cases, see Code Patterns: count vs for_each.
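To make the identity-stability point concrete, here is a minimal sketch of a keyed for_each together with the moved blocks that would accompany a migration from count. The variable names and CIDR values are illustrative assumptions.

```hcl
variable "vpc_id" {
  description = "ID of the VPC that hosts the subnets (hypothetical input)."
  type        = string
}

variable "subnet_cidrs" {
  description = "Map of subnet name to CIDR block (hypothetical input)."
  type        = map(string)
  default = {
    app = "10.0.1.0/24"
    db  = "10.0.2.0/24"
  }
}

# Keyed by name: removing "app" leaves aws_subnet.private["db"] untouched.
resource "aws_subnet" "private" {
  for_each = var.subnet_cidrs

  vpc_id     = var.vpc_id
  cidr_block = each.value

  tags = {
    Name = each.key
  }
}

# When migrating from count, map each old index to its new key so the plan
# reports moves instead of destroy/create pairs (moved blocks, 1.1+).
moved {
  from = aws_subnet.private[0]
  to   = aws_subnet.private["app"]
}

moved {
  from = aws_subnet.private[1]
  to   = aws_subnet.private["db"]
}
```

A plan after this change should report the two addresses as moved and show nothing to destroy; if it shows replacements, the index-to-key mapping is likely wrong.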
Using try() in a local to prefer a conditional resource's attribute over its parent is a specialized but high-value pattern — it forces correct deletion order without explicit depends_on. Common use: VPC + secondary CIDR associations + subnets.
See Code Patterns: Locals for Dependency Management for the full pattern and worked example.
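A minimal sketch of the try()-in-a-local pattern, assuming a single optional secondary CIDR. Variable and resource names are hypothetical; the referenced Code Patterns document remains the authoritative worked example.

```hcl
variable "vpc_cidr" {
  description = "Primary CIDR block for the VPC (hypothetical input)."
  type        = string
}

variable "secondary_cidr" {
  description = "Optional secondary CIDR block; null disables the association."
  type        = string
  default     = null
}

variable "subnet_cidr" {
  description = "CIDR block for the subnet (hypothetical input)."
  type        = string
}

resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr
}

resource "aws_vpc_ipv4_cidr_block_association" "secondary" {
  count = var.secondary_cidr != null ? 1 : 0

  vpc_id     = aws_vpc.this.id
  cidr_block = var.secondary_cidr
}

locals {
  # Prefer the association's vpc_id when it exists, else fall back to the VPC.
  # Consumers of local.vpc_id then depend on the association implicitly, so
  # they are deleted before it without an explicit depends_on.
  vpc_id = try(
    aws_vpc_ipv4_cidr_block_association.secondary[0].vpc_id,
    aws_vpc.this.id
  )
}

resource "aws_subnet" "workload" {
  vpc_id     = local.vpc_id
  cidr_block = var.subnet_cidr
}
```

The same implicit dependency is what widens a targeted destroy: anything that reads local.vpc_id is pulled in when the association is targeted.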
Standard layout:
my-module/
├── README.md # Usage documentation
├── main.tf # Primary resources
├── variables.tf # Typed inputs with descriptions
├── outputs.tf # Output values
├── versions.tf # required_version + required_providers
├── examples/
│ ├── minimal/
│ └── complete/
└── tests/
    └── module_test.tftest.hcl # or Go for Terratest
Variable contracts: always description, always explicit type, use validation for complex constraints, use sensitive = true for secrets, prefer optional() with typed defaults (1.3+) over untyped map(any).
Output contracts: always description, mark sensitive outputs, expose stable subsets (not whole provider objects).
See Module Patterns for the full contract patterns, module release checklist, and LLM-mistake checklist.
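The variable and output contracts above could be sketched as follows. The inputs (database, api_token) are hypothetical examples, and the attribute order follows the block-ordering rule stated earlier in this skill.

```hcl
variable "database" {
  description = "Database settings (hypothetical example input)."
  type = object({
    engine         = string
    instance_class = optional(string, "db.t3.micro") # typed default, 1.3+
    multi_az       = optional(bool, false)
  })

  validation {
    condition     = contains(["postgres", "mysql"], var.database.engine)
    error_message = "database.engine must be \"postgres\" or \"mysql\"."
  }
}

variable "api_token" {
  description = "Token for a hypothetical downstream API. Masked in output, still stored in state."
  type        = string
  nullable    = false
  sensitive   = true
}

# Expose a small, stable subset instead of a whole object.
output "database_engine" {
  description = "Database engine selected for this deployment."
  value       = var.database.engine
}

output "api_token" {
  description = "Token passed through for a hypothetical consumer."
  value       = var.api_token
  sensitive   = true
}
```

Using object() with optional() keeps callers from passing misspelled keys, which an untyped map(any) would accept silently.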
Pipeline stages: validate → test → plan → apply (with environment protection).
Cost control: mock providers on PR validation, real-cloud integration only on main or scheduled, tag test resources, auto-cleanup.
Drift prevention: pin runtime and providers, commit .terraform.lock.hcl, apply the reviewed plan artifact from the plan stage (do not re-run plan inside the apply job), run policy/security stage on every path to apply.
See CI/CD Workflows for GitHub Actions, GitLab CI, and Atlantis templates plus the LLM-mistake checklist.
Essential checks:
trivy config .
checkov -d .
Don't: store secrets in variables or .tfvars, use default VPC, skip encryption, open security groups to 0.0.0.0/0, use inline ingress/egress blocks in aws_security_group.
Do: source secrets from a cloud secret manager (AWS Secrets Manager / Azure Key Vault / GCP Secret Manager) or use write_only arguments on 1.11+, create dedicated VPCs, enforce encryption at rest and TLS, least-privilege SGs, use separate aws_vpc_security_group_{ingress,egress}_rule resources (e.g. AWS provider v5+).
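As an illustration of the standalone rule resources recommended above, a sketch might look like this. The variable names and the port choice are assumptions for the example.

```hcl
variable "vpc_id" {
  description = "ID of a dedicated (non-default) VPC (hypothetical input)."
  type        = string
}

variable "allowed_cidr" {
  description = "CIDR range allowed to reach the service; avoid 0.0.0.0/0."
  type        = string
}

# No inline ingress/egress blocks here; rules live in their own resources.
resource "aws_security_group" "web" {
  name_prefix = "web-"
  description = "Web tier security group"
  vpc_id      = var.vpc_id
}

resource "aws_vpc_security_group_ingress_rule" "web_https" {
  security_group_id = aws_security_group.web.id

  description = "HTTPS from the approved range"
  cidr_ipv4   = var.allowed_cidr
  from_port   = 443
  to_port     = 443
  ip_protocol = "tcp"
}

resource "aws_vpc_security_group_egress_rule" "web_https_out" {
  security_group_id = aws_security_group.web.id

  description = "HTTPS to the approved range"
  cidr_ipv4   = var.allowed_cidr
  from_port   = 443
  to_port     = 443
  ip_protocol = "tcp"
}
```

Each rule has its own address in state, so adding or removing one rule does not rewrite the whole group.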
Marking a variable sensitive = true masks display only — the value still lives in state. Use write_only / *_wo on 1.11+, or keep secret material out of Terraform entirely via runtime lookups.
See Security & Compliance for trivy/checkov pipelines, state-file hardening, compliance mappings, and the LLM-mistake checklist.
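A hedged sketch of the write-only approach, assuming Terraform 1.11+ and an AWS provider release that supports the ephemeral aws_secretsmanager_secret_version resource and the password_wo / password_wo_version arguments on aws_db_instance. Verify both against the provider version in use; the variable and identifiers are hypothetical.

```hcl
variable "db_password_secret_id" {
  description = "Secrets Manager secret ID holding the database password (hypothetical input)."
  type        = string
}

# Ephemeral values are read at run time and are not persisted to state or plan files.
ephemeral "aws_secretsmanager_secret_version" "db_password" {
  secret_id = var.db_password_secret_id
}

resource "aws_db_instance" "app" {
  identifier        = "app-db"
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  storage_encrypted = true
  username          = "app"

  # Write-only argument: sent to the provider, never stored in state.
  password_wo = ephemeral.aws_secretsmanager_secret_version.db_password.secret_string

  # Assumption: incrementing this value is how a changed password is pushed.
  password_wo_version = 1
}
```

Because the password is never recorded, Terraform cannot detect a changed secret on its own, which is why a separate version argument exists.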
Never use local state in teams or production. Remote backends provide automatic locking, encryption, versioning, audit logging, and safe collaboration.
AWS example (Azure azurerm / GCP gcs / TF Cloud syntax: see State Management: Choosing a Remote Backend):
terraform {
  backend "s3" {
    bucket       = "my-terraform-state"
    key          = "prod/vpc/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # Native S3 locking, 1.10+
  }
}
On Terraform < 1.10, use dynamodb_table = "terraform-state-lock" instead of use_lockfile. Azure Storage, GCS, and Terraform Cloud all offer built-in locking - see the State Management reference for syntax. For choosing among backends and their locking models, see Choosing a Remote Backend.
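For runtimes older than 1.10, the same backend with DynamoDB locking might look like the sketch below. The bucket and table names are the placeholders already used in this section.

```hcl
terraform {
  backend "s3" {
    bucket  = "my-terraform-state"
    key     = "prod/vpc/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true

    # The table must already exist with a string partition key named "LockID".
    dynamodb_table = "terraform-state-lock"
  }
}
```

Changing backend settings requires re-running init (typically with -migrate-state or -reconfigure), so treat the switch to use_lockfile as its own reviewed change.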
| Pattern | Use When | Example Path |
|---------|----------|--------------|
| Per environment | Different teams per env | prod/terraform.tfstate, staging/... |
| Per component | Independent lifecycles | prod/vpc/, prod/eks/, prod/rds/ |
| Hybrid (recommended) | Both benefits | prod/networking/, prod/compute/, staging/networking/ |
Split state when: different teams, different update cadences, or >500 resources. Combine when: tightly coupled resources, <100 resources, same lifecycle.
See State Management for locking, migration, multi-team isolation, disaster recovery, and the LLM-mistake checklist.
| Component | Strategy | Example |
|-----------|----------|---------|
| Terraform runtime | Pin minor | required_version = "~> 1.9.0" |
| Providers | Pin major | version = "~> 5.0" |
| Modules (prod) | Pin exact | version = "5.1.2" |
| Modules (dev) | Allow patch | version = "~> 5.1.0" |
Commit .terraform.lock.hcl intentionally. Keep provider/runtime upgrades in a separate PR from functional changes. See Code Patterns: Version Management for constraint syntax and upgrade workflow.
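A sketch of how these constraints might sit together in a root module. Note how the pessimistic operator behaves: "~> 1.9.0" allows only 1.9.x patch releases, while "~> 1.9" allows any 1.x release from 1.9 upward. The module source and inputs are illustrative assumptions.

```hcl
terraform {
  # Pin minor: only 1.9.x patch releases are accepted.
  required_version = "~> 1.9.0"

  required_providers {
    aws = {
      source = "hashicorp/aws"
      # Pin major: any 5.x release is accepted.
      version = "~> 5.0"
    }
  }
}

module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # Exact pin for production; upgrades happen in a dedicated PR.
  version = "5.1.2"

  # Illustrative inputs only.
  name = "prod"
  cidr = "10.0.0.0/16"
}
```

The committed .terraform.lock.hcl then records the exact provider build selected within these ranges, so CI and local runs resolve the same versions.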
| Feature | Min version | Common use |
|---------|-------------|------------|
| try() | 0.13+ | Safe fallbacks, replaces element(concat()) |
| nullable = false | 1.1+ | Prevent null silently overriding defaults |
| moved blocks | 1.1+ | Refactor without destroy/recreate |
| optional() with defaults | 1.3+ | Typed object attributes |
| import blocks | 1.5+ | Declarative imports, reviewable in VCS |
| check blocks | 1.5+ | Runtime assertions |
| Native terraform test | 1.6+ | Built-in test framework |
| Mock providers | 1.7+ | Cost-free unit testing |
| removed blocks | 1.7+ | Declarative resource removal |
| Provider-defined functions | 1.8+ | Provider-specific transformations (requires provider to declare functions) |
| Cross-variable validation | 1.9+ | Reference other var.* in validation blocks |
| S3 native lock-file | 1.10+ | State locking without DynamoDB |
| write_only arguments | 1.11+ | Secrets never stored in state |
Before emitting a feature, verify the runtime floor. See Code Patterns: Feature Guard Table for the full table with common LLM error patterns per feature.
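A few of the version-gated features from the table, sketched side by side with their runtime floors as comments. Resource names, the bucket ID, and the variables are hypothetical.

```hcl
# import block (1.5+): adopt an existing bucket declaratively.
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket"
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}

# removed block (1.7+): stop managing a resource without destroying it.
# The matching resource block must already be deleted from the configuration.
removed {
  from = aws_s3_bucket.legacy

  lifecycle {
    destroy = false
  }
}

variable "min_size" {
  description = "Minimum instance count (hypothetical input)."
  type        = number
  default     = 1
}

variable "max_size" {
  description = "Maximum instance count (hypothetical input)."
  type        = number
  default     = 3

  # Cross-variable validation (1.9+): earlier runtimes reject the var.min_size reference.
  validation {
    condition     = var.max_size >= var.min_size
    error_message = "max_size must be greater than or equal to min_size."
  }
}
```

On a runtime below the stated floor, each of these constructs fails during init or validate rather than at apply, which is why checking the version first is cheap insurance.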
- 1.6+: terraform test / tofu test available. Migrate simple unit tests, keep Terratest for complex integration.
- 1.10+: S3 native locking (use_lockfile) is the correct default for new configurations. DynamoDB locking is no longer required.
- 1.11+: write_only arguments for secret handling keep credentials out of state.

Semantic navigation for HCL. terraform-ls is optional; without it every row below degrades to a disclosed rg + Read fallback.
Self-contained terraform-ls layer of a generic code-intelligence discipline - apply the rows below directly. Recommended companion: the code-intelligence plugin (same antonbabenko/agent-plugins marketplace) carries the generic discipline (position anchoring, degradation gate, disclosure format, anti-phantom-shim) and ships /code-intelligence:doctor for readiness. If it is installed, defer to its generic protocol; this skill stays fully self-contained without it.
| Goal | Use | Tradeoff |
|------|-----|----------|
| Find definition / all references | terraform-ls goToDefinition / findReferences | Needs init + a position anchor |
| Rename value symbol (var/local/output/provider alias) | Manual: findReferences -> per-file fresh Read -> edit -> validate | No rename provider |
| Rename resource/module address | moved block + plan shows 0 destroy | Text rename forces destroy/recreate |
| Exact text / known name / .tfvars / non-HCL | rg + Read | No semantic scope |
✅ Supported: goToDefinition, findReferences, documentSymbol, hover, workspaceSymbol.
❌ Unsupported: goToImplementation, call hierarchy, rename provider. Do not call these then report their absence as a finding.
- terraform/tofu on PATH, terraform init run; cold start may need one retry.
- Position anchor (file:line:character): anchor with rg first, never symbol-name-only.

Depth: Code Intelligence.
Progressive disclosure — essentials here, depth on demand:
- terraform_remote_state rules, release checklist
- count/for_each deep dive, modern features, version management, locals

Apache License 2.0. See LICENSE for full terms.
Copyright © 2026 Anton Babenko