skills/terraform-ops/SKILL.md
Terraform and OpenTofu infrastructure-as-code operations - project layout, state management, module design, plan/apply safety, CI/CD pipelines, and secrets. Use for: terraform, opentofu, infrastructure as code, IaC, tfstate, terraform state, terraform module, remote backend, terraform plan, terraform apply, for_each, moved block, terraform import, drift detection, tflint, checkov, HCL.
npx skillsauth add 0xDarkMatter/claude-mods terraform-ops
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Terraform / OpenTofu infrastructure-as-code: layout, state, modules, safety, CI/CD, secrets.
Version context (verified 2026-06): Terraform 1.15.x (BUSL-1.1 licence since 1.6) · OpenTofu 1.12.x (MPL-2.0 fork of Terraform 1.5.x). Commands below are interchangeable (terraform ↔ tofu) unless flagged. See Terraform vs OpenTofu for the decision note.
| File | Covers |
|------|--------|
| references/state-management.md | Remote backends, locking, moved/import/removed blocks, state surgery, drift detection |
| references/module-patterns.md | Module composition, variable validation, optional/nullable, output contracts, versioning |
| references/cicd-pipelines.md | GitHub Actions plan/apply, OIDC auth, policy gates (tflint/trivy/checkov/OPA), Atlantis/HCP |
| references/security-and-secrets.md | Secrets in state, ephemeral resources, write-only arguments, SOPS/Vault, sensitive limits |
| assets/github-actions-terraform.yml | Ready-to-adapt PR-plan + OIDC-apply workflow |
| scripts/check-action-refs.sh | Staleness verifier for any workflow's uses: action refs (offline structural / live API resolve) |
The action versions pinned in github-actions-terraform.yml are point-in-time (verified 2026-06). Run scripts/check-action-refs.sh --live before adopting: a tag that was valid at write time may have been retracted or never existed (e.g. [email protected] vs the real v0.33.1).
How many environments / accounts?
│
├─ One environment, one team
│   └─ Single root module + tfvars. Don't over-engineer.
│
├─ Multiple environments (dev/staging/prod)
│   ├─ Need different backend/account/region per env? (usually YES for prod isolation)
│   │   └─ DIRECTORY PER ENVIRONMENT (recommended default)
│   │       environments/{dev,staging,prod}/ each a thin root calling shared modules
│   │
│   └─ Environments truly identical except a few variables, same backend account?
│       └─ Workspaces are *acceptable*, but see the workspace caveats below
│
└─ Many teams / many state files / platform engineering
    └─ Directory-per-env + per-component state split (network / data / app)
        Consider Terragrunt, Terraform Stacks (HCP), or OpenTofu + CI orchestration
infra/
├── modules/                      # Reusable child modules (no provider/backend blocks)
│   ├── network/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf           # required_providers ONLY (no provider config)
│   └── app-service/
├── environments/                 # Root modules, one state file each
│   ├── dev/
│   │   ├── main.tf               # module "network" { source = "../../modules/network" ... }
│   │   ├── backend.tf            # remote backend, env-specific key
│   │   ├── providers.tf          # provider config lives in ROOT only
│   │   ├── terraform.tfvars      # committed, non-secret env values
│   │   └── versions.tf           # required_version + required_providers pins
│   └── prod/
└── .tflint.hcl
| Concern | Directories | Workspaces |
|---------|-------------|------------|
| Separate backend/account per env | Yes — each root has its own backend.tf | No — one backend, envs differ only by state key |
| Blast radius of wrong-env apply | Low — you're physically in prod/ | High — invisible terraform workspace select state |
| Env-specific config divergence | Natural (different main.tf if needed) | terraform.workspace conditionals creep everywhere |
| Prod IAM isolation | Per-dir CI role | Same credentials see all envs |
| Visibility in code review | Diff shows which env changed | Workspace is runtime state, not in the diff |
Workspaces fit short-lived ephemeral copies (PR preview envs), not the dev/prod boundary. HashiCorp's own docs describe CLI workspaces as not a suitable isolation mechanism when deployments need strong separation.
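As a rough sketch of the directory-per-environment option, a thin prod root might look like the following. The bucket name, state key, region, and the module's `cidr` input are hypothetical placeholders chosen for illustration, not values taken from this skill.

```hcl
# environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket       = "example-tfstate-prod"      # hypothetical bucket in the prod account
    key          = "network/terraform.tfstate" # env- and component-specific key
    region       = "eu-west-1"                 # hypothetical region
    use_lockfile = true                        # S3-native locking
  }
}

# environments/prod/providers.tf
provider "aws" {
  region = "eu-west-1"
}

# environments/prod/main.tf
variable "cidr" {
  type = string
}

module "network" {
  source = "../../modules/network"
  cidr   = var.cidr # assumes the shared module declares a "cidr" variable
}
```

Because the backend and provider live in the root, running a plan from `environments/prod/` can only ever touch the prod state, which is the blast-radius property the table above describes.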
terraform.tfvars # auto-loaded — per-root committed defaults (non-secret)
*.auto.tfvars # auto-loaded — generated/local overrides
prod.tfvars # explicit only: terraform plan -var-file=prod.tfvars
TF_VAR_db_password=... # env var injection — secrets in CI, never in files
Gotcha: -var-file + directories-per-env is belt-and-braces; with workspaces it's load-bearing and one forgotten flag applies dev values to prod.
Full detail: references/state-management.md.
| Task | Command / block | Notes |
|------|-----------------|-------|
| Remote backend (AWS) | backend "s3" { bucket, key, region, use_lockfile = true } | S3-native locking (TF ≥1.10) — DynamoDB table no longer required |
| Rename resource in code | moved { from = aws_x.a, to = aws_x.b } | Declarative, reviewable, no CLI surgery |
| Adopt existing infra | import { to = aws_x.a, id = "i-123" } + plan -generate-config-out=gen.tf | Config-driven import (TF ≥1.5) beats terraform import CLI |
| Forget without destroy | removed { from = aws_x.a, lifecycle { destroy = false } } | TF ≥1.7; OpenTofu 1.12 also has lifecycle { destroy = false } on resources |
| Drift detection | terraform plan -detailed-exitcode | Exit 0 = clean, 1 = error, 2 = drift — cron it |
| Inspect state | terraform state list / state show ADDR | Read-only, always safe |
| Move state (last resort) | terraform state mv SRC DST | Prefer moved blocks — see "when NOT to" below |
| Pull/push (emergency) | terraform state pull > backup.tfstate | ALWAYS pull a backup before any surgery |
State surgery — when NOT to: if a moved/removed/import block can express it, use the block. CLI state mv/rm is immediate, unreviewed, unversioned, and a typo orphans real infrastructure. Legit uses: splitting state between roots, unwedging a failed migration. Always state pull a backup first.
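The three declarative blocks from the table can be sketched together as below. The resource addresses and the bucket name are hypothetical; only the block shapes come from the table.

```hcl
# Rename: the existing object in state follows the new address on the next apply.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.audit_logs
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs" # hypothetical name
}

# Adopt: bring an existing bucket under management.
# For aws_s3_bucket the import id is the bucket name.
import {
  to = aws_s3_bucket.legacy
  id = "example-legacy-bucket" # hypothetical name
}

resource "aws_s3_bucket" "legacy" {
  bucket = "example-legacy-bucket"
}

# Forget: drop the resource from state but leave the real bucket alone.
# The matching resource block must already be deleted from the configuration.
removed {
  from = aws_s3_bucket.scratch

  lifecycle {
    destroy = false
  }
}
```

Each block shows up in the plan and in code review, and can be deleted once every environment has applied it.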
Full detail: references/module-patterns.md.
module "network" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 6.0" # pin minor-float for registry modules; exact pin in prod roots
  # ...
}
| Rule | Why |
|------|-----|
| Composition over inheritance | Roots compose flat modules; never module-wraps-module-wraps-module |
| No provider blocks in child modules | Providers configured in root only; child declares required_providers |
| validation blocks on variables | Fail at plan with a real message, not mid-apply |
| optional(type, default) in object attrs | Callers omit fields; nullable = false rejects explicit null |
| Outputs are the contract | Output IDs/ARNs consumers need; document with description |
| Anti-pattern: thin wrappers | A module that just renames variables of another module adds a version-lag layer and zero value — call the upstream module directly |
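A minimal sketch of the validation, optional-attribute, and output rules, written as a hypothetical `modules/network`. The variable names and the /24 threshold are illustrative assumptions.

```hcl
# modules/network/variables.tf
variable "cidr" {
  type        = string
  description = "VPC CIDR block."

  validation {
    # try() keeps a malformed value from raising an expression error
    # instead of the validation message.
    condition = (
      can(cidrhost(var.cidr, 0)) &&
      try(tonumber(split("/", var.cidr)[1]) <= 24, false)
    )
    error_message = "cidr must be a valid CIDR with a prefix length of /24 or shorter (for example 10.0.0.0/16)."
  }
}

variable "subnets" {
  type = map(object({
    cidr   = string
    public = optional(bool, false) # callers may omit this attribute
  }))
  default  = {}
  nullable = false # an explicit null is rejected rather than silently accepted
}

# modules/network/main.tf
resource "aws_vpc" "main" {
  cidr_block = var.cidr
}

# modules/network/outputs.tf
output "vpc_id" {
  description = "ID of the VPC, for consumers attaching subnets or security groups."
  value       = aws_vpc.main.id
}
```

With this in place a caller passing `10.0.0.0/30` fails at plan time with the message above, before any resource is touched.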
□ plan output READ, not skimmed — every destroy/replace explained
□ "Plan: X to add, Y to change, Z to destroy" — does Z surprise you?
□ -/+ (replace) lines: check the "forces replacement" attribute
□ Applying the SAME saved plan that was reviewed: plan -out=tfplan → apply tfplan
□ prevent_destroy on stateful resources (db, state bucket, KMS keys)
□ Cloud-side deletion protection too (RDS deletion_protection, S3 versioning+MFA-delete)
□ No -target unless this is a declared emergency (see below)
□ for_each (stable keys), not count, for any collection that can reorder
| Footgun | Detail | Fix |
|---------|--------|-----|
| count index shift | Removing item 0 of a count list re-addresses every later item → destroy/recreate cascade | for_each with stable string keys |
| -target habit | Skips dependency graph; state diverges from config; hides drift | Emergency-only (broken dependency cycle, partial outage). Follow with a full clean plan |
| prevent_destroy false comfort | Doesn't survive the block being deleted, and doesn't stop state rm + console delete | Pair with cloud-native deletion protection |
| Dynamic blocks everywhere | dynamic for 2 static blocks is obfuscation | Use dynamic only over genuinely variable collections |
| Unpinned providers | aws = ">= 5.0" in prod pulls a breaking major the day it ships | ~> 6.12 + commit .terraform.lock.hcl |
| Apply ≠ reviewed plan | Plan on PR, apply on merge re-plans — drift in between applies unreviewed changes | Save the plan artifact, or accept + re-review the merge plan |
resource "aws_db_instance" "main" {
  deletion_protection = true # cloud-side
  lifecycle {
    prevent_destroy = true       # terraform-side
    ignore_changes  = [password] # if rotated outside TF
  }
}
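The count index shift and unpinned provider rows of the footgun table can be illustrated with a short sketch. The queue names are hypothetical; the provider constraint is the one the table suggests.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.12" # allows 6.12 and later 6.x releases, never 7.0
    }
  }
}

variable "queue_names" {
  type    = set(string)
  default = ["orders", "billing", "email"] # hypothetical names
}

# Each instance is addressed by its key, e.g. aws_sqs_queue.this["orders"].
# Removing "orders" from the set destroys only that queue. With count, the
# later items would shift index and be destroyed and recreated.
resource "aws_sqs_queue" "this" {
  for_each = var.queue_names
  name     = each.key
}
```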
Full detail: references/cicd-pipelines.md · template: assets/github-actions-terraform.yml.
PR opened → fmt -check → validate → tflint → trivy/checkov → plan → plan posted as PR comment
PR merged → plan (fresh) → apply, authenticated via OIDC — no long-lived cloud keys
Nightly → plan -detailed-exitcode → exit 2 ⇒ drift alert
- aws-actions/configure-aws-credentials with role-to-assume, never AWS_ACCESS_KEY_ID secrets. Same supply-chain doctrine as this repo's rules: short-lived tokens, no standing credentials.
- Pin actions to a commit SHA (uses: actions/checkout@<sha>), not floating tags.
- tflint (provider-aware lint), trivy config / checkov (misconfig scan), OPA/conftest for org policy ("no public buckets").

| Orchestrator | Fit |
|---|---|
| Plain GitHub Actions | Default — full control, free, template in assets/ |
| Atlantis | Self-hosted PR automation, atlantis plan/apply comments, locking per dir |
| HCP Terraform / Terraform Cloud | Managed runs, Sentinel policy, state hosting; free ≤500 resources |
| Spacelift / env0 / Digger / Scalr | Commercial Atlantis-likes; Digger runs inside your Actions |
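The OIDC authentication used by the merge pipeline needs a cloud-side trust relationship. A minimal sketch on AWS follows, assuming a recent AWS provider in which `thumbprint_list` is optional; the repository `example-org/infra` and the role name are hypothetical.

```hcl
# Register GitHub's OIDC issuer with the AWS account.
resource "aws_iam_openid_connect_provider" "github" {
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows running on main of this one repository may assume the role.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/infra:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "terraform_apply" {
  name               = "terraform-apply-prod" # hypothetical
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

The workflow then passes this role's ARN as `role-to-assume`. Scoping the `sub` condition per environment is one way to get the per-directory CI role separation described earlier.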
uses: ref staleness

GitHub Action versions rot: a tag gets retracted, or a workflow pins one that never existed. scripts/check-action-refs.sh lints every uses: owner/repo@ref line. It's general: pass any workflow file(s) as positionals (default: this skill's own assets/github-actions-terraform.yml).
# Structural only, no network — well-formedness of every uses: ref (CI-safe gate).
# Floating @main/@master → WARN (exit 0; use --strict to fail). Malformed → exit 4.
scripts/check-action-refs.sh --offline .github/workflows/ci.yml
# Live — resolve each ref against the GitHub API. A 404 (ref doesn't exist) → exit 10
# DRIFT; API unreachable/rate-limited → exit 7 (advisory, never fails the build, §7).
# Set GITHUB_TOKEN to dodge the unauthenticated rate limit.
GITHUB_TOKEN=$GH_PAT scripts/check-action-refs.sh --live .github/workflows/*.yml
scripts/check-action-refs.sh --json --offline | jq '.data[] | select(.status!="ok")'
--live is the check that catches the classic aquasecurity/[email protected] mistake — that tag 404s; the real one is v0.33.1. Run live on a schedule (never as a blocking PR gate), offline in PR CI.
# tests/network.tftest.hcl: native test framework (TF ≥1.6 / OpenTofu ≥1.6)
variables {
  cidr = "10.0.0.0/16"
}

run "valid_cidr_plan" {
  command = plan # plan = fast unit-ish; apply = real integration

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR did not match input"
  }
}

run "rejects_tiny_cidr" {
  command = plan

  variables {
    cidr = "10.0.0.0/30"
  }

  expect_failures = [var.cidr] # asserts the validation block fires
}
terraform test runs every *.tftest.hcl under tests/. A run with command = apply creates real infrastructure (destroyed automatically afterwards), so use a sandbox account. Mock providers (mock_provider blocks, TF ≥1.7) fake apply without credentials. For multi-tool or Go-level orchestration (retries, real HTTP probes), Terratest is the heavyweight alternative; native terraform test covers most module CI needs first.
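A mocked variant of the test above might look like this sketch. The file name is hypothetical, and it assumes the module under test defines `aws_vpc.main` with `cidr_block` taken from `var.cidr`.

```hcl
# tests/network_mock.tftest.hcl
# The mocked provider needs no credentials and creates nothing real.
mock_provider "aws" {}

variables {
  cidr = "10.0.0.0/16"
}

run "vpc_uses_input_cidr" {
  command = apply # safe here: the apply goes to the mock, not to AWS

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR did not match input"
  }
}
```

Configured arguments keep their values under a mock while computed attributes are filled with generated placeholders, so assertions should stick to values the configuration sets.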
Full detail: references/security-and-secrets.md.
| Mechanism | Version | What it does |
|---|---|---|
| sensitive = true | all | Redacts from CLI output only — value still plaintext in state |
| Ephemeral resources (ephemeral "...") | TF ≥1.10 / OpenTofu ≥1.11 | Fetch secret at run time; never persisted to state or plan |
| Write-only arguments (password_wo) | TF ≥1.11 / OpenTofu ≥1.11 | Send secret to provider; never stored in state; rotate via _wo_version |
| SOPS-encrypted tfvars | tool | Secrets encrypted at rest in git; decrypted at plan time |
| Vault / cloud secret manager | tool | Reference by ID; resource reads secret at boot, TF never sees it |
| OpenTofu state encryption | OpenTofu ≥1.7 | Client-side AES-GCM encryption of state/plan — no Terraform equivalent |
Rule zero: treat state as secret regardless. Encrypt the backend (SSE-KMS), restrict IAM on the bucket, never commit *.tfstate (gitignore it).
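The ephemeral-resource and write-only rows can be combined, roughly as below. This is a sketch assuming an AWS provider version that ships the `aws_secretsmanager_secret_version` ephemeral resource and the `password_wo` argument; the secret ID and the database settings are hypothetical.

```hcl
# Read at run time only; the value is not written to state or to the plan file.
ephemeral "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/master-password" # hypothetical secret
}

resource "aws_db_instance" "main" {
  identifier          = "example-db"
  engine              = "postgres"
  instance_class      = "db.t4g.micro"
  allocated_storage   = 20
  username            = "app"
  deletion_protection = true

  # Write-only: sent to the provider but never stored in state.
  password_wo = ephemeral.aws_secretsmanager_secret_version.db.secret_string

  # Bump this number to push a rotated password to the instance.
  password_wo_version = 1
}
```

Because the write-only value is never stored, Terraform cannot diff it. The version argument is what signals that the password should be sent again.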
| | Terraform | OpenTofu |
|---|---|---|
| Licence | BUSL-1.1 since 1.6 (no production use competing with HashiCorp; fine for normal internal use) | MPL-2.0 — genuinely open source, Linux Foundation |
| Current | 1.15.x | 1.12.x |
| Exclusive features | Stacks (HCP-tied), Terraform Cloud agents, terraform query | State/plan encryption, provider for_each iteration, -exclude flag, early variable eval in backend/module blocks, OCI registry distribution, .tofu file extension |
| Registry | registry.terraform.io | registry.opentofu.org (mirrors most providers) |
| Compatibility | — | Forked at 1.5.x; HCL/state compatible for mainstream use, diverging feature-by-feature since |
Decision: vendors and anyone redistributing IaC tooling commercially → OpenTofu (licence risk). Teams on HCP Terraform/Sentinel → Terraform. Everyone else: either works; OpenTofu's state encryption is the single biggest technical differentiator. Migrating from terraform to tofu takes a tofu init, and state stays compatible up to roughly 1.8-era features. The gap widens each release, so migrate early or commit to staying on Terraform.
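For a sense of what OpenTofu's state encryption looks like in configuration, here is a minimal passphrase-based sketch. It assumes an OpenTofu version that allows variables inside the `encryption` block; the variable name is hypothetical, and a KMS-backed key provider would likely replace `pbkdf2` in production.

```hcl
variable "state_passphrase" {
  type      = string
  sensitive = true # supply via TF_VAR_state_passphrase, never via a committed file
}

terraform {
  encryption {
    # Derives an encryption key from the passphrase.
    key_provider "pbkdf2" "main" {
      passphrase = var.state_passphrase
    }

    method "aes_gcm" "main" {
      keys = key_provider.pbkdf2.main
    }

    # Encrypt both the state and any saved plan files.
    state {
      method = method.aes_gcm.main
    }

    plan {
      method = method.aes_gcm.main
    }
  }
}
```

Terraform does not accept this block, so a root that adopts it is committed to `tofu`. Existing unencrypted state also needs a one-time migration step described in the OpenTofu documentation.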
terraform init -upgrade # init / upgrade providers within constraints
terraform fmt -recursive -check # CI: fail on unformatted
terraform validate # syntax + internal consistency (no creds needed after init)
terraform plan -out=tfplan # save plan for exact-apply
terraform show -json tfplan | jq # machine-readable plan (policy tools eat this)
terraform apply tfplan # apply EXACTLY the reviewed plan
terraform plan -detailed-exitcode # 0 clean / 2 drift — for cron drift checks
terraform plan -refresh-only # show drift without proposing config changes
terraform apply -replace=aws_x.a # force recreate one resource (replaces old taint)
terraform state pull > backup.json # ALWAYS before surgery
terraform output -json # consume outputs in scripts
terraform graph | dot -Tsvg > g.svg # dependency graph
tofu init # OpenTofu: same verbs throughout