areas/devops/infrastructure/skills/state-management/SKILL.md
Manage Terraform remote state — backend setup, state isolation, locking, import, mv, and state surgery.
Expertise: Remote backends, state isolation, import, mv, rm, state surgery, cross-stack references.
Use this skill when setting up a new Terraform backend, debugging a state lock, importing manually created resources, or safely moving resources between state files.
# AWS S3 + DynamoDB lock
# Backend blocks cannot reference variables or locals, so the key is a literal path.
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/network/terraform.tfstate" # <environment>/<component>/terraform.tfstate
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "alias/terraform-state"
    dynamodb_table = "terraform-state-lock"
  }
}
# GCS (GCP): built-in locking, no separate lock table needed
# The prefix must be a literal; variables are not allowed in backend blocks.
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "production/network" # <environment>/<component>
  }
}
# Terraform Cloud / HCP Terraform
terraform {
  cloud {
    organization = "mycompany"
    workspaces { name = "production-network" }
  }
}
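Because a backend block cannot interpolate variables, a per-environment key is usually supplied at init time through partial backend configuration. The following is a minimal sketch, assuming the `backend "s3"` block omits `key` so that it can be passed with `-backend-config`; the function names `state_key` and `init_stack` are hypothetical helpers, not part of Terraform.

```shell
# Compose the state key as <environment>/<component>/terraform.tfstate.
state_key() {
  local environment="$1" component="$2"
  printf '%s/%s/terraform.tfstate\n' "$environment" "$component"
}

# Initialise the backend for one stack (hypothetical wrapper).
# Assumes `key` is left out of the backend block in the .tf files.
init_stack() {
  local environment="$1" component="$2"
  terraform init -reconfigure \
    -backend-config="key=$(state_key "$environment" "$component")"
}

# Usage: init_stack production network
```

For the GCS backend the same approach should apply with `prefix=` in place of `key=`.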
state/
├── staging/
│   ├── network/terraform.tfstate
│   ├── k8s-cluster/terraform.tfstate
│   └── databases/terraform.tfstate
└── production/
    ├── network/terraform.tfstate
    ├── k8s-cluster/terraform.tfstate
    └── databases/terraform.tfstate
Rule: Staging and production must use separate state files (separate key/prefix, ideally separate bucket).
Cross-stack references (SSM Parameter Store instead of terraform_remote_state)
# ✅ Publish outputs to SSM Parameter Store
resource "aws_ssm_parameter" "vpc_id" {
  name  = "/${var.environment}/network/vpc_id"
  type  = "String"
  value = aws_vpc.this.id
}
# ✅ Consume in another stack via data source
data "aws_ssm_parameter" "vpc_id" {
  name = "/${var.environment}/network/vpc_id"
}
resource "aws_subnet" "app" {
  vpc_id     = data.aws_ssm_parameter.vpc_id.value
  cidr_block = "10.0.1.0/24" # illustrative value; aws_subnet needs a CIDR block
}
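To confirm that the producing stack has actually published a value before a consumer depends on it, the parameter can be read from the command line. This is a sketch that assumes the `/<environment>/<stack>/<output>` naming shown above and a configured AWS CLI; `ssm_param_name` and `read_stack_output` are hypothetical helper names.

```shell
# Build the parameter path: /<environment>/<stack>/<output>.
ssm_param_name() {
  local environment="$1" stack="$2" output="$3"
  printf '/%s/%s/%s\n' "$environment" "$stack" "$output"
}

# Print the published value (for example the VPC ID) as plain text.
read_stack_output() {
  aws ssm get-parameter \
    --name "$(ssm_param_name "$1" "$2" "$3")" \
    --query 'Parameter.Value' \
    --output text
}

# Usage: read_stack_output production network vpc_id
```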
# 1. Write the resource block in .tf first
resource "aws_s3_bucket" "legacy" {
  bucket = "my-legacy-bucket"
}
# 2. Import the existing resource
terraform import aws_s3_bucket.legacy my-legacy-bucket
# 3. Run plan — should show no changes if .tf matches reality
terraform plan # should be: "No changes."
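The "no changes" check in step 3 can be scripted with `terraform plan -detailed-exitcode`, which exits with 0 when there is no diff, 1 on error, and 2 when changes are present. The sketch below is one possible wrapper; `describe_plan_status` and `check_import` are hypothetical names and the labels they print are arbitrary.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a label.
describe_plan_status() {
  case "$1" in
    0) echo "in-sync" ;;   # configuration matches the imported resource
    2) echo "drift" ;;     # plan shows changes; adjust the .tf to match reality
    *) echo "error" ;;     # plan itself failed
  esac
}

# Run the plan after an import. Plan output goes to stderr so that
# stdout carries only the label.
check_import() {
  local status=0
  terraform plan -detailed-exitcode -input=false >&2 || status=$?
  describe_plan_status "$status"
}

# Usage: [ "$(check_import)" = "in-sync" ] && echo "import is clean"
```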
# Safe rename using a moved {} block (Terraform 1.1+); no CLI state command required
moved {
  from = aws_instance.web
  to   = aws_instance.this["web-01"]
}
# Once every environment that uses this configuration has applied the move, the moved {} block can be removed
# Manual state mv (use when moved{} block is not applicable)
# ALWAYS take a backup first
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate
terraform state mv \
  'aws_security_group.old_name' \
  'aws_security_group.new_name'
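The backup-then-move sequence can be wrapped so that the move never runs without a usable backup. This is a sketch under that assumption only; `backup_name` and `safe_state_mv` are hypothetical helpers.

```shell
# Timestamped backup file name, same pattern as the manual command above.
backup_name() {
  printf 'backup-%s.tfstate\n' "$(date +%Y%m%d-%H%M%S)"
}

# Pull a backup, verify it is non-empty, then move the resource address.
# Prints the backup path on success so the caller can keep or restore it.
safe_state_mv() {
  local from="$1" to="$2" backup
  backup="$(backup_name)"
  terraform state pull > "$backup" || return 1
  [ -s "$backup" ] || { echo "empty backup: $backup" >&2; return 1; }
  terraform state mv "$from" "$to" >&2 || return 1
  echo "$backup"
}

# Usage: safe_state_mv 'aws_security_group.old_name' 'aws_security_group.new_name'
```

If the move has to be undone, the backup can be written back with `terraform state push <backup-file>`. Since the backup then carries an older serial than the remote state, the push will probably be refused unless `-force` is given, so treat a restore as a deliberate manual step.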
# List all resources in state
terraform state list
# Show a specific resource's state
terraform state show 'aws_instance.this["web-01"]'
# Remove a resource from state WITHOUT destroying it (use when managed outside TF)
terraform state rm 'aws_s3_bucket.legacy'
# Force-unlock a stuck state lock (use only when lock is genuinely stale)
terraform force-unlock <LOCK_ID>
# Lock ID from error message: "Error acquiring the state lock"
# Pull state for manual inspection
terraform state pull | jq '.resources[] | {type: .type, name: .name}'
# Replace a resource (force-recreate it in a single apply, without a separate destroy or taint step)
terraform apply -replace='aws_instance.this["web-01"]'
# Error: "Error acquiring the state lock" + lock ID
# Check DynamoDB for stale lock:
aws dynamodb get-item \
  --table-name terraform-state-lock \
  --key '{"LockID": {"S": "mycompany-terraform-state/production/network/terraform.tfstate"}}'
# Verify no apply is actually running before force-unlock
# Only force-unlock if you are certain no other process holds the lock
terraform force-unlock <LOCK_ID_FROM_ERROR>
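The lock ID needed by `force-unlock` can be pulled out of the captured error text instead of being copied by hand. This sketch assumes the error contains a "Lock Info" section with a line of the form `ID: <value>`, possibly preceded by spaces or box-drawing characters; the exact layout varies between Terraform versions, and `extract_lock_id` is a hypothetical helper.

```shell
# Read Terraform's lock error on stdin and print the first lock ID found.
# Strips any non-letter prefix (spaces, box characters) before "ID:".
extract_lock_id() {
  sed -n 's/^[^A-Za-z]*ID:[[:space:]]*//p' | head -n 1
}

# Usage (still verify that no apply is running before unlocking):
#   lock_id="$(terraform plan 2>&1 | extract_lock_id)"
#   terraform force-unlock "$lock_id"
```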