engineering/devops/skills/infrastructure-as-code/SKILL.md
This skill should be used when the user asks about "Terraform", "Pulumi", "CloudFormation", "CDK", "infrastructure as code", "IaC", "provision infrastructure", "Terraform module", "Terraform state", "remote state", "state locking", "terraform plan", "terraform apply", "terraform destroy", "drift detection", "resource import", "data source", "HCL", "AWS provider", "GCP provider", "Azure provider", "secrets in Terraform", "Vault", "environment variables in IaC", "workspace", "backend", or "infrastructure automation". Also trigger for "how do I manage cloud resources", "way to deploy AWS resources without clicking", or "automate infrastructure".
Provision, manage, and automate cloud infrastructure reproducibly and safely with Terraform (primary), Pulumi, and AWS CDK.
Write HCL → terraform init → terraform plan → Review → terraform apply
Never skip terraform plan. Always review what will be created, modified, or destroyed before applying.
Color coding in plan output:
+ green = create
~ yellow = update in-place (usually safe)
-/+ red/green = destroy and recreate (risk of downtime)
- red = destroy (permanent)
infrastructure/
├── environments/
│   ├── staging/
│   │   ├── main.tf           # Resources specific to staging
│   │   ├── variables.tf      # Input variables
│   │   ├── outputs.tf        # Exposed outputs
│   │   ├── terraform.tfvars  # Non-secret variable values
│   │   └── backend.tf        # Remote state config
│   └── production/
│       ├── main.tf
│       ├── variables.tf
│       ├── outputs.tf
│       ├── terraform.tfvars
│       └── backend.tf
└── modules/
    ├── networking/           # VPC, subnets, security groups
    ├── compute/              # ECS, EC2, Lambda
    ├── database/             # RDS, DynamoDB
    └── observability/        # CloudWatch, Datadog
One state file per environment. Never share state between staging and production.
# backend.tf: store state in S3 with DynamoDB locking
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/api-service/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true                   # Encrypt state at rest
    dynamodb_table = "terraform-state-lock" # Prevent concurrent applies
  }
}
Create the S3 bucket and DynamoDB table before using this backend. Use versioning on the S3 bucket to recover from accidental state corruption.
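The bootstrap resources mentioned above could be defined roughly as follows, typically in a small separate configuration that is applied once with local state. This is a sketch rather than a definitive setup: the resource labels (`tf_state`, `tf_lock`) are hypothetical, while the bucket and table names are taken from the backend block. The S3 backend expects the lock table to have a string partition key named `LockID`.

```hcl
# Sketch: bootstrap resources for the S3 backend. Resource labels are hypothetical.
resource "aws_s3_bucket" "tf_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true # Losing this bucket means losing all state
  }
}

# Versioning lets you roll back to an earlier copy of the state file
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State can contain secrets, so block every form of public access
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend requires a string hash key named "LockID"
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Newer Terraform releases also offer S3-native locking through the backend's `use_lockfile` argument; check the S3 backend documentation for your version before dropping the DynamoDB table.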
Write modules for reusable infrastructure patterns:
# modules/ecs-service/main.tf
variable "service_name" {
  type        = string
  description = "Name of the ECS service"
}
variable "image" {
  type        = string
  description = "Docker image with tag: registry/image:tag"
}
variable "cpu" {
  type        = number
  default     = 256
  description = "CPU units (1024 = 1 vCPU)"
}
variable "memory" {
  type        = number
  default     = 512
  description = "Memory in MiB"
}
variable "desired_count" {
  type        = number
  default     = 2
  description = "Number of tasks to run"
}
# Excerpt: var.cluster_arn, var.private_subnet_ids, var.target_group_arn, var.container_port,
# aws_ecs_task_definition.this, and aws_security_group.service are assumed to be defined elsewhere in the module.
resource "aws_ecs_service" "this" {
  name            = var.service_name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"
  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.service.id]
    assign_public_ip = false
  }
  load_balancer {
    target_group_arn = var.target_group_arn
    container_name   = var.service_name
    container_port   = var.container_port
  }
  lifecycle {
    ignore_changes = [desired_count] # Managed by auto-scaling
  }
}
output "service_name" {
  value = aws_ecs_service.this.name
}
output "service_arn" {
  value = aws_ecs_service.this.id # For aws_ecs_service, id is the service ARN
}
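A module like this is consumed from an environment directory. The call below is a minimal sketch: the module label `api_service`, the sizing values, the port, and every reference on the right-hand side of the last four inputs are hypothetical, and those four inputs are used by the module excerpt without being declared in it.

```hcl
# environments/production/main.tf (sketch)
module "api_service" {
  source = "../../modules/ecs-service"

  service_name  = "api-service"
  image         = "registry/image:tag" # Placeholder in the format the module documents
  cpu           = 512
  memory        = 1024
  desired_count = 3

  # Inputs the module uses but the excerpt does not declare.
  # The referenced resources are hypothetical.
  cluster_arn        = aws_ecs_cluster.main.arn
  private_subnet_ids = module.vpc.private_subnets
  target_group_arn   = aws_lb_target_group.api.arn
  container_port     = 8080
}

# Re-export a module output at the environment level
output "api_service_arn" {
  value = module.api_service.service_arn
}
```

Staging can call the same module with smaller values, which keeps the two environments structurally identical while their state stays separate.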
Module best practices:
Use lifecycle { ignore_changes = [...] } for fields managed by external systems.
Pin module sources to a version tag: source = "git::https://github.com/myorg/tf-modules.git//ecs-service?ref=v1.2.0"
NEVER put secrets in .tfvars files or commit them to git.
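# Look up the secret's ARN only. The aws_secretsmanager_secret data source reads metadata,
# so the secret value is never written to Terraform state. (aws_secretsmanager_secret_version
# would read the value itself and store it in state.)
data "aws_secretsmanager_secret" "db_password" {
  name = "production/api-service/db-password"
}
resource "aws_ecs_task_definition" "api" {
  # ...
  container_definitions = jsonencode([{
    name = "api-service"
    secrets = [{
      name      = "DATABASE_PASSWORD"
      valueFrom = data.aws_secretsmanager_secret.db_password.arn # ECS resolves the value at task start
    }]
  }])
}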
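Pass secrets as environment variables during CI/CD runs; never write them to a file. Terraform still records any value passed to a resource argument in state, so keep the state backend encrypted and access-restricted: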
export TF_VAR_db_password="$(aws secretsmanager get-secret-value --secret-id prod/db --query SecretString --output text)"
terraform apply
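For `TF_VAR_db_password` to be picked up, the configuration needs a matching input variable. A minimal sketch of that declaration, assuming it lives in the environment's `variables.tf`:

```hcl
# variables.tf (sketch)
variable "db_password" {
  type        = string
  description = "Database master password, supplied via TF_VAR_db_password"
  sensitive   = true # Redacts the value in plan and apply output
}
```

Marking the variable `sensitive` only affects what Terraform prints. It does not keep the value out of the state file.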
provider "vault" {
  address = "https://vault.example.com"
  # Auth via environment or OIDC, no hardcoded token
}
data "vault_generic_secret" "db" {
  path = "secret/production/database"
}
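Fields of the Vault secret are then available through the data source's `data` map. The sketch below assumes the secret stores the password under a key named `password`, which is a hypothetical key name.

```hcl
# Sketch: assumes the Vault secret has a "password" key
locals {
  db_password = data.vault_generic_secret.db.data["password"]
}
```

Values read this way are written to Terraform state like any other data source attribute, so the state encryption and access rules still apply.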
When infrastructure already exists and you want Terraform to manage it:
# CLI import (works in every version; the only option before Terraform 1.5)
terraform import aws_s3_bucket.my_bucket my-existing-bucket-name
# With an import block in the configuration (Terraform >= 1.5, preferred), generate the matching resource configuration
terraform plan -generate-config-out=generated.tf
# Remove a resource from state without destroying it
terraform state rm aws_s3_bucket.old_bucket
# Move a resource (rename in refactor)
terraform state mv aws_instance.old_name aws_instance.new_name
# List all resources in state
terraform state list
# Show current state of a specific resource
terraform state show aws_s3_bucket.my_bucket
Always take a state backup before manipulation:
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate
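Several of the state commands above have declarative equivalents that go through the normal plan and review cycle instead of editing state directly. The sketch below reuses the resource addresses from the commands: `import` blocks need Terraform 1.5 or later, `moved` blocks 1.1 or later, and `removed` blocks 1.7 or later.

```hcl
# Equivalent of: terraform import aws_s3_bucket.my_bucket my-existing-bucket-name
import {
  to = aws_s3_bucket.my_bucket
  id = "my-existing-bucket-name"
}

# Equivalent of: terraform state mv aws_instance.old_name aws_instance.new_name
moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}

# Equivalent of: terraform state rm aws_s3_bucket.old_bucket
removed {
  from = aws_s3_bucket.old_bucket

  lifecycle {
    destroy = false # Forget the resource without destroying the real bucket
  }
}
```

Because these blocks live in the configuration, the change shows up in `terraform plan` and in code review before it touches state.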
Detect when real infrastructure has diverged from Terraform state:
# Shows what would change if you applied (also shows drift)
terraform plan
# Refresh state to match real infrastructure
terraform apply -refresh-only
module "vpc" {
  source               = "terraform-aws-modules/vpc/aws"
  version              = "5.5.0"
  name                 = "production-vpc"
  cidr                 = "10.0.0.0/16"
  azs                  = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets      = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets       = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
  enable_nat_gateway   = true
  single_nat_gateway   = false # One per AZ for HA
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags                 = local.common_tags
}
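The `local.common_tags` reference assumes a locals block defined alongside the module call. A minimal sketch, with hypothetical tag keys and values:

```hcl
# Sketch: shared tags applied to every resource in this environment
locals {
  common_tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    Service     = "api-service"
  }
}
```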
resource "aws_db_instance" "postgres" {
  identifier                = "production-postgres"
  engine                    = "postgres"
  engine_version            = "16.1"
  instance_class            = "db.t3.medium"
  allocated_storage         = 100
  max_allocated_storage     = 500 # Enable auto-scaling storage
  db_name                   = "appdb"
  username                  = "appuser"
  password                  = var.db_password # From secrets manager, not hardcoded
  multi_az                  = true            # Automatic failover
  storage_encrypted         = true            # Encrypt at rest
  kms_key_id                = aws_kms_key.rds.arn
  backup_retention_period   = 7 # 7 days of backups
  backup_window             = "03:00-04:00"
  maintenance_window        = "Mon:04:00-Mon:05:00"
  deletion_protection       = true  # Prevent accidental destroy
  skip_final_snapshot       = false # Take snapshot before destroy
  # Static name: "$(timestamp())" is not HCL interpolation, and timestamp() output
  # changes on every plan and contains characters not allowed in snapshot identifiers
  final_snapshot_identifier = "production-postgres-final"
  vpc_security_group_ids    = [aws_security_group.rds.id]
  db_subnet_group_name      = aws_db_subnet_group.main.name
  tags                      = local.common_tags
}
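The database resource references a KMS key, a subnet group, and a security group that are not shown. One way they might look, as a sketch: the names, the `app_security_group_id` variable, and the single ingress rule are assumptions, and `module.vpc` refers to the VPC module shown earlier.

```hcl
# Hypothetical input: security group of the application tier
variable "app_security_group_id" {
  type        = string
  description = "Security group allowed to reach PostgreSQL"
}

resource "aws_kms_key" "rds" {
  description         = "Encryption key for production-postgres storage"
  enable_key_rotation = true
}

# Place the database in the private subnets created by the VPC module
resource "aws_db_subnet_group" "main" {
  name       = "production-postgres"
  subnet_ids = module.vpc.private_subnets
  tags       = local.common_tags
}

resource "aws_security_group" "rds" {
  name_prefix = "production-postgres-"
  vpc_id      = module.vpc.vpc_id
  tags        = local.common_tags

  # Only the application tier may connect, and only on the PostgreSQL port
  ingress {
    description     = "PostgreSQL from the application tier"
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [var.app_security_group_id]
  }
}
```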
Always run terraform plan before apply. Never run terraform apply -auto-approve in production.
Use the prevent_destroy lifecycle setting for critical resources:
lifecycle {
  prevent_destroy = true
}
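The `lifecycle` block goes inside the resource it protects. A short sketch with a hypothetical bucket:

```hcl
resource "aws_s3_bucket" "audit_logs" {
  bucket = "mycompany-audit-logs" # Hypothetical bucket name

  lifecycle {
    prevent_destroy = true # Any plan that would destroy this bucket fails with an error
  }
}
```

The guard only applies while the resource block exists. Deleting the whole block from the configuration removes the protection along with it, so pair it with provider-level safeguards such as `deletion_protection` where available.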
# Pin provider and Terraform versions
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.30" # Allow 5.x releases from 5.30 up, never 6.0
    }
  }
  required_version = ">= 1.6.0"
}
terraform apply can take down production.
For complete IaC module templates and idempotent automation recipes, see:
references/terraform-patterns.md: reusable Terraform modules for VPC, ECS, RDS, and IAM with remote state configuration
references/ansible-patterns.md: production Ansible playbooks for server provisioning, application deployment, and secret management