harsh040506/docker-kubernetes

Name: docker-kubernetes
Author: harsh040506

engineering/devops/skills/docker-kubernetes/SKILL.md

npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library docker-kubernetes

Docker & Kubernetes

Production-grade guidance for building container images and orchestrating workloads on Kubernetes.

Docker: Building Production Images

The Non-Negotiable Rules

Pin base image versions. Never use latest. Use node:20.11-alpine3.19 not node:latest. Unpinned tags change without warning and break reproducible builds.
Use distroless or Alpine for production. Smaller attack surface, faster pulls, less to patch. gcr.io/distroless/nodejs20-debian11 is a good default for Node.js.
Never run as root. Add a non-root user and switch to it before the final CMD:
```
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
```
One process per container. Don't run nginx + app + cron in one container. Use separate containers and orchestrate with Kubernetes.
Use .dockerignore. Exclude node_modules/, .git/, *.log, **/*.test.ts, .env. A bloated build context slows everything.

Multi-Stage Builds (Required for Compiled Languages)

```
# Stage 1: Build (the build step typically needs dev dependencies)
FROM node:20.11-alpine3.19 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only production modules reach the runtime stage
RUN npm prune --omit=dev

# Stage 2: Runtime (no dev deps, no source)
FROM node:20.11-alpine3.19 AS runtime
WORKDIR /app
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```

Layer Cache Optimization

Order Dockerfile instructions from least-changed to most-changed:

1. Base image
2. System packages (apt-get, apk)
3. Dependency files (package.json, go.mod, requirements.txt)
4. Install dependencies
5. Source code (COPY . .)

This way, a source code change only invalidates layers 5+, not the expensive dependency install.

Image Security Scanning

Always scan images before pushing to production:

```
# Trivy (recommended: free, fast)
trivy image <image>:<tag>

# Docker Scout (built into Docker Desktop)
docker scout cves <image>:<tag>
```

Fail CI/CD pipelines on HIGH or CRITICAL vulnerabilities.
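The simplest gate is Trivy's own flags: trivy image --severity HIGH,CRITICAL --exit-code 1 <image>:<tag>. If the pipeline needs to inspect findings itself, the following Python sketch shows one way to gate on a JSON report written with trivy image --format json --output report.json <image>:<tag>. It assumes the report has a top-level Results list whose entries carry a Vulnerabilities list with Severity fields. The function names and the sample report, including its CVE identifiers, are hypothetical.

```python
import json

# Severities that should block a release, per the rule above.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report):
    """Return (target, vulnerability_id, severity) for every blocking finding."""
    findings = []
    # "Results" and "Vulnerabilities" may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                findings.append(
                    (
                        result.get("Target", "unknown"),
                        vuln.get("VulnerabilityID", "unknown"),
                        vuln["Severity"],
                    )
                )
    return findings


def gate(report):
    """Return the exit code a CI step should use: 1 if anything blocks, else 0."""
    findings = blocking_findings(report)
    for target, vuln_id, severity in findings:
        print(f"{severity}: {vuln_id} in {target}")
    return 1 if findings else 0


# Hypothetical, heavily trimmed report used only to exercise the functions.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {
      "Target": "registry.example.com/api-service:1.2.3 (alpine 3.19)",
      "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
        {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"}
      ]
    },
    {"Target": "package-lock.json", "Vulnerabilities": null}
  ]
}
""")
```

In a pipeline, a wrapper would load report.json with json.load and pass the result of gate to sys.exit, so that any HIGH or CRITICAL finding fails the job.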

Kubernetes: Core Concepts

Deployment Manifest (Production-Ready Template)

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: production
  labels:
    app: api-service
    version: "1.2.3"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # At most 1 pod down at a time
      maxSurge: 1          # At most 1 extra pod during rollout
  template:
    metadata:
      labels:
        app: api-service
        version: "1.2.3"
    spec:
      # Always set a termination grace period
      terminationGracePeriodSeconds: 30

      # Pod-level security context: run as a non-root user
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 2000

      containers:
      - name: api-service
        image: registry.example.com/api-service:1.2.3
        ports:
        - containerPort: 3000

        # Resource requests and limits: ALWAYS set both
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

        # Readiness: traffic only sent when ready
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 10
          failureThreshold: 3

        # Liveness: container restarted if unhealthy
        livenessProbe:
          httpGet:
            path: /health/live
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
          failureThreshold: 3

        # Startup probe for slow-starting apps
        # (readiness and liveness checks wait until it succeeds)
        startupProbe:
          httpGet:
            path: /health/ready
            port: 3000
          failureThreshold: 30
          periodSeconds: 10

        # Env from ConfigMap and Secret: never hardcode
        envFrom:
        - configMapRef:
            name: api-service-config
        - secretRef:
            name: api-service-secrets

        # Container-level hardening: read-only root filesystem, no extra privileges
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]

        # Writable temp dir if needed
        volumeMounts:
        - name: tmp
          mountPath: /tmp

      volumes:
      - name: tmp
        emptyDir: {}

      # Avoid running all replicas on the same node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values: ["api-service"]
              topologyKey: kubernetes.io/hostname
```
Resource Sizing Guidelines

| Service Type | CPU Request | CPU Limit | Mem Request | Mem Limit |
|---|---|---|---|---|
| Lightweight API | 50m | 250m | 64Mi | 256Mi |
| Standard API | 100m | 500m | 128Mi | 512Mi |
| CPU-intensive (ML inference) | 500m | 2000m | 512Mi | 2Gi |
| Worker/queue consumer | 100m | 1000m | 256Mi | 1Gi |

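Rule: Requests are what the scheduler reserves for the pod when placing it on a node. Limits are the hard ceiling enforced at runtime: exceeding the memory limit gets the container OOMKilled, and hitting the CPU limit causes throttling. Set limits to 3–5× requests for burstable workloads.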
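
As a rough way to check a manifest against the 3–5× rule, the Python sketch below converts Kubernetes resource quantities to plain numbers and computes the limit-to-request ratio. The helper names are hypothetical, and the parser deliberately handles only millicore CPU values and the binary memory suffixes (Ki, Mi, Gi) that appear in the sizing table.

```python
# Binary suffixes only; decimal suffixes (k, M, G) are left out of this sketch.
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity):
    """Convert a CPU quantity such as '100m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity):
    """Convert a memory quantity such as '128Mi' or '2Gi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def limit_to_request_ratio(request, limit, parse):
    """Return limit / request after parsing both with the given parser."""
    return parse(limit) / parse(request)


def in_burst_band(ratio, low=3.0, high=5.0):
    """True when the ratio falls inside the 3-5x guideline."""
    return low <= ratio <= high


# "Standard API" row: CPU 100m -> 500m, memory 128Mi -> 512Mi.
print(limit_to_request_ratio("100m", "500m", cpu_millicores))   # 5.0
print(limit_to_request_ratio("128Mi", "512Mi", memory_bytes))   # 4.0
# "Worker/queue consumer" CPU: 100m -> 1000m is 10x, outside the band.
print(in_burst_band(limit_to_request_ratio("100m", "1000m", cpu_millicores)))  # False
```

A ratio outside the band is not automatically wrong (the worker row in the table allows a 10× CPU burst), but it is worth a deliberate decision rather than an accident.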

Health Probes: The Right Endpoints

Readiness probe — "Am I ready to serve traffic?"

Returns 200 only when all dependencies are connected (DB, cache, downstream services)
Failure removes pod from Service endpoints (traffic stops flowing to it)
Check: GET /health/ready

Liveness probe — "Am I alive (not deadlocked)?"

Returns 200 as long as the process can respond
Failure causes the kubelet to kill and restart the container
Check: GET /health/live (simpler check than readiness)

Common mistake: Using the same endpoint for both. If your DB is down, liveness should still return 200 (the app is alive, just degraded). Readiness should return 503.
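
The split between the two endpoints can be sketched in Python with the standard library's http.server. This is a minimal illustration, not a production health server: check_dependencies is a hypothetical placeholder for real DB, cache, and downstream checks, and the port matches the manifest's containerPort only by assumption.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def probe_status(path, dependencies_ok):
    """Map a probe path to the HTTP status it should return."""
    if path == "/health/live":
        # Liveness: the process can respond, so it is alive, even if degraded.
        return 200
    if path == "/health/ready":
        # Readiness: only accept traffic when dependencies are reachable.
        return 200 if dependencies_ok else 503
    return 404


def check_dependencies():
    """Hypothetical placeholder: replace with real DB/cache/downstream checks."""
    return True


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = probe_status(self.path, check_dependencies())
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=3000):
    """Serve the health endpoints until interrupted."""
    ThreadingHTTPServer(("", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoints on port 3000. With the database down, probe_status("/health/live", False) still returns 200 while probe_status("/health/ready", False) returns 503, so Kubernetes stops routing traffic to the pod without restarting it.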

Namespaces & RBAC

Run workloads in dedicated namespaces, for example one per environment as in the manifest below. Never use the default namespace for production workloads.

```
# Namespace per environment
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: production
```

Use ServiceAccounts with least-privilege RBAC. A pod should only have the permissions it actually needs:

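```
# ServiceAccount that the RoleBinding below refers to.
# Set serviceAccountName: api-service-sa in the pod spec to use it.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-service-sa
  namespace: production
---
# Read-only access to ConfigMaps in the production namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-service-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "watch", "list"]
---
# Grant the Role to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-service-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: api-service-sa
  namespace: production
roleRef:
  kind: Role
  name: api-service-role
  apiGroup: rbac.authorization.k8s.io
```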

Secrets Management

Never store secrets in plain YAML committed to git. Use:

External Secrets Operator + AWS Secrets Manager / GCP Secret Manager (recommended for cloud)
Sealed Secrets — encrypt secrets that can be safely committed
Vault Agent — HashiCorp Vault sidecar injection

```
# External Secrets Operator example
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-service-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: api-service-secrets
    creationPolicy: Owner
  data:
  - secretKey: DATABASE_URL
    remoteRef:
      key: production/api-service
      property: database_url
```

Helm Charts

Use Helm for reusable, parameterized manifests. Structure:

```
charts/
└── api-service/
    ├── Chart.yaml
    ├── values.yaml          # Default values
    ├── values-staging.yaml  # Staging overrides
    ├── values-prod.yaml     # Production overrides
    └── templates/
        ├── deployment.yaml
        ├── service.yaml
        ├── hpa.yaml
        ├── ingress.yaml
        └── _helpers.tpl
```

Always lint before deploying: helm lint charts/api-service

Preview the rendered templates: helm template api-service charts/api-service -f charts/api-service/values-prod.yaml

Common Failure Patterns

| Symptom | Likely Cause | Fix |
|---------|-------------|-----|
| CrashLoopBackOff | App crashes on startup | Check kubectl logs <pod> --previous |
| OOMKilled | Memory limit too low | Increase limits.memory or fix memory leak |
| ImagePullBackOff | Wrong image tag or registry auth | Verify image exists; check imagePullSecrets |
| Pending | Insufficient node resources | Check kubectl describe pod for events |
| High P99 latency | CPU throttling | Increase limits.cpu or scale horizontally |
| Readiness failing | Dependency not ready | Check dependency health; add retry logic |
| Pods evicted | Node under memory pressure | Check kubectl get events for eviction events |

Deeper Reference

For production-grade manifests and container hardening templates, see:

references/k8s-manifests.md — complete Deployment, Service, HPA, PDB, NetworkPolicy, and RBAC manifests ready to adapt
references/docker-patterns.md — multi-stage Dockerfiles, secrets handling, and container security hardening patterns

This skill should be used when the user asks about "Dockerfile", "Docker image", "containerize", "docker build", "docker-compose", "Kubernetes", "k8s", "kubectl", "pod", "deployment", "service", "ingress", "namespace", "Helm chart", "ConfigMap", "Secret", "PersistentVolume", "RBAC", "resource limits", "liveness probe", "readiness probe", "pod scheduling", "node affinity", "taint", "toleration", "StatefulSet", "DaemonSet", "CronJob", "HPA", "VPA", or "cluster". Also trigger for "why is my pod crashing", "OOMKilled", "CrashLoopBackOff", "ImagePullBackOff", or "Pending" pod states.

2 stars

development

Updated Apr 5, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

```
npx skillsauth add harsh040506/claude-code-unified-skill-plugin-library docker-kubernetes
```

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | What it checks | Status | Percentage |
|---|---|---|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 5, 2026, 5:10 PM (3.8s, 3 files scanned)

Install manually

```
# Clone the repo
git clone https://github.com/harsh040506/claude-code-unified-skill-plugin-library.git

# Make sure the global Claude Code skills folder exists
mkdir -p ~/.claude/skills

# Copy into Claude Code skills folder (global)
cp -r claude-code-unified-skill-plugin-library/engineering/devops/skills/docker-kubernetes ~/.claude/skills/
```

Claude Code Skills — official skills path docs.

Repository: harsh040506/claude-code-unified-skill-plugin-library (2 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT