plugins/docker-master/skills/docker-best-practices/SKILL.md
Comprehensive Docker best practices for images, containers, and production deployments. PROACTIVELY activate for: (1) writing production-ready Dockerfiles, (2) multi-stage builds for lean images, (3) base image selection (distroless, alpine, scratch, slim), (4) layer caching and ordering, (5) .dockerignore hygiene, (6) HEALTHCHECK and graceful shutdown, (7) non-root user and least-privilege patterns, (8) ENV vs ARG and secret handling, (9) tag/digest pinning and reproducibility, (10) Compose for local development. Provides: Dockerfile templates by language, image-size optimization checklist, .dockerignore template, HEALTHCHECK patterns, and Compose dev-stack examples.
npx skillsauth add JosiahSiegel/claude-plugin-marketplace docker-best-practicesInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
Examples:
D:/repos/project/file.tsxD:\repos\project\file.tsxThis applies to:
NEVER create new documentation files unless explicitly requested by the user.
This skill provides current Docker best practices across all aspects of container development, deployment, and operation.
2025 Recommended Hierarchy:
cgr.dev/chainguard/*) - Zero-CVE goal, SBOM includedalpine:3.19) - ~7MB, minimal attack surfacegcr.io/distroless/*) - ~2MB, no shellnode:20-slim) - ~70MB, balancedKey rules:
node:20.11.0-alpine3.19latest (unpredictable, breaks reproducibility)Optimal layer ordering (least to most frequently changing):
1. Base image and system dependencies
2. Application dependencies (package.json, requirements.txt, etc.)
3. Application code
4. Configuration and metadata
Rationale: Docker caches layers. If code changes but dependencies don't, cached dependency layers are reused, speeding up builds.
Example:
FROM python:3.12-slim
# 1. System packages (rarely change)
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
&& rm -rf /var/lib/apt/lists/*
# 2. Dependencies (change occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 3. Application code (changes frequently)
COPY . /app
WORKDIR /app
CMD ["python", "app.py"]
Use multi-stage builds to separate build dependencies from runtime:
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM node:20-alpine AS runtime
WORKDIR /app
# Only copy what's needed for runtime
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
CMD ["node", "dist/server.js"]
Benefits:
Combine commands to reduce layers and image size:
# Bad - 3 layers, cleanup doesn't reduce size
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
# Good - 1 layer, cleanup effective
RUN apt-get update && \
apt-get install -y --no-install-recommends curl && \
rm -rf /var/lib/apt/lists/*
Always create .dockerignore to exclude unnecessary files:
# Version control
.git
.gitignore
# Dependencies
node_modules
__pycache__
*.pyc
# IDE
.vscode
.idea
# OS
.DS_Store
Thumbs.db
# Logs
*.log
logs/
# Testing
coverage/
.nyc_output
*.test.js
# Documentation
README.md
docs/
# Environment
.env
.env.local
*.local
docker run \
# Run as non-root
--user 1000:1000 \
# Drop all capabilities, add only needed ones
--cap-drop=ALL \
--cap-add=NET_BIND_SERVICE \
# Read-only filesystem
--read-only \
# Temporary writable filesystems
--tmpfs /tmp:noexec,nosuid \
# No new privileges
--security-opt="no-new-privileges:true" \
# Resource limits
--memory="512m" \
--cpus="1.0" \
my-image
Always set resource limits in production:
# docker-compose.yml
services:
app:
deploy:
resources:
limits:
cpus: '2.0'
memory: 1G
reservations:
cpus: '1.0'
memory: 512M
Implement health checks for all long-running containers:
HEALTHCHECK --interval=30s --timeout=3s --retries=3 --start-period=40s \
CMD curl -f http://localhost:3000/health || exit 1
Or in compose:
services:
app:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/health"]
interval: 30s
timeout: 3s
retries: 3
start_period: 40s
Configure proper logging to prevent disk fill-up:
services:
app:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Or system-wide in /etc/docker/daemon.json:
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
services:
app:
# For development
restart: "no"
# For production
restart: unless-stopped
# Or with fine-grained control (Swarm mode)
deploy:
restart_policy:
condition: on-failure
delay: 5s
max_attempts: 3
window: 120s
# No version field needed (Compose v2.40.3+)
services:
# Service definitions
web:
# ...
api:
# ...
database:
# ...
networks:
# Custom networks (preferred)
frontend:
backend:
internal: true
volumes:
# Named volumes (preferred for persistence)
db-data:
app-data:
configs:
# Configuration files (Swarm mode)
app-config:
file: ./config/app.conf
secrets:
# Secrets (Swarm mode)
db-password:
file: ./secrets/db_pass.txt
networks:
frontend:
driver: bridge
backend:
driver: bridge
internal: true # No external access
services:
web:
networks:
- frontend
api:
networks:
- frontend
- backend
database:
networks:
- backend # Not accessible from frontend
services:
app:
# Load from file (preferred for non-secrets)
env_file:
- .env
# Inline for service-specific vars
environment:
- NODE_ENV=production
- LOG_LEVEL=info
# For Swarm mode secrets
secrets:
- db_password
Important:
.env to .gitignore.env.example as templateservices:
api:
depends_on:
database:
condition: service_healthy # Wait for health check
redis:
condition: service_started # Just wait for start
# Use semantic versioning
my-app:1.2.3
my-app:1.2
my-app:1
my-app:latest
# Include git commit for traceability
my-app:1.2.3-abc123f
# Environment tags
my-app:1.2.3-production
my-app:1.2.3-staging
Never do this:
# BAD - secret in layer history
ENV API_KEY=secret123
RUN echo "password" > /app/config
Do this:
# Use Docker secrets (Swarm) or external secret management
docker secret create db_password ./password.txt
# Or mount secrets at runtime
docker run -v /secure/secrets:/run/secrets:ro my-app
# Or use environment files (not in image)
docker run --env-file /secure/.env my-app
services:
app:
# Health checks
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/health"]
interval: 30s
# Labels for monitoring tools
labels:
- "prometheus.io/scrape=true"
- "prometheus.io/port=9090"
- "com.company.team=backend"
- "com.company.version=1.2.3"
# Logging
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
# Backup named volume
docker run --rm \
-v VOLUME_NAME:/data \
-v $(pwd):/backup \
alpine tar czf /backup/backup-$(date +%Y%m%d).tar.gz -C /data .
# Restore volume
docker run --rm \
-v VOLUME_NAME:/data \
-v $(pwd):/backup \
alpine tar xzf /backup/backup.tar.gz -C /data
services:
app:
# For Swarm mode - rolling updates
deploy:
replicas: 3
update_config:
parallelism: 1 # Update 1 at a time
delay: 10s # Wait 10s between updates
failure_action: rollback
monitor: 60s
rollback_config:
parallelism: 1
delay: 5s
// /etc/docker/daemon.json
{
"userns-remap": "default",
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
},
"storage-driver": "overlay2",
"live-restore": true
}
:delegated or :cached for bind mounts# Better volume performance on macOS
volumes:
- ./src:/app/src:delegated # Host writes are delayed
- ./build:/app/build:cached # Container writes are cached
# Windows-compatible paths
volumes:
- C:/Users/name/app:/app # Forward slashes work
# or
- C:\Users\name\app:/app # Backslashes need escaping in YAML
# Use BuildKit (faster, better caching)
export DOCKER_BUILDKIT=1
# Use cache mounts
RUN --mount=type=cache,target=/root/.cache/pip \
pip install -r requirements.txt
# Use bind mounts for dependencies
RUN --mount=type=bind,source=package.json,target=package.json \
--mount=type=bind,source=package-lock.json,target=package-lock.json \
--mount=type=cache,target=/root/.npm \
npm ci
# Install and cleanup in one layer
RUN apt-get update && \
apt-get install -y --no-install-recommends \
package1 \
package2 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use exec form (no shell overhead)
CMD ["node", "server.js"] # Good
# vs
CMD node server.js # Bad - spawns shell
# Optimize signals
STOPSIGNAL SIGTERM
# Run as non-root (slightly faster, much more secure)
USER appuser
Image Security:
Runtime Security:
Compliance:
❌ Don't:
--privilegedlatest tag✅ Do:
This skill represents current Docker best practices. Always verify against official documentation for the latest recommendations, as Docker evolves continuously.
development
This skill should be used when the user asks to train, debug, scale, or improve ML models. PROACTIVELY activate for: (1) PyTorch, TensorFlow/Keras, JAX, Flax, Hugging Face Trainer/Accelerate training loops, (2) distributed training, DDP/FSDP/DeepSpeed, TPU/GPU setup, (3) mixed precision AMP/bf16, gradient accumulation, checkpointing, seeding, (4) overfitting, imbalance, loss functions, regularization, LR schedules, warmup, (5) memory optimization, gradient checkpointing, offloading, quantization-aware training. Provides: reproducible training best practices across deep learning and classical ML.
development
This skill should be used when the user asks to productionize, track, version, govern, monitor, or automate ML systems. PROACTIVELY activate for: (1) MLflow, Weights & Biases, Neptune, Comet, ClearML experiment tracking, (2) model registry, model versioning, artifact lineage, reproducibility, (3) Kubeflow, SageMaker Pipelines, Vertex AI Pipelines, Azure ML pipelines, Databricks workflows, (4) CI/CD, continuous training/evaluation, A/B tests, canary/shadow deployments, (5) drift detection, model monitoring, data validation, responsible AI governance. Provides: end-to-end MLOps architecture and operational safeguards.
development
This skill should be used when the user asks to optimize, export, serve, compress, or accelerate ML inference. PROACTIVELY activate for: (1) latency, throughput, p95/p99, batching, concurrency, KV cache, memory, or cost issues, (2) quantization INT8/INT4, GPTQ, AWQ, bitsandbytes, pruning, sparsity, distillation, (3) ONNX export, ONNX Runtime, TensorRT, TorchScript, torch.compile, XLA, OpenVINO, Core ML, TFLite, (4) Triton, TorchServe, TF Serving, BentoML, Seldon, KServe configuration, (5) edge deployment, CPU/GPU/TPU/Inferentia serving. Provides: hardware-aware inference optimization and safe benchmarking.
testing
This skill should be used when the user asks to tune hyperparameters, run sweeps, optimize search spaces, or use AutoML. PROACTIVELY activate for: (1) Optuna, Ray Tune, FLAML, AutoGluon, Hyperopt, Nevergrad, KerasTuner, W&B sweeps, (2) grid search, random search, Bayesian optimization, TPE, Gaussian processes, evolutionary search, (3) ASHA, Hyperband, successive halving, multi-fidelity optimization, population-based training, (4) learning-rate finder, batch-size search, early stopping, pruning, (5) reproducible sweep design and experiment analysis. Provides: budget-aware hyperparameter search strategy.