devops/containers/docker-management/SKILL.md
Build, optimize, and troubleshoot Docker containers and images. Create efficient Dockerfiles, manage container lifecycle, configure networking and volumes, and debug container issues. Use when working with Docker, containerization, or container troubleshooting.
Build, run, and manage Docker containers for application deployment and development.
Use this skill when working with Docker, containerization, or container troubleshooting.
# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
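# Install all dependencies; the build step usually needs devDependencies
RUN npm ci
COPY . .
# Build, then drop devDependencies so only runtime packages are copied forward
RUN npm run build && npm prune --omit=dev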
# Production stage
FROM node:20-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first (cached unless requirements change)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code (changes frequently)
COPY . .
CMD ["python", "app.py"]
FROM node:20-alpine
# Create non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -D appuser
WORKDIR /app
# Copy with proper ownership
COPY --chown=appuser:appgroup . .
# Drop privileges
USER appuser
# Use exec form for proper signal handling
CMD ["node", "server.js"]
# Build with tag
docker build -t myapp:1.0 .
# Build with build args
docker build --build-arg NODE_ENV=production -t myapp:prod .
# Build for specific platform
docker build --platform linux/amd64 -t myapp:amd64 .
# Build with no cache
docker build --no-cache -t myapp:fresh .
# Create builder
docker buildx create --name multiplatform --use
# Build for multiple architectures
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myregistry/myapp:latest \
  --push .
# Run container
docker run -d --name myapp -p 8080:3000 myapp:latest
# Run with environment variables
docker run -d \
  -e DATABASE_URL=postgres://localhost/db \
  -e NODE_ENV=production \
  myapp:latest
# Run with resource limits
docker run -d \
  --memory="512m" \
  --cpus="1.0" \
  myapp:latest
# Run with restart policy
docker run -d --restart=unless-stopped myapp:latest
# Named volume
docker volume create mydata
docker run -v mydata:/app/data myapp:latest
# Bind mount
docker run -v "$(pwd)/config:/app/config:ro" myapp:latest
# tmpfs mount (memory)
docker run --tmpfs /tmp:rw,noexec,nosuid myapp:latest
# Create network
docker network create mynetwork
# Run on network
docker run -d --network mynetwork --name api myapp:latest
# Connect existing container
docker network connect mynetwork existing-container
# Publish a port on the host's loopback interface only
docker run -d -p 127.0.0.1:8080:3000 myapp:latest
# List containers
docker ps -a
# Stop container
docker stop myapp
# Remove container
docker rm myapp
# Force remove running container
docker rm -f myapp
# Prune stopped containers
docker container prune -f
# View logs
docker logs myapp
# Follow logs
docker logs -f --tail 100 myapp
# View resource usage
docker stats myapp
# Inspect container
docker inspect myapp
# Execute command in running container
docker exec -it myapp /bin/sh
# Run container with shell
docker run -it --rm myapp:latest /bin/sh
# Debug failed container
docker run -it --entrypoint /bin/sh myapp:latest
# Check container logs for errors
docker logs myapp 2>&1 | grep -i error
# Inspect container state
docker inspect --format='{{.State.Status}}' myapp
# Check container processes
docker top myapp
# View container filesystem changes
docker diff myapp
# Export container filesystem
docker export myapp > myapp-fs.tar
# Requires curl in the image; Alpine-based images do not include it by default
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:3000/health || exit 1
# Check health status
docker inspect --format='{{.State.Health.Status}}' myapp
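The health status check above can be turned into a small polling helper, for example in a deploy script that should wait until a container is ready. This is a minimal sketch, not part of the original skill: the function name wait_healthy and the default 60-second timeout are assumptions made for illustration. A container without a HEALTHCHECK never reports a status, so the helper simply times out for it.

```shell
# wait_healthy NAME [TIMEOUT_SECONDS]   (hypothetical helper name)
# Polls the container's health status about once per second until it is
# "healthy" (returns 0), "unhealthy" (returns 1), or the timeout
# expires (returns 2). The default timeout of 60 is an assumption.
wait_healthy() {
  name="$1"
  timeout="${2:-60}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Errors (for example, no HEALTHCHECK defined) leave status empty.
    status="$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)"
    case "$status" in
      healthy)   echo "$name is healthy"; return 0 ;;
      unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
    esac
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $name" >&2
  return 2
}
```

Usage: wait_healthy myapp 30 && echo "ready to receive traffic"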
# Tag image
docker tag myapp:latest myregistry.com/myapp:v1.0
# Push to registry
docker push myregistry.com/myapp:v1.0
# Pull image
docker pull myregistry.com/myapp:v1.0
# Remove unused images
docker image prune -a
# Remove all unused resources (--volumes also deletes unused volumes and the data in them)
docker system prune -a --volumes
# Remove specific image
docker rmi myapp:old
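Because the prune commands above delete data without an undo, it can help to look at what is unused first. The following is a sketch under stated assumptions: the function name prune_preview is hypothetical, and it only lists resources. The dangling-image list corresponds to what docker image prune removes without -a, so pruning with -a can remove more than is shown here.

```shell
# prune_preview   (hypothetical helper name)
# Lists disk usage and unused resources without deleting anything.
prune_preview() {
  echo "== Disk usage by resource type =="
  docker system df
  echo "== Dangling images =="
  docker image ls --filter dangling=true
  echo "== Volumes not referenced by any container =="
  docker volume ls --filter dangling=true
  echo "== Exited containers =="
  docker ps -a --filter status=exited
}
```

Usage: run prune_preview, review the output, and only then run the prune command that matches what should be removed.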
# List image sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# View image history
docker history myapp:latest
# Inspect image layers
docker inspect myapp:latest
# Check image vulnerabilities (with Docker Scout)
docker scout cves myapp:latest
# docker-compose.yml
version: '3.8'
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
    volumes:
      - app-data:/app/data
    depends_on:
      - db
    restart: unless-stopped
  db:
    image: postgres:15-alpine
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  app-data:
  db-data:
# Use specific version tags
FROM node:20.10-alpine3.18
# Don't run as root
USER nobody
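# Remove build-only packages (assumes they were installed with
# apk add --virtual build-dependencies ...; do this in the same RUN
# instruction as the install, or the image will not get smaller)
RUN apk del --purge build-dependencies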
# Use COPY instead of ADD
COPY . .
# Run with security options
docker run -d \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --read-only \
  myapp:latest
# Use user namespace remapping
# Add to /etc/docker/daemon.json: {"userns-remap": "default"}
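To confirm that a running container actually received the hardening options above, its configuration can be read back with docker inspect. This is an illustrative sketch: the function name audit_container and the OK/WARN output format are assumptions, and the checks cover only four settings (user, read-only root filesystem, privileged mode, no-new-privileges).

```shell
# audit_container NAME   (hypothetical helper name)
# Prints one OK or WARN line per hardening setting.
audit_container() {
  name="$1"
  user="$(docker inspect --format '{{.Config.User}}' "$name")"
  ro_fs="$(docker inspect --format '{{.HostConfig.ReadonlyRootfs}}' "$name")"
  privileged="$(docker inspect --format '{{.HostConfig.Privileged}}' "$name")"
  sec_opt="$(docker inspect --format '{{.HostConfig.SecurityOpt}}' "$name")"

  # An empty User field means no user was set, so the process runs as root.
  case "$user" in
    ""|root|root:*|0|0:*) echo "WARN user: runs as root" ;;
    *)                    echo "OK user: $user" ;;
  esac

  if [ "$ro_fs" = "true" ]; then
    echo "OK root filesystem is read-only"
  else
    echo "WARN root filesystem is writable (consider --read-only)"
  fi

  if [ "$privileged" = "true" ]; then
    echo "WARN container is privileged"
  else
    echo "OK container is not privileged"
  fi

  case "$sec_opt" in
    *no-new-privileges*) echo "OK no-new-privileges is set" ;;
    *)                   echo "WARN no-new-privileges is not set" ;;
  esac
}
```

Usage: audit_container myapp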
Problem: Container starts and stops instantly
Solution: Check that CMD/ENTRYPOINT runs a foreground process, and use docker logs to see errors
Problem: Port not accessible
Solution: Verify the port mapping (-p), check that the container is running, and verify firewall rules
Problem: Docker using too much disk
Solution: Run docker system df to see what is using space, then docker system prune -a --volumes (this also deletes unused volumes and the data in them)
Problem: Every build downloads dependencies
Solution: Order Dockerfile instructions from least to most frequently changing, so dependency layers stay cached
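For a container that stops right after starting, the exit code often narrows down the cause before reading the logs. The sketch below maps commonly seen exit codes to likely explanations. The function names explain_exit_code and exit_summary are hypothetical, and the explanations are general rules of thumb rather than a complete diagnosis.

```shell
# explain_exit_code CODE   (hypothetical helper name)
# Maps a container or docker run exit code to a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "main process exited normally; is it running in the foreground?" ;;
    125) echo "docker run itself failed (bad flag or daemon error)" ;;
    126) echo "command found but could not be executed" ;;
    127) echo "command not found in the image" ;;
    137) echo "killed by SIGKILL (128+9); possibly an out-of-memory kill" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)   echo "application-specific exit code; check docker logs" ;;
  esac
}

# exit_summary NAME   (hypothetical helper name)
# Reads the stored exit code of a stopped container and explains it.
exit_summary() {
  name="$1"
  code="$(docker inspect --format '{{.State.ExitCode}}' "$name")"
  echo "$name exited with $code: $(explain_exit_code "$code")"
}
```

Usage: exit_summary myapp, followed by docker logs myapp for the details. For the port problem, docker port myapp lists the mappings that are actually published.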