Manage Docker containers, images, volumes, networks, and Compose stacks — lifecycle ops, debugging, cleanup, and Dockerfile optimization.
Manage Docker containers, images, volumes, networks, and Compose stacks using standard Docker CLI commands. No additional dependencies beyond Docker itself.
Requirements: Docker Engine with the Compose plugin, and a user in the `docker` group (or use `sudo`). Quick check:

```shell
docker --version && docker compose version
```
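When the quick check fails, it helps to know whether the CLI, the daemon, or the Compose plugin is the problem. A minimal sketch, assuming a bash- or dash-compatible shell; `docker_preflight` is a hypothetical helper name, not part of Docker:

```shell
# Hypothetical helper: report which layer is missing before running other commands.
# Exit status: 1 = no CLI, 2 = daemon unreachable, 3 = no Compose plugin.
docker_preflight() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2
    return 1
  fi
  # `docker info` talks to the daemon, so it fails when the daemon is down
  # or when the user has no permission on the Docker socket.
  if ! docker info >/dev/null 2>&1; then
    echo "cannot reach the Docker daemon (is it running? docker group or sudo?)" >&2
    return 2
  fi
  if ! docker compose version >/dev/null 2>&1; then
    echo "the Compose plugin is missing ('docker compose' unavailable)" >&2
    return 3
  fi
  echo "docker OK"
}
```

Run `docker_preflight` once at the start of a session; the status code points at the layer to fix.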
| Task | Command |
|------|---------|
| Run container (background) | `docker run -d --name NAME IMAGE` |
| Stop + remove | `docker stop NAME && docker rm NAME` |
| View logs (follow) | `docker logs --tail 50 -f NAME` |
| Shell into container | `docker exec -it NAME /bin/sh` |
| List all containers | `docker ps -a` |
| Build image | `docker build -t TAG .` |
| Compose up | `docker compose up -d` |
| Compose down | `docker compose down` |
| Disk usage | `docker system df` |
| Clean up stopped containers and dangling images | `docker container prune && docker image prune` |
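Figure out which area the request falls into: running containers, images, Compose stacks, volumes and networks, disk cleanup, or troubleshooting.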
Run a new container:

```shell
# Detached service with port mapping
docker run -d --name web -p 8080:80 nginx

# With environment variables
docker run -d -e POSTGRES_PASSWORD=secret -e POSTGRES_DB=mydb --name db postgres:16

# With persistent data (named volume); the postgres image refuses to start without a password
docker run -d -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data --name db postgres:16

# For development (bind mount source code)
docker run -d -v "$(pwd)/src:/app/src" -p 3000:3000 --name dev my-app

# Interactive debugging (auto-remove on exit)
docker run -it --rm ubuntu:22.04 /bin/bash

# With resource limits and restart policy
docker run -d --memory=512m --cpus=1.5 --restart=unless-stopped --name app my-app
```

Key flags: `-d` detached, `-it` interactive + TTY, `--rm` auto-remove on exit, `-p` port mapping (host:container), `-e` environment variable, `-v` named volume or bind mount, `--name` container name, `--restart` restart policy.
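`docker run --name` refuses to start when a container with that name already exists, even a stopped one, so redeploying usually means remove-then-run. A minimal sketch of that pattern; `replace_container` is a hypothetical helper, and it assumes any data worth keeping lives in named volumes, which `docker rm` never deletes:

```shell
# Hypothetical helper: recreate a named container (e.g. after pulling a newer image).
# Usage: replace_container NAME [docker run options...] IMAGE [COMMAND...]
replace_container() {
  local name=$1
  shift
  # `docker container inspect` succeeds only if a container with this name exists.
  if docker container inspect "$name" >/dev/null 2>&1; then
    # -f stops the container first if it is still running.
    docker rm -f "$name" >/dev/null || return 1
  fi
  docker run -d --name "$name" "$@"
}
```

For example: `replace_container db -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:16`.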
Manage running containers:

```shell
docker ps                   # running containers
docker ps -a                # all (including stopped)
docker stop NAME            # graceful stop
docker start NAME           # start stopped container
docker restart NAME         # stop + start
docker rm NAME              # remove stopped container
docker rm -f NAME           # force remove running container
docker container prune      # remove ALL stopped containers
```
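`docker container prune` has no dry-run mode, so listing the candidates first is a cheap safeguard. A sketch that covers the common `exited` and `created` states (repeated `status` filters are combined as alternatives); `list_stopped` is a hypothetical name:

```shell
# Hypothetical helper: preview stopped containers before running `docker container prune`.
list_stopped() {
  local out
  out=$(docker ps -a --filter status=exited --filter status=created \
    --format '{{.Names}}\t{{.Status}}') || return 1
  if [ -z "$out" ]; then
    echo "no stopped containers"
  else
    printf '%s\n' "$out"
  fi
}
```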
Interact with containers:

```shell
docker exec -it NAME /bin/sh            # shell access (use /bin/bash if available)
docker exec NAME env                    # view environment variables
docker exec -u root NAME apt update     # run as specific user
docker logs --tail 100 -f NAME          # follow last 100 lines
docker logs --since 2h NAME             # logs from last 2 hours
docker cp NAME:/path/file ./local       # copy file from container
docker cp ./file NAME:/path/            # copy file to container
docker inspect NAME                     # full container details (JSON)
docker stats --no-stream                # resource usage snapshot
docker top NAME                         # running processes
```
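The "use /bin/bash if available" advice can be automated by probing the container first. A minimal sketch; `shell_into` is a hypothetical helper, and it still needs `sh` in the image (fully shell-less images such as distroless ones need other approaches, e.g. `docker cp` or logs):

```shell
# Hypothetical helper: open bash when the image has it, otherwise fall back to sh
# (minimal images such as Alpine ship only /bin/sh).
shell_into() {
  local name=$1
  if docker exec "$name" sh -c 'command -v bash' >/dev/null 2>&1; then
    docker exec -it "$name" bash
  else
    docker exec -it "$name" sh
  fi
}
```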
Build, tag, push, and clean up images:

```shell
# Build
docker build -t my-app:latest .
docker build -t my-app:prod -f Dockerfile.prod .
docker build --no-cache -t my-app .            # clean rebuild
DOCKER_BUILDKIT=1 docker build -t my-app .     # force BuildKit (already the default since Docker Engine 23.0)

# Pull and push
docker pull node:20-alpine
docker login REGISTRY                          # e.g. ghcr.io
docker tag my-app:latest REGISTRY/my-app:v1.0
docker push REGISTRY/my-app:v1.0

# Inspect
docker images                                  # list local images
docker history IMAGE                           # see layers
docker inspect IMAGE                           # full details

# Cleanup
docker image prune                             # remove dangling (untagged) images
docker image prune -a                          # remove ALL unused images (careful!)
docker image prune -a --filter "until=168h"    # unused images created more than 7 days ago
```
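The tag-then-push sequence above can be wrapped so a push always carries an explicit version rather than `latest`, which keeps deployments reproducible. A sketch; `push_version` is a hypothetical helper and assumes `docker login` has already succeeded for the target registry:

```shell
# Hypothetical helper: tag a local image with a registry path and version, then push it.
# Usage: push_version LOCAL_IMAGE REGISTRY/REPO VERSION
push_version() {
  local src=$1 repo=$2 version=$3
  if [ -z "$version" ] || [ "$version" = "latest" ]; then
    echo "refusing to push without an explicit version tag" >&2
    return 1
  fi
  # Stop at the first failing step (tag or push).
  docker tag "$src" "$repo:$version" &&
    docker push "$repo:$version" &&
    echo "pushed $repo:$version"
}
```

For example: `push_version my-app:latest ghcr.io/OWNER/my-app v1.0`, where `OWNER` is a placeholder.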
Manage Compose stacks:

```shell
# Start/stop
docker compose up -d                     # start all services detached
docker compose up -d --build             # rebuild images before starting
docker compose down                      # stop and remove containers
docker compose down -v                   # also remove volumes (DESTROYS DATA)

# Monitoring
docker compose ps                        # list services
docker compose logs -f api               # follow logs for specific service
docker compose logs --tail 50            # last 50 lines all services

# Interaction
docker compose exec api /bin/sh          # shell into running service
docker compose run --rm api npm test     # one-off command (new container)
docker compose restart api               # restart specific service

# Validation
docker compose config                    # validate and view resolved config
```
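Validating before starting catches a broken Compose file before any container is touched. A minimal sketch combining the commands above; `compose_up_checked` is a hypothetical helper name, and extra arguments are passed through as service names:

```shell
# Hypothetical helper: validate the Compose file, start the stack, then show its state.
compose_up_checked() {
  # --quiet only validates; it prints nothing and exits non-zero on an invalid file.
  docker compose config --quiet || { echo "compose file is invalid" >&2; return 1; }
  docker compose up -d --build "$@" || return 1
  docker compose ps
}
```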
Minimal compose.yml example:

```yaml
services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/mydb
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  pgdata:
```
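`condition: service_healthy` makes Compose hold back `api` until the `db` healthcheck passes, but scripts and CI jobs often need the same wait. A polling sketch, assuming one container per service; `wait_healthy` is a hypothetical helper. Recent Compose v2 releases also provide `docker compose up -d --wait`, which may be simpler where available:

```shell
# Hypothetical helper: block until a Compose service's healthcheck reports "healthy".
# Usage: wait_healthy SERVICE [TIMEOUT_SECONDS]   (assumes one container per service)
wait_healthy() {
  local service=$1 timeout=${2:-60} waited=0 cid status
  # Compose generates container names, so resolve the container ID from the service.
  cid=$(docker compose ps -q "$service")
  [ -n "$cid" ] || { echo "service $service is not running" >&2; return 1; }
  while [ "$waited" -lt "$timeout" ]; do
    # Containers without a healthcheck have no .State.Health, so guard the field.
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$cid")
    case $status in
      healthy) echo "$service is healthy"; return 0 ;;
      unhealthy|none) echo "$service health: $status" >&2; return 1 ;;
    esac
    # Still "starting": poll again.
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $service" >&2
  return 1
}
```

For the example file above: `docker compose up -d && wait_healthy db 60`.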
Manage volumes and networks:

```shell
# Volumes
docker volume ls                          # list volumes
docker volume create mydata               # create named volume
docker volume inspect mydata              # details (mount point, etc.)
docker volume rm mydata                   # remove (fails if in use)
docker volume prune                       # remove unused volumes (Engine 23.0+: anonymous only unless -a is given)

# Networks
docker network ls                         # list networks
docker network create mynet               # create bridge network
docker network inspect mynet              # details (connected containers)
docker network connect mynet NAME         # attach container to network
docker network disconnect mynet NAME      # detach container
docker network rm mynet                   # remove network
docker network prune                      # remove unused networks
```
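`docker network connect` errors if the container is already attached, and `docker network create` errors if the network exists, so scripts usually check first. A sketch; `ensure_on_network` is a hypothetical helper:

```shell
# Hypothetical helper: make sure a user-defined network exists and a container is on it.
# Usage: ensure_on_network NETWORK CONTAINER
ensure_on_network() {
  local net=$1 name=$2 attached
  docker network inspect "$net" >/dev/null 2>&1 ||
    docker network create "$net" >/dev/null || return 1
  # Space-separated names of the networks the container is attached to.
  attached=$(docker inspect --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' "$name") || return 1
  case " $attached " in
    *" $net "*) echo "$name already on $net" ;;
    *) docker network connect "$net" "$name" && echo "connected $name to $net" ;;
  esac
}
```

On a user-defined bridge network, containers resolve each other by name, so after `ensure_on_network mynet web` another container on `mynet` can run, for example, `docker run --rm --network mynet busybox wget -qO- http://web` (assuming `web` serves HTTP on port 80).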
Always start with a diagnostic before cleaning:

```shell
# Check what's using space
docker system df                      # summary
docker system df -v                   # detailed breakdown

# Targeted cleanup (lower risk, but volume prune still deletes data in unused volumes)
docker container prune                # stopped containers
docker image prune                    # dangling images
docker volume prune                   # unused volumes (Engine 23.0+: anonymous only)
docker network prune                  # unused networks

# Aggressive cleanup (confirm with user first!)
docker system prune                   # stopped containers, unused networks, dangling images, build cache
docker system prune -a                # also all unused images, not just dangling ones
docker system prune -a --volumes      # also unused volumes (named ones too on older engines)
```

Warning: Never run `docker system prune -a --volumes` without confirming with the user. It can remove volumes that hold important data, including named volumes on older Docker Engine versions.
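One way to honor that warning in a script is to show disk usage, require an explicit "yes", and only then pass `-f` (which skips Docker's own prompt). A sketch; `prune_everything` is a hypothetical helper name:

```shell
# Hypothetical helper: destructive prune gated behind an explicit "yes",
# with disk usage shown before and after for comparison.
prune_everything() {
  local answer
  docker system df
  printf 'Remove stopped containers, unused networks, images, build cache AND unused volumes? Type "yes": '
  read -r answer
  if [ "$answer" != "yes" ]; then
    echo "aborted"
    return 1
  fi
  docker system prune -a --volumes -f
  docker system df
}
```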
| Problem | Cause | Fix |
|---------|-------|-----|
| Container exits immediately | Main process finished or crashed | Check `docker logs NAME`; try `docker run -it --entrypoint /bin/sh IMAGE` |
| "port is already allocated" | Another process is using that port | Find it with `docker ps` or `lsof -i :PORT` |
| "no space left on device" | Docker disk full | Run `docker system df`, then a targeted prune |
| Can't connect to container | App binds to 127.0.0.1 inside the container | App must bind to 0.0.0.0; check the `-p` mapping |
| Permission denied on volume | UID/GID mismatch between host and container | Use `--user "$(id -u):$(id -g)"` or fix permissions |
| Compose services can't reach each other | Wrong network or service name | Services use the service name as hostname; check `docker compose config` |
| Build cache not working | Wrong layer order in Dockerfile | Put rarely changing layers first (dependencies before source code) |
| Image too large | No multi-stage build, no `.dockerignore` | Use multi-stage builds; add a `.dockerignore` |
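The first rows of the troubleshooting table come down to a few facts: the container's state, its exit code, its published ports, and its recent logs. A sketch that gathers them; `diagnose` and `who_publishes` are hypothetical helpers. An exit code of 137 means the process was killed with SIGKILL (128 + 9), often by the kernel OOM killer when `oom=true`:

```shell
# Hypothetical helper: first facts to collect when a container misbehaves.
diagnose() {
  local name=$1
  # Lifecycle state, exit code, and whether the main process was OOM-killed.
  docker inspect --format 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' "$name" || return 1
  # Host-side port mappings, for the "can't connect" case.
  docker port "$name"
  # Recent output of the main process, stderr included.
  docker logs --tail 20 "$name" 2>&1
}

# Hypothetical helper: which container already publishes a given host port?
# Only covers Docker containers; use lsof for other host processes.
who_publishes() {
  docker ps --filter "publish=$1" --format '{{.Names}}\t{{.Ports}}'
}
```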
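After any Docker operation, verify the result:

- Container started: `docker ps` (status is "Up") and `docker logs --tail 20 NAME` (no errors)
- Service reachable: `curl -s http://localhost:PORT` or `docker port NAME`
- Image built: `docker images | grep TAG`
- Compose stack up: `docker compose ps` (all services "running" or "healthy")
- Cleanup done: `docker system df` (compare before/after)

When reviewing or creating a Dockerfile, suggest these improvements:

- Add a `.dockerignore` that excludes `node_modules`, `.git`, `__pycache__`, etc.
- Pin specific base image tags: `node:20-alpine`, not `node:latest`
- Add a `USER` instruction so the container does not run as root
- Prefer slim base images: `python:3.12-slim`, not `python:3.12`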
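Parts of this checklist can be roughly automated with `grep`. A heuristic sketch only; `dockerfile_check` is a hypothetical helper, it can misreport (for example when the base image already sets a non-root user, or a tag comes from an `ARG`), and it is no substitute for a real Dockerfile linter:

```shell
# Hypothetical heuristic check for the Dockerfile suggestions above.
# Usage: dockerfile_check [CONTEXT_DIR] [DOCKERFILE]; returns the number of warnings.
dockerfile_check() {
  local ctx=${1:-.} warnings=0 df
  df=${2:-$ctx/Dockerfile}
  [ -f "$df" ] || { echo "no such file: $df" >&2; return 255; }
  # Instructions are case-insensitive, hence grep -i.
  if grep -Eiq '^[[:space:]]*FROM[[:space:]].*:latest([[:space:]]|$)' "$df"; then
    echo "pin the base image tag instead of :latest"; warnings=$((warnings + 1))
  fi
  if ! grep -Eiq '^[[:space:]]*USER[[:space:]]' "$df"; then
    echo "no USER instruction: may run as root"; warnings=$((warnings + 1))
  fi
  if [ "$(grep -Eic '^[[:space:]]*FROM[[:space:]]' "$df")" -lt 2 ]; then
    echo "single stage: consider a multi-stage build"; warnings=$((warnings + 1))
  fi
  # .dockerignore is read from the root of the build context.
  if [ ! -f "$ctx/.dockerignore" ]; then
    echo "no .dockerignore in $ctx"; warnings=$((warnings + 1))
  fi
  return "$warnings"
}
```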