hermes-backup/daily/2026-04-28_203212/skills/devops/vps-service-gap-audit/SKILL.md
Audit what's actually running vs what's defined — catch services built but not deployed.
When to use: Suspect a service is "defined but not deployed," or need to audit what's actually running vs. what should be running.
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
Compare against the compose file's services: keys. Mark which are missing.
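The comparison above can be scripted rather than done by eye. The following is a minimal sketch, not part of the original skill: `missing_services` is a hypothetical helper name, and it works on two plain text lists so it can be tried without Docker. The example commands in the comments assume Docker Compose v2.

```shell
#!/bin/sh
# missing_services DEFINED_FILE RUNNING_FILE
# Prints every name listed in DEFINED_FILE that is absent from RUNNING_FILE.
# Both files hold one service name per line.
missing_services() {
    defined_sorted=$(mktemp)
    running_sorted=$(mktemp)
    sort -u "$1" > "$defined_sorted"
    sort -u "$2" > "$running_sorted"
    # comm -23 keeps only the lines unique to the first (defined) list.
    comm -23 "$defined_sorted" "$running_sorted"
    rm -f "$defined_sorted" "$running_sorted"
}

# Example (assumes Compose v2 and the main stack path used in this document):
#   f=/root/compose/docker-compose.yml
#   docker compose -f "$f" config --services > /tmp/defined
#   docker compose -f "$f" ps --services --status running > /tmp/running
#   missing_services /tmp/defined /tmp/running
```

Comparing service names on both sides avoids the mismatch between compose service keys and container names that a raw docker ps listing can introduce.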
ss -tlnp | grep -E "7071|3001|8084|8085|8080|8081"
If a service is defined in compose but the port is dark, it's not running.
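A plain grep on port numbers also matches longer ports that contain the same digits (8080 matches 18080). As a hedged sketch, the helper below reads ss -tlnp output on stdin and prints only the ports with no listener, using exact matches. `dark_ports` is a hypothetical name, and the parsing assumes the usual ss column layout where the fourth column is the local address and port.

```shell
#!/bin/sh
# dark_ports PORT... < ss_output
# Reads `ss -tlnp` output on stdin and prints each given port that has no
# LISTEN socket bound to it.
dark_ports() {
    # Take the last colon-separated piece of the local address column,
    # which is the port for both IPv4 (0.0.0.0:80) and IPv6 ([::]:80).
    listening=$(awk '$1 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | sort -u)
    for port in "$@"; do
        # -x forces a whole-line match, so 8080 does not match 18080.
        printf '%s\n' "$listening" | grep -qx "$port" || echo "$port"
    done
}

# Example:
#   ss -tlnp | dark_ports 7071 3001 8084 8085 8080 8081
```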
cat /root/compose/Caddyfile
Check each subdomain's reverse_proxy directive. Static file_server vs live reverse_proxy tells you if the route is pointing at a live service or a dead landing.
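For a Caddyfile with many sites, the file_server versus reverse_proxy check can be summarized per site block. This is a rough sketch under simplifying assumptions: each site block opens with a line ending in "{", the site address is the first word on that line, and braces are balanced. `caddy_routes` is a hypothetical helper name; it is not a full Caddyfile parser.

```shell
#!/bin/sh
# caddy_routes CADDYFILE
# Prints "<site> <directive>" for every reverse_proxy or file_server
# directive found inside a site block.
caddy_routes() {
    awk '
        # At brace depth 0, a line ending in "{" opens a new site block.
        depth == 0 && /[{][ \t]*$/ { site = $1 }
        # Inside a block, report the two directives of interest.
        depth > 0 && ($1 == "reverse_proxy" || $1 == "file_server") {
            print site, $1
        }
        # Track nesting so directives inside handle blocks keep their site.
        { depth += gsub(/[{]/, "{") - gsub(/[}]/, "}") }
    ' "$1"
}

# Example:
#   caddy_routes /root/compose/Caddyfile | grep forge
```

A site that reports only file_server is serving static files, which for a subdomain meant to front a live service indicates the dead-landing case described above.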
ls /path/to/service/.env
Missing .env = service can't start even if manually invoked.
| Check | Command | Dead if |
|-------|---------|---------|
| Bridge container | docker ps \| grep af-bridge | Not found |
| Port 7071 listener | ss -tlnp \| grep 7071 | Dark |
| Caddy forge route | grep forge /root/compose/Caddyfile | file_server instead of reverse_proxy |
| .env | ls /root/A-FORGE/.env | Not found |
| In main stack | grep -c A-FORGE /root/compose/docker-compose.yml | Returns 0 |
A service can be "defined in a docker-compose.yml" but absent from the main stack. The A-FORGE/docker-compose.yml defined af-bridge-prod, but it was never added to /root/compose/docker-compose.yml. The source code was there. The Dockerfile was there. The compose definition was there. But it wasn't in the orchestrating file.
Always check: is the service in the orchestrating compose file, not just its own sub-directory compose?
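One way to make that check repeatable is a small predicate that looks for the service key in the orchestrating file. The sketch below is illustrative only: `in_main_stack` is a hypothetical name, and the grep assumes the service appears as an indented "name:" key on its own line. Where Docker is available, `docker compose -f <file> config --services` is the more reliable source because it resolves the file the way Compose does.

```shell
#!/bin/sh
# in_main_stack SERVICE COMPOSE_FILE
# Succeeds (exit 0) if COMPOSE_FILE has an indented "SERVICE:" key,
# optionally followed by a comment; fails otherwise.
in_main_stack() {
    grep -Eq "^[[:space:]]+$1:[[:space:]]*(#.*)?$" "$2"
}

# Example:
#   in_main_stack af-bridge-prod /root/compose/docker-compose.yml \
#       || echo "af-bridge-prod is not in the main stack"
```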
The VPS runs three separate Docker Compose projects that interoperate:
| Project | Config Path | Role |
|---------|-------------|------|
| a-forge | /root/A-FORGE/docker-compose.yml | Metabolic shell — AgentEngine, sense bridge |
| af-forge | /root/arifOS/deployments/af-forge/docker-compose.yml | Constitutional kernel — arifOS F1–F13, VAULT999 |
| compose | /root/compose/docker-compose.yml | Shared infrastructure + domain organs |
All three are intentionally separate — kernel can be rebuilt without touching infra.
When a bare process and a systemd service both try to own the same gateway:
Symptoms:
- ps aux shows two PIDs for the same binary (e.g., PID 2010856 bare + PID 2011048 systemd child)
- "already running under systemd; waiting 5000ms before retrying startup" logged every 10s
Diagnosis:
# Check journal for the retry pattern — THIS IS GROUND TRUTH
journalctl -u openclaw-gateway.service --no-pager -n 20
# Check which PID actually owns the port
ss -ltnp | grep <port>
# Check CPU time — real work vs retry overhead
ps aux | grep openclaw | grep -v grep
# High CPU + high uptime = real worker
# Low CPU + low uptime = retry loop
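The two most decisive checks above, the retry pattern in the journal and the PID that owns the port, can be reduced to numbers. The helpers below are a sketch with hypothetical names (`count_retries`, `port_owner_pids`); both read command output on stdin so they can be tested offline, and the ss parsing assumes the usual `users:(("name",pid=N,fd=M))` process column.

```shell
#!/bin/sh
# count_retries < journal_text
# Prints how many lines carry the retry message quoted in the symptoms.
count_retries() {
    # grep -c exits non-zero on zero matches; "|| true" keeps the exit clean.
    grep -c 'already running under systemd' || true
}

# port_owner_pids PORT < ss_output
# Reads `ss -ltnp` output and prints the PID(s) listening on exactly PORT.
port_owner_pids() {
    awk -v port="$1" '
        $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) print }
    ' | grep -o 'pid=[0-9]*' | cut -d= -f2 | sort -un
}

# Example:
#   journalctl -u openclaw-gateway.service --no-pager --since "5 minutes ago" | count_retries
#   sudo ss -ltnp | port_owner_pids <port>
```

A steadily growing retry count together with a port owner that is the bare PID points to the systemd child being the one stuck in the loop.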
Fix — Option A (keep bare, disable systemd):
sudo systemctl disable --now <service> # --now also stops the running unit; disable alone only removes it from boot and does not prevent a Restart= policy from respawning a killed child
Fix — Option B (migrate fully to systemd):
sudo kill <bare-pid> # stop bare first
sudo systemctl restart <service>
The hermes-asi-gateway case: It failed because the bare openclaw-gateway process was already polling Telegram. hermes exited immediately with "Gateway already running", which is correct behavior. The unit only needed systemctl reset-failed hermes-asi-gateway.service to clear its failed state from the systemctl listing; that command does not delete journal entries.
A commit updates pyproject.toml with new dependency versions (e.g., fastmcp==3.2.4), but the container still runs the old version because the image was never rebuilt.
Detecting:
# Repo says
grep fastmcp /root/arifOS/pyproject.toml
# Container has
docker exec <container> pip show fastmcp | grep Version
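Always rebuild, push, and redeploy after dependency bumps. The running image must be built from the git commit that is deployed; tagging or labeling the image with the commit SHA makes that checkable, since an image digest and a commit SHA are different hashes and cannot be compared directly.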
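The repo-versus-container comparison can be wrapped in a small check. This is a sketch under stated assumptions: the dependency is pinned in pyproject.toml as a quoted exact pin such as "fastmcp==3.2.4" (the form in the example above), and the installed version is passed in as a string. `pinned_version` and `check_drift` are hypothetical helper names.

```shell
#!/bin/sh
# pinned_version PACKAGE PYPROJECT
# Prints the version from a quoted exact pin ("package==X.Y.Z"), or nothing.
pinned_version() {
    sed -n "s/.*[\"']$1==\([0-9][0-9A-Za-z.]*\).*/\1/p" "$2" | head -n 1
}

# check_drift PACKAGE PYPROJECT INSTALLED_VERSION
# Exit 0 if versions match, 1 on drift, 2 if no exact pin was found.
check_drift() {
    want=$(pinned_version "$1" "$2")
    if [ -z "$want" ]; then
        echo "no exact pin for $1 in $2"
        return 2
    elif [ "$want" = "$3" ]; then
        echo "ok: $1 $3"
    else
        echo "drift: $1 pinned $want, container has $3"
        return 1
    fi
}

# Example:
#   have=$(docker exec <container> pip show fastmcp | awk '/^Version:/ {print $2}')
#   check_drift fastmcp /root/arifOS/pyproject.toml "$have"
```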
.agent/ (singular) vs .agents/ (plural): some workflow files reference the wrong directory name. MCP Market listed .agent/workflows/fag.md, but the repo uses .agents/ (plural). This mismatch trips up skill installs.
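Stray singular references can be located with a fixed-string search. A minimal sketch, assuming the repo's real directory is .agents/ as stated above; `singular_agent_refs` is a hypothetical helper name.

```shell
#!/bin/sh
# singular_agent_refs DIR
# Lists file:line:text for every reference to ".agent/" (singular) under DIR.
# The fixed string ".agent/" cannot match ".agents/", because the slash must
# follow "agent" directly.
singular_agent_refs() {
    grep -rnF --exclude-dir=.git '.agent/' "$1" || true
}

# Example:
#   singular_agent_refs /path/to/repo
```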