infrastructure/skills/supervisord/SKILL.md
Supervisord process manager for running multiple services inside containers. Use when working with supervisord, container service management, multi-process containers, event listeners for crash-loop circuit breaking, or service priority ordering.
| Property | Value |
|----------|-------|
| Dependencies | (none) — system Python comes from the supervisor RPM's own dep |
| Install files | charly.yml, supervisord.header.conf (via build.yml init: section header_file) |
| Init role | Default init system for container images (set in build.yml init: section) |
- supervisor (RPM): process control system. The RPM brings /usr/bin/python3 as its own dependency, and supervisord's shebang is /usr/bin/python3. No pixi-python or conda-forge Python env is needed.
- supervisor (pac): same package name on Arch (extra/ repo), same system-Python dependency.

No require: python

This layer declares no require: python. supervisord's runtime is pure system Python (the supervisor RPM brings /usr/bin/python3 as its own dependency), so it does not need the python charly-layer → pixi charly-layer → conda-forge Python env chain (~500 MB). See CLAUDE.md "Key Rules" → "Don't declare defensive deps" for the general rule.
Arch note: the pac: [supervisor] section is required so that
Arch-based images with build: [pac] actually install supervisord.
Without it, an Arch image gets a /etc/supervisord.conf with no
supervisord binary, and the quadlet's supervisord -n -c /etc/supervisord.conf
exits 127 (command not found) at container start. Every Arch-rooted
auto-intermediate that composes supervisord needs it.
charly Generates Supervisord Configs

Candies declare processes via the unified service: schema in charly.yml (see /charly-image:layer "Service Declaration"). Each entry is rendered through supervisord's service_schema.service_template in build.yml, which produces a [program:<name>] INI fragment. charly box generate collects all rendered fragments across the candy chain and writes them into /etc/supervisord.conf inside the image, prefixed by the header from templates/supervisord.header.conf (referenced from build.yml init: section).
# candy/chrome/charly.yml (unified schema)
service:
  - name: chrome
    exec: /home/user/.local/bin/chrome-wrapper --force-renderer-accessibility --no-first-run --start-maximized
    restart: always
    user: user
    env:
      # ...
    priority: 50  # supervisord-specific; ordering hint
    stdout: file:/tmp/supervisor-chrome.log
    scope: system
    enable: true
The render template maps the abstract spec to supervisord INI:
| service: field | Supervisord output |
|---|---|
| exec | command= |
| env | environment=K="V",K2="V2" |
| restart: always | autorestart=true (helper supervisordRestart) |
| restart: on-failure | autorestart=unexpected |
| restart: no / unset | autorestart=false |
| working_directory | directory= |
| user | user= |
| priority | priority= (supervisord-specific) |
| stop_timeout | stopwaitsecs= |
| stdout: file:/path | stdout_logfile=/path (helper supervisordLog) |
| stdout: journal / unset | stdout_logfile=/dev/stdout |
Supervisord does NOT support use_packaged: — it doesn't consume systemd units. Entries with use_packaged: are skipped with a warning when rendering into a supervisord-init image. Use a custom entry (with explicit exec:) on supervisord-init images, or target a systemd init (bootc / target: local) where use_packaged: is honored.
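The mapping above can be illustrated with a minimal Python sketch. It is not charly's actual template code: `render_program`, `restart_to_autorestart`, and `stdout_to_logfile` are hypothetical names, and only the fields listed in the table are handled. Entries carrying use_packaged: return None to mirror the "skipped" behavior.

```python
def restart_to_autorestart(restart):
    """Map the abstract restart: value to supervisord's autorestart= value."""
    # "no", unset, or anything else falls through to "false", as in the table.
    return {"always": "true", "on-failure": "unexpected"}.get(restart, "false")


def stdout_to_logfile(stdout):
    """Map stdout: file:/path to the path; journal or unset goes to /dev/stdout."""
    if stdout and stdout.startswith("file:"):
        return stdout[len("file:"):]
    return "/dev/stdout"


def render_program(entry):
    """Render one service entry (a dict) to an INI fragment.

    Returns None for use_packaged entries, which supervisord cannot consume.
    """
    if "use_packaged" in entry:
        return None
    lines = [f"[program:{entry['name']}]", f"command={entry['exec']}"]
    env = entry.get("env") or {}
    if env:
        pairs = ",".join(f'{key}="{value}"' for key, value in env.items())
        lines.append(f"environment={pairs}")
    lines.append(f"autorestart={restart_to_autorestart(entry.get('restart'))}")
    # Optional fields render only when set.
    for field, ini_key in (
        ("working_directory", "directory"),
        ("user", "user"),
        ("priority", "priority"),
        ("stop_timeout", "stopwaitsecs"),
    ):
        if field in entry:
            lines.append(f"{ini_key}={entry[field]}")
    lines.append(f"stdout_logfile={stdout_to_logfile(entry.get('stdout'))}")
    return "\n".join(lines) + "\n"
```

Calling `render_program` on a dict shaped like the chrome entry yields a fragment starting with `[program:chrome]`; the real renderer may order or quote fields differently.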
The rendered INI lands at /etc/supervisord.d/<layer>-<name>.conf and is assembled into /etc/supervisord.conf at container-build time via the init system's fragment pipeline (build.yml init.supervisord.fragment_template + stage_fragment_copy).
At runtime, PID 1 is supervisord. The container's lifecycle is the supervisord process's lifecycle — when supervisord exits, the container exits (and the systemd quadlet's Restart=always rebuilds it).
service: declaration

Layers declare services under a single service: key (singular; value is a list of ServiceEntry). Two entry shapes: use_packaged: reuses a distro-shipped systemd unit; custom exec (with exec:, env:, restart:, …) renders via the init-system's service_template:.
Supervisord lifecycle directives on ServiceEntry (all optional — render only when set):
| Field | Purpose |
|---|---|
| kind: eventlistener | Emits [eventlistener:...] instead of [program:...]; use with events:. |
| events: | Supervisord eventlistener trigger list (e.g. PROCESS_STATE_FATAL). |
| auto_start: | Tri-state bool. false for services hand-started by another program. |
| start_retries: | Max restart attempts before FATAL. The selkies [program:chrome] uses 3. |
| start_secs: | Seconds the process must stay up to count as "started." |
| stop_signal: | TERM (default), INT, HUP, … |
| exit_code: | Success codes for restart: no / on-failure. |
| priority: | Startup order; lower = earlier. |
See /charly-selkies:selkies-core for the canonical consumer (the supervised [program:chrome] service: restart: always + start_secs/start_retries).
A candy that needs the SAME service to run under supervisord (container/pod targets) AND systemd (host installs / bootc / VMs) must NOT spin up a <name>-host sibling candy. The supported pattern is mixed entries in one service: list: same name:, two entries — one with use_packaged: <unit>.service (or .socket) for the systemd render, the other with custom exec: for the supervisord render. Init system at deploy time picks the matching form; the other entry is silently skipped. See /charly-image:layer "Service Declaration" → "Mixed entries in one candy" for the schema, CLAUDE.md "Init-system polymorphism via mixed service: entries" for the project-wide rule, and /charly-infrastructure:virtualization for the canonical worked example.
It is tempting to copy-paste-and-rename a candy with a -host suffix when the schema already supports polymorphism via mixed entries. If you find yourself reaching for -host, reach for a second service: entry instead.
Supervisord starts programs in priority order (ascending). Candies set priorities explicitly so that dependency chains come up in the right order:
| Priority | Typical services |
|----------|------------------|
| 1-9 | System-level: dbus, pipewire |
| 10 | Compositors: sway, labwc |
| 15-20 | Desktop extras: waybar, swaync |
| 30-50 | Applications: chrome, openclaw, hermes |
| 100 | Auxiliary: cdp-proxy, chrome-devtools-mcp |
| 200+ | Last-start daemons: event listeners |
Programs with no priority= default to 999 and start last.
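As a small illustration of the ordering rule, the following Python sketch sorts a set of programs the way the table describes. The function name `start_order` and the sample program names are illustrative only. Tie-breaking by declaration order is an assumption of the sketch, not a documented supervisord guarantee.

```python
DEFAULT_PRIORITY = 999  # supervisord's default when priority= is not set


def start_order(programs):
    """Return program names in ascending priority order.

    programs maps name -> priority (int) or None when priority= is unset.
    sorted() is stable, so equal priorities keep their declaration order here.
    """
    def effective(name):
        priority = programs[name]
        return DEFAULT_PRIORITY if priority is None else priority

    return sorted(programs, key=effective)
```

For example, `start_order({"chrome": 50, "dbus": 1, "labwc": 10, "misc": None})` places `dbus` first and `misc` last.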
The knobs that shape crash-recovery behavior:
- autostart=true (default): start at container boot. Use autostart=false only for services that must be hand-started by another program. Services that just need a Wayland compositor up first don't need this; for example, the selkies [program:chrome] service in selkies-core stays autostart=true and self-synchronizes by polling for the compositor's wayland-0 socket inside chrome-wrapper.
- autorestart=true: restart the program whenever it exits, whatever the exit code (this is what restart: always renders to). Supervisord's own default is autorestart=unexpected, which restarts only on exit codes not listed in exitcodes. Combined with startretries, this creates a crash-retry loop.
- startretries=N (default 3): how many consecutive failed start attempts supervisord retries before declaring the program FATAL. Once FATAL, supervisord stops trying.
- startsecs=N (default 1): how long the program must stay up before supervisord considers the start successful. Exits within startsecs count against startretries; a successful start resets the counter.

A program that keeps exiting before startsecs seconds until its startretries budget is exhausted is marked FATAL, and a PROCESS_STATE_FATAL event is emitted. That event is what the event listener pattern below catches.
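The interaction between startsecs and startretries can be modeled with a short Python sketch. This is a simplified model, not supervisord's implementation: `final_state` is a hypothetical helper, it ignores backoff delays, and it assumes the program goes FATAL once the number of consecutive early exits exceeds startretries (that is, the initial launch plus startretries retries all failed).

```python
def final_state(uptimes, startsecs=1, startretries=3):
    """Classify a sequence of launches under a simplified retry model.

    uptimes lists how many seconds each consecutive launch stayed up before
    exiting. Returns "FATAL" if the retry budget is exhausted, else "OK".
    """
    consecutive_failures = 0
    for uptime in uptimes:
        if uptime >= startsecs:
            # The start counted as successful, so the counter resets.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures > startretries:
                return "FATAL"
    return "OK"
```

Under this model a single long-lived run in the middle of a crash sequence restores the full retry budget, which is why a slow crash loop (each run outliving startsecs) never reaches FATAL.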
Supervisord supports a plugin model for reacting to process state changes: event listeners are separate programs that subscribe to events and can take arbitrary action. They run as supervisord programs themselves (using [eventlistener:<name>] instead of [program:<name>]) and communicate with supervisord over stdin/stdout using a simple protocol.
A PROCESS_STATE_FATAL listener is the supervisord pattern for escalating a
program that has exhausted its startretries budget. The classic action is to
terminate supervisord (PID 1) so the systemd quadlet's Restart=always rebuilds
the whole container — the only way to flush cgroup-level orphan memfd shmem (from
e.g. a Chrome crash loop) that a per-program restart can't release:
[eventlistener:crash-escalate]
command=/path/to/crash-escalate-listener
events=PROCESS_STATE_FATAL
autostart=true
autorestart=true
priority=200
user=user
No candy ships such a listener today. The selkies [program:chrome] service
(in selkies-core, restart: always, start_secs: 5/start_retries: 3) relies
on supervisord's ordinary relaunch for ordinary exits; a genuinely wedged crash
loop is cleared by restarting the container (charly update / charly restart), which tears down the cgroup. The chrome candy's
security.memory_max/memory_high/memory_swap_max caps bound the blast radius.
See /charly-selkies:chrome (Chrome supervision) and /charly-image:layer (Security
Declaration → resource caps).
An event listener is a shell script or Python program that:
1. Writes READY\n to stdout to tell supervisord it's ready for an event.
2. Reads a header line from stdin (ver:3.0 server:supervisor serial:N pool:name poolserial:M eventname:NAME len:L).
3. Reads L bytes of payload from stdin.
4. Writes RESULT 2\nOK to stdout.

Full protocol details: supervisord event listener docs. supervisorctl avail lists both regular programs and event listeners.
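The listener protocol described above can be sketched in Python as follows. This is a minimal illustration, not a listener shipped by any candy: `parse_header`, `handle_one_event`, and `main` are hypothetical names, and the callback only logs the event to stderr. Text-mode reads count characters rather than bytes, which matches for the ASCII payloads assumed here.

```python
import sys


def parse_header(line):
    """Parse a 'key:value key:value ...' header line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def handle_one_event(stdin, stdout, on_event):
    """Run one READY / header / payload / RESULT cycle of the protocol."""
    stdout.write("READY\n")
    stdout.flush()
    header = parse_header(stdin.readline())
    payload = stdin.read(int(header["len"]))  # len: is the payload size
    on_event(header, payload)
    stdout.write("RESULT 2\nOK")  # "2" is the length of the body "OK"
    stdout.flush()


def log_event(header, payload):
    """Example action: write the event to stderr (stdout is the protocol channel)."""
    sys.stderr.write(f"{header['eventname']}: {payload}\n")


def main():
    """Serve events forever; call this from the script's entry point."""
    while True:
        handle_one_event(sys.stdin, sys.stdout, log_event)
```

A crash-escalation listener would replace `log_event` with whatever action is wanted for PROCESS_STATE_FATAL. Anything printed to stdout outside the protocol would confuse supervisord, so diagnostics go to stderr.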
# Inside container
supervisorctl status # All programs + states
supervisorctl avail # All defined programs (including stopped)
supervisorctl pid # Supervisord's own PID — best liveness probe
supervisorctl tail -f chrome stderr # Live stderr log for one program
supervisorctl restart chrome # Restart a single program (does not reset startretries counter)
# From host
charly service status <image> # Wraps supervisorctl status via charly shell
charly service start/stop/restart <image> <name> # Per-program control
charly logs <image> # Container-level stdout/stderr (supervisord output)
supervisorctl status returns exit 3 when ANY program is non-RUNNING
(FATAL, STOPPED, or EXITED). Candies with autostart=false programs
(e.g. hermes ships hermes-whatsapp as autostart=false) will always
fail a naive command: supervisorctl status; exit_status: 0 test.
The robust liveness probe is supervisorctl pid, which asks
supervisord for its own PID and exits 0 iff the socket responds:
- id: supervisorctl-responds
  scope: deploy
  command: supervisorctl pid
  exit_status: 0
  in_container: true
This is what the current supervisord candy ships in its eval: block.
See /charly-eval:eval Authoring Gotcha #4.
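For scripts that need the same check outside the eval: framework, a hedged Python sketch of the probe looks like this. `supervisord_alive` is a hypothetical helper, and the injectable `run` parameter exists only so the function can be exercised without a real supervisord. It relies solely on the exit code of `supervisorctl pid`, as described above.

```python
import subprocess


def supervisord_alive(run=subprocess.run):
    """Return True iff `supervisorctl pid` exits 0 (the socket responded)."""
    try:
        result = run(
            ["supervisorctl", "pid"],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing or the call hung: treat both as "not alive".
        return False
    return result.returncode == 0
```

Unlike a check on `supervisorctl status`, this does not turn false just because one autostart=false program is STOPPED.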
Also note: pgrep is NOT installed by default in minimal images
(needs procps-ng). The process: <name>; running: true test verb
silently fails when pgrep is absent. Prefer service: (which uses
supervisorctl internally) for program-liveness checks. See /charly-eval:eval
Authoring Gotcha #3.
supervisorctl avail | grep -q '^<name>\b' is the idiomatic way to check whether a program is defined (as opposed to whether it's currently running). This is what labwc's autostart uses to decide whether to hand off Chrome to supervisord or fall back to a direct launch — see /charly-selkies:labwc (autostart Chrome-duplication race) for the canonical pattern.
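The same "is it defined?" check can be written in Python by parsing the output of supervisorctl avail. This is a sketch under the assumption that the program name is the first whitespace-separated column of each line; `program_defined` is a hypothetical helper name.

```python
def program_defined(avail_output, name):
    """Return True if `name` appears as the first column of any avail line."""
    for line in avail_output.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False
```

The exact first-column comparison is slightly stricter than the grep form: `\b` also matches before a hyphen, so `^chrome\b` would match a line for `chrome-devtools-mcp`, while this function would not.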
templates/supervisord.header.conf (at the repo root) defines global [supervisord], [unix_http_server], [supervisorctl], and [rpcinterface:supervisor] sections. It's referenced from build.yml init: section as the init system's header_file: and prepended to every generated supervisord.conf. Common fields:
- [unix_http_server] file=/tmp/supervisor.sock: the socket supervisorctl uses
- [supervisord] nodaemon=true logfile=/tmp/supervisord.log pidfile=/tmp/supervisord.pid
- [supervisord] user=user: supervisord runs as the container user, not root

RPM: supervisor (Fedora) · PAC: supervisor (Arch extra) · DEB: supervisor (Debian/Ubuntu; the package is named supervisor, not supervisord). Full parity: the process manager itself and the header template are identical across distros. The [supervisord] user=user line in templates/supervisord.header.conf is a fallback only. When user_policy: adopt fires on an ubuntu base image, supervisord inherits the adopted ubuntu identity at runtime via the USER directive, so user=user in the header is harmless (supervisord's user= field is advisory; uid passthrough from the Dockerfile USER directive takes precedence).
# charly.yml
my-image:
  candy:
    - supervisord
    - my-service  # layers with service: entries need supervisord
Adding a service: block to a candy automatically pulls in supervisord via build.yml init: section's depends_candy. You rarely add supervisord to a box's candy: list manually.
Transitive dependency for all boxes with managed services, including:
openclaw, jupyter, jupyter-ml, jupyter-ml-notebook, ollama, comfyui, immich, immich-ml, selkies-desktop, selkies-labwc-nvidia, hermes, openwebui, filebrowser.
On non-bootc images, supervisord is container PID 1 (ENTRYPOINT=supervisord emitted by the init system definition in build.yml). On bootc images, systemd is PID 1, which means supervisord needs a systemd unit wrapper — otherwise the whole desktop tier never starts. The wrapper is a systemd user unit at /etc/systemd/user/supervisord.service, systemctl --global enabled so tty1 autologin brings it up (a linger sentinel file keeps it alive across logouts).
Two logging settings break under this wrapper. Both involve opening /dev/stdout or /dev/fd/1, which resolve to the journal stream (a socket, not a regular file) under a systemd user service, and open() on that path fails with ENXIO.

1. Supervisord's own logfile (in templates/supervisord.header.conf) was changed from logfile=/dev/stdout to logfile=/tmp/supervisord.log. /dev/stdout works when supervisord is PID 1 in a container (fd 1 is real stdio), but fails with OSError: [Errno 6] No such device or address: '/dev/stdout' under a systemd user service, where fd 1 is the journal stream. Writing to a regular file works everywhere.
2. Per-program stdout_logfile=/dev/fd/1. Every candy's service: fragment redirects program stdout to /dev/fd/1 so container logs (charly logs <image>) show per-program output. Under a systemd user service this fails with unknown error making dispatchers for <name>: ENXIO for every program. The fix lives in the systemd user unit itself: set StandardOutput=file:/tmp/supervisord-stdout.log so supervisord's fd 1 is backed by a real file rather than the journal stream. Existing per-program /dev/fd/1 lines then resolve correctly.

Container-mode logs are unaffected: supervisord is still PID 1 there.
- /charly-languages:python -- Optional pixi-python env (NOT a dep of this candy; supervisord uses system python3 from RPM)
- /charly-selkies:selkies-core -- owns the supervised [program:chrome] service (restart: always, start_secs/start_retries) for both selkies flavors
- /charly-selkies:chrome -- the chrome candy's cgroup resource caps (bound a Chrome crash loop's blast radius)
- /charly-infrastructure:traefik -- Reverse proxy (depends on supervisord)
- /charly-infrastructure:dbus-layer -- D-Bus session bus (depends on supervisord)
- /charly-ollama:ollama, /charly-openclaw:openclaw, /charly-infrastructure:postgresql, /charly-infrastructure:redis, /charly-selkies:sway -- All ship service: blocks
- /charly-core:service -- Start/stop/restart/status for individual services inside the container
- /charly-core:logs -- Container-level log access (shows supervisord-aggregated stdout/stderr)
- /charly-core:charly-status -- Container status including service probe results
- /charly-build:generate -- Containerfile generation: where service: blocks get written to /etc/supervisord.conf
- /charly-image:layer -- service: field authoring + security: cgroup resource caps

Use when the user asks about:

- The PROCESS_STATE_FATAL crash-escalation pattern
- supervisorctl commands (status, avail, start, tail, restart)
- charly service commands (start/stop/restart/status)
- The supervisor RPM package or /tmp/supervisor.sock