versa/skills/airflow-layer/SKILL.md
Apache Airflow 3.x with LocalExecutor + SQLite (single-node, dev-friendly), 4 supervisord services (init, scheduler, dag-processor, webserver). Layer is service-only — its Python deps live in /charly-versa:versa-layer's pixi env. No MCP wrapper (no upstream v2 release exists; consumers drive Airflow via direct REST /api/v2 calls). Use when working with the airflow layer, Airflow 3.x compatibility findings, the SimpleAuthManager auth-fix pattern, the dag-processor split-from-scheduler architecture change, or the JWT-issuance + REST API trigger flow used by self-authoring notebooks.
LocalExecutor + SQLite: zero external services. Suitable for
single-node dev / R10 verification. For multi-node production, swap
in CeleryExecutor + Postgres + Valkey (those layers remain available
under candy/postgresql/ + candy/valkey/).
This layer is service-only: it ships no pixi.toml. Its Python
deps (apache-airflow) live in /charly-versa:versa-layer's pixi env,
so airflow runs alongside marimo in the same pod with one combined
Python environment.
| Property | Value |
|----------|-------|
| Dependencies | marimo (for the pixi env), supervisord |
| Ports | 8080 (api-server, host-mapped to 28080) |
| Services | airflow-init (one-shot) + airflow-scheduler + airflow-dag-processor + airflow-webserver |
| Volumes | airflow-data at ~/airflow |
| Secrets | airflow-fernet-key, airflow-webserver-secret, airflow-admin-password |
| MCP provides | none (no Airflow-3 / /api/v2 release of the upstream mcp-server-apache-airflow package exists; consumers drive the REST API directly via JWT + /api/v2/) |
| DAG folder | /workspace/dags (AIRFLOW__CORE__DAGS_FOLDER) |
The layer was hardened across multiple RCA cycles to handle Airflow 3.x's architectural changes. Each finding is encoded in the init script, the env vars, or the supervisord service list:
Airflow 3.x removed airflow users create. The new SimpleAuthManager
generates a random admin password on first boot and writes it to
${AIRFLOW_HOME}/simple_auth_manager_passwords.json.generated —
that random value bears no relation to the AIRFLOW_ADMIN_PASSWORD
secret we inject, so JWT auth via /auth/token rejects every request
with 401.
Fix in airflow-init.sh: overwrite the passwords file from
${AIRFLOW_ADMIN_PASSWORD} on every boot. Idempotent; tolerant of
secret rotation.
mkdir -p "${HOME}/airflow"
printf '{"admin": "%s"}\n' "${AIRFLOW_ADMIN_PASSWORD}" \
> "${HOME}/airflow/simple_auth_manager_passwords.json.generated"
chmod 0600 "${HOME}/airflow/simple_auth_manager_passwords.json.generated"
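The same overwrite can be sketched in Python. This is only an illustration, not the layer's actual init script: the function name `write_passwords_file` is hypothetical, and the file layout (one `{"admin": "<password>"}` object) is taken from the printf template above. One difference from the shell version is that `json.dumps` escapes quotes and backslashes in the password, which a raw printf template would pass through unescaped.

```python
import json
import os
from pathlib import Path


def write_passwords_file(airflow_home: Path, password: str, user: str = "admin") -> Path:
    """Overwrite the SimpleAuthManager passwords file with a single user entry."""
    airflow_home.mkdir(parents=True, exist_ok=True)
    target = airflow_home / "simple_auth_manager_passwords.json.generated"
    # json.dumps escapes characters that would otherwise produce invalid JSON.
    payload = json.dumps({user: password}) + "\n"
    # Create the file with mode 0600 so the secret is not briefly readable by others.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(payload)
    # Also tighten a pre-existing file that was created with looser permissions.
    os.chmod(target, 0o600)
    return target
```

Calling it on every boot with the current secret value keeps the operation idempotent, in the same way as the shell snippet: the file is truncated and rewritten each time.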
The Airflow-2 key path [scheduler] dag_dir_list_interval is
deprecated in 3.x; the canonical key is now
[dag_processor] refresh_interval. Layer env block sets:
env:
  AIRFLOW__DAG_PROCESSOR__REFRESH_INTERVAL: "10"
Default is 300 seconds — far too slow for the runtime-DAG-write
pattern (notebook drops a .py and triggers it within seconds). 10s
makes the scan effectively interactive at negligible CPU cost on a
single-user dev pod.
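The variable name follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding a config key from the environment. A tiny sketch of that mapping, with a hypothetical helper name, makes it easy to check that an override targets the 3.x section rather than the deprecated one:

```python
def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name Airflow reads for `[section] key`."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The 3.x key used by this layer, and the DAG folder key from the property table.
refresh_var = airflow_env_var("dag_processor", "refresh_interval")
dags_folder_var = airflow_env_var("core", "dags_folder")
```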
Airflow 3.x splits the DAG processor out of the scheduler. Without a
separate airflow dag-processor daemon, the dags folder is never
scanned, so notebooks waiting on registration hang forever.
Fix: new airflow-dag-processor supervisord service alongside
airflow-scheduler. Without this service, dag_processor.refresh_interval
has no effect because nothing honors it.
Airflow 2.x auto-generated logical_date if it was omitted; 3.x rejects
the request with 422. Notebook trigger code must include:
import requests
from datetime import datetime, timezone
# api, dag_id and auth (the Bearer header) are assumed to be defined earlier
requests.post(f"{api}/api/v2/dags/{dag_id}/dagRuns",
              headers=auth,
              json={"conf": {}, "logical_date": datetime.now(timezone.utc).isoformat()},
              timeout=10)
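To keep the logical_date requirement in one place, the payload can be built by a small helper. This is a sketch with a hypothetical function name. It rejects naive datetimes on the assumption that a timezone-aware timestamp is what the API expects, since the snippet above always sends UTC.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def dag_run_payload(conf: Optional[dict] = None,
                    now: Optional[datetime] = None) -> dict[str, Any]:
    """Build a dagRuns request body that always carries a logical_date."""
    moment = now or datetime.now(timezone.utc)
    if moment.tzinfo is None:
        # A naive timestamp would serialise without a UTC offset.
        raise ValueError("logical_date must be timezone-aware")
    return {"conf": conf or {}, "logical_date": moment.isoformat()}
```

The result can be passed directly as the `json=` argument of the POST shown above.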
Airflow 3.x dropped airflow webserver in favor of airflow api-server
(FastAPI-based). The airflow-webserver.sh wrapper was updated:
exec "${HOME}/.pixi/envs/default/bin/airflow" api-server
The api-server uses JWT issued by SimpleAuthManager. Clients fetch
a token via POST /auth/token (admin/password JSON body), then pass
Authorization: Bearer <token> on subsequent calls. Basic auth is
rejected on /api/v2/ endpoints by default.
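A minimal standard-library sketch of the token exchange follows. The helper names are hypothetical, and the `username` / `password` field names in the JSON body are an assumption (the text above only says "admin/password JSON body"). The `access_token` response field is the one checked by the layer's own JWT test.

```python
import json
import urllib.request


def build_token_request(api: str, password: str, username: str = "admin") -> urllib.request.Request:
    """Prepare the POST /auth/token request without sending it."""
    # Field names are assumed; adjust them if the auth manager expects others.
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{api.rstrip('/')}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_header(token_response: bytes) -> dict[str, str]:
    """Turn the raw /auth/token response body into an Authorization header."""
    token = json.loads(token_response)["access_token"]
    return {"Authorization": f"Bearer {token}"}
```

In a notebook the two pieces would be combined roughly as `with urllib.request.urlopen(build_token_request(api, password), timeout=10) as resp: auth = bearer_header(resp.read())`, after which `auth` is passed as the headers of every /api/v2/ call.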
airflow-init is a one-shot supervisord program (restart: "no",
exit_codes: "0") that runs airflow db migrate + writes the
passwords file. It exits cleanly on success; the long-running daemons
start at higher priority numbers (30+) once init has finished.
| Name | Priority | Restart | Purpose |
|---|---:|---|---|
| airflow-init | 25 | no | airflow db migrate + write SimpleAuthManager passwords file |
| airflow-scheduler | 30 | always | Schedule DAG runs from the metadata DB |
| airflow-dag-processor | 31 | always | Scan dags folder + serialise DAGs into the DB |
| airflow-webserver | 31 | always | airflow api-server (FastAPI on :8080) |
Build-scope (4):
- airflow-init-script: /usr/local/bin/airflow-init.sh exists + executable
- airflow-webserver-script: ditto
- airflow-scheduler-script: ditto
- airflow-dag-processor-script: ditto (added with #3 above)

Deploy-scope (4):
- airflow-init-db-created: ~/airflow/airflow.db exists (SQLite metadata)
- airflow-scheduler-running, airflow-dag-processor-running, airflow-webserver-running: supervisord program states
- airflow-port-reachable: TCP 8080 reachable
- airflow-http-version: GET /api/v2/version returns 200
- airflow-jwt-issuance: POST /auth/token with admin creds returns body containing access_token (THE notebook auth path; locks in finding #1)

The marimo notebook in /charly-versa:notebook-osm writes its own DAG
file into /workspace/dags/ and triggers it via the REST API:

1. POST /auth/token (admin / $AIRFLOW_ADMIN_PASSWORD) → JWT
2. Poll GET /api/v2/dags/<id> until the DAG is registered (typically <30 s with REFRESH_INTERVAL=10)
3. PATCH /api/v2/dags/<id> with {"is_paused": false}
4. POST /api/v2/dags/<id>/dagRuns with logical_date
5. Poll GET /api/v2/dags/<id>/dagRuns/<run_id> until state in ("success", "failed")

The notebook uses this pattern to fire BOTH notebook_osm_pipeline
and notebook_gtfs_pipeline in parallel.
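The trigger flow described above can be sketched as a single function. This is an illustration under stated assumptions, not the notebook's actual code: `trigger_and_wait` and the `Http` transport signature are hypothetical, the token body field names are assumed, and the run identifier is assumed to come back as `dag_run_id`. The transport is injected so the flow can be exercised without a running Airflow.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical transport: (method, url, headers, json_body) -> (status_code, parsed_json)
Http = Callable[[str, str, dict, Optional[dict]], tuple[int, Any]]


def trigger_and_wait(http: Http, api: str, dag_id: str, password: str,
                     poll_seconds: float = 2.0, max_polls: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Authenticate, wait for registration, unpause, trigger, and poll one DAG run."""
    # Exchange the admin credentials for a JWT (body field names are assumed).
    _, body = http("POST", f"{api}/auth/token", {},
                   {"username": "admin", "password": password})
    auth = {"Authorization": f"Bearer {body['access_token']}"}

    # Wait until the dag-processor has registered the freshly written file.
    dag_url = f"{api}/api/v2/dags/{dag_id}"
    for _ in range(max_polls):
        status, _ = http("GET", dag_url, auth, None)
        if status == 200:
            break
        sleep(poll_seconds)
    else:
        raise TimeoutError(f"DAG {dag_id} was not registered in time")

    # Unpause the DAG so the scheduler will run it.
    http("PATCH", dag_url, auth, {"is_paused": False})

    # Trigger a run; logical_date is mandatory in 3.x.
    payload = {"conf": {}, "logical_date": datetime.now(timezone.utc).isoformat()}
    _, run = http("POST", f"{dag_url}/dagRuns", auth, payload)
    run_url = f"{dag_url}/dagRuns/{run['dag_run_id']}"  # response field name assumed

    # Poll the run until it reaches a terminal state.
    for _ in range(max_polls):
        _, current = http("GET", run_url, auth, None)
        if current["state"] in ("success", "failed"):
            return current["state"]
        sleep(poll_seconds)
    raise TimeoutError(f"Run of {dag_id} did not finish in time")
```

Firing two pipelines in parallel would then amount to calling this function from two threads (for example via `concurrent.futures.ThreadPoolExecutor`), one per DAG id.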
- /charly-versa:versa-layer - pixi env that owns the airflow Python deps
- /charly-versa:notebook-osm - canonical user of the REST trigger pattern
- /charly-versa:versa - the image composing this layer
- /charly-infrastructure:supervisord - service mgmt
- /charly-build:secrets - the 3 airflow secrets injected via this layer