ov-foundation/skills/nvidia-layer/SKILL.md
NVIDIA GPU runtime support: driver libs, nvidia-container-toolkit (CDI), and VA-API. Fedora (negativo17) and Arch Linux (pac). Base layer for all GPU-accelerated images. Use when working with NVIDIA GPU support, CDI device injection, or the nvidia layer.
NVIDIA runtime layer providing nvidia-container-toolkit for CDI device injection and VA-API hardware video acceleration. Driver userspace libraries (libcuda, libnvidia-ml, etc.) are NOT bundled — CDI provides host-matching driver libs at runtime, preventing version mismatches between container and host kernel module. Supports both Fedora and Arch Linux.
| Property | Value |
|----------|-------|
| Install files | layer.yml, tasks: |
| Depends | none |
RPM (from negativo17 fedora-multimedia repo):

- nvidia-container-toolkit: nvidia-ctk CLI for CDI spec generation
- libva-nvidia-driver: VA-API hardware video acceleration

PAC (Arch Linux):

- nvidia-utils: NVIDIA GL/Vulkan userspace, nvidia-smi
- nvidia-container-toolkit: nvidia-ctk CLI for CDI spec generation

| Variable | Value |
|----------|-------|
| LD_LIBRARY_PATH | /usr/lib64 (ensures CDI-injected host driver libs are found by the dynamic linker) |
The nvidia-container-toolkit provides nvidia-ctk which generates CDI (Container Device Interface) specs. ov calls EnsureCDI() before launching containers with GPU — if CDI specs don't exist at /etc/cdi/nvidia.yaml, it runs nvidia-ctk cdi generate to create them. This enables GPU access in nested containers where host CDI specs are not inherited.
Arch's nvidia-container-toolkit ships a pacman post-install hook that
invokes nvidia-ctk cdi generate. On a host with no NVIDIA driver
loaded (e.g., an AMD-only build host), the hook fails NVML init:
ERROR: failed to generate CDI spec: failed to initialize NVML: Driver Not Loaded
error: command failed to execute correctly
This is benign for the build — pacman still exits 0 (hooks don't
affect the parent transaction's status), the layer finishes installing,
and the resulting image works at runtime on a GPU-bearing host (where
the CDI spec is generated via EnsureCDI() at container-launch time,
not build time). You can ignore the error message. RPM installs don't
trigger the hook, so Fedora-based images don't see this noise.
If you build in CI where even benign hook errors matter,
either build arch-nvidia images on a GPU-bearing runner, or patch the
layer to carry a build-time NVIDIA_VISIBLE_DEVICES=void env var so
nvidia-ctk skips CDI generation.
NVIDIA VAAPI acceleration requires the container to know which DRM render node to bind the EGL context against. On multi-GPU hosts there may be /dev/dri/renderD128, /dev/dri/renderD129, … and the correct one depends on which physical card backs the NVIDIA driver.
ov does not bake a hardcoded DRINODE=/dev/dri/renderD128 into this layer. Instead, it auto-detects the correct render node at container-launch time and injects it as an environment variable. The detection + injection is consolidated in a single function, appendAutoDetectedEnv() in ov/devices.go, which is called by ov config, ov start, and ov shell — so the three commands always produce the same env set.
Selkies is the primary consumer: pixelflux's Wayland compositor uses DRINODE to open the render node and set up the VAAPI H.264 encoder. Without the injection, selkies would fall back to software encode (libx264) and lose ~40% of its streaming bandwidth budget.
The injection replaces 10 previously-scattered GPU device injection blocks across the ov source tree — see commit 8f6f322 for the consolidation history. If you see DRINODE referenced in layer scripts, you can assume it was auto-detected and injected by ov, not set by the user.
See /ov-core:doctor (Hardware Detection) for the detection probe and /ov-foundation:rocm for the AMD-side counterpart using the same mechanism.
Images that declare base: nvidia (e.g., /ov-selkies:selkies-desktop-nvidia, /ov-selkies:selkies-desktop-ov) still run cleanly on hosts with a different GPU vendor — the NVIDIA runtime libraries ride along as benign passengers. ov config auto-detects whatever the host actually exposes (e.g., /dev/dri/renderD128 + /dev/kfd for an AMD RDNA3), injects those device nodes + DRINODE, and Mesa handles rendering. Confirmed 2026-04-19: selkies-desktop-ov (base: nvidia) on an AMD gfx 11.0.0 host — 15/15 supervisord programs RUNNING, selkies streaming over Mesa, no CUDA calls attempted. The CUDA toolkit in the image simply goes unused.
Creates Vulkan ICD compatibility symlinks for nvidia-ctk CDI device injection.
- /ov-foundation:nvidia - NVIDIA GPU base image (nvidia + cuda layers)
- /ov-coder:arch-ov - Arch Linux ov toolchain (shared layers + nvidia)
- /ov-foundation:fedora-ov - Fedora ov toolchain (shared layers + nvidia)
- /ov-foundation:cuda - CUDA development toolkit (depends on nvidia)
- /ov-foundation:rocm - AMD GPU counterpart (ROCm runtime + OpenCL), uses the same appendAutoDetectedEnv() DRINODE injection
- /ov-selkies:selkies - Primary consumer of the DRINODE env for VAAPI H.264 encode
- /ov-foundation:python-ml, /ov-jupyter:llama-cpp, /ov-jupyter:jupyter-ml - CUDA ML stacks that depend on this layer
- /ov-core:doctor - Host NVIDIA detection (GPU probe, CDI spec status, driver version)
- /ov-core:shell - DRINODE auto-injection applies to interactive shells too
- /ov-advanced:udev - Device permission management for /dev/dri/* and /dev/nvidia*
- /ov-core:config - Runtime GPU device injection at deployment time (same appendAutoDetectedEnv() path)
- /ov-core:start - Runtime GPU device injection at service start time
- /ov-build:layer - Layer authoring reference (layer.yml schema, task verbs, service declarations)
- /ov-build:eval - Declarative testing (eval: block, ov eval image, ov eval live)