ov-foundation/skills/rocm/SKILL.md
AMD ROCm runtime, OpenCL, and GPU compute support via system packages. Use when working with AMD GPU computing, ROCm, HIP, OpenCL, or AMD GPU passthrough in containers.
| Property | Value |
|----------|-------|
| Install files | layer.yml |

Environment variables baked into the layer:

| Variable | Value |
|----------|-------|
| ROCM_PATH | /usr |

Environment variables auto-detected at runtime:

| Variable | Source | Example |
|----------|--------|---------|
| HSA_OVERRIDE_GFX_VERSION | KFD topology sysfs | 10.3.0 (RDNA2), 11.0.0 (RDNA3) |
| DRINODE | DRM render node enumeration | /dev/dri/renderD128 (typical), /dev/dri/renderD129 (multi-GPU) |
Neither HSA_OVERRIDE_GFX_VERSION nor DRINODE is baked into the layer. Both are auto-detected from host state at runtime and injected as container environment variables via appendAutoDetectedEnv() in ov/devices.go. The same function is called by ov config, ov start, and ov shell, so interactive shells and deployed services see the identical set of environment variables.
- HSA_OVERRIDE_GFX_VERSION is read from /sys/class/kfd/kfd/topology/nodes/*/properties (gfx_target_version field).
- DRINODE is selected by walking /dev/dri/renderD* and picking the node that matches the AMD PCI device exposed to the container.

Override either with -e HSA_OVERRIDE_GFX_VERSION=X.Y.Z or -e DRINODE=/dev/dri/renderD129. See /ov-core:doctor (Hardware Detection) for how the probe runs on the host side, and /ov-foundation:nvidia (DRINODE Auto-Injection) for the NVIDIA counterpart using the same mechanism.
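The detection described above can be approximated from a host shell. What follows is a minimal sketch, not the code in ov/devices.go: the function names and the optional sysfs-root argument are hypothetical, it assumes gfx_target_version is encoded as major*10000 + minor*100 + stepping (which is consistent with the 10.3.0 and 11.0.0 examples), and it simplifies DRINODE selection to "first render node whose PCI vendor ID is AMD's 0x1002".

```shell
#!/bin/sh
# Hypothetical helpers for illustration; ov's real logic lives in
# appendAutoDetectedEnv() and may differ.

# Convert a KFD gfx_target_version integer (e.g. 100300) to X.Y.Z.
# Assumed encoding: major*10000 + minor*100 + stepping.
gfx_version_from_target() {
    v=$1
    echo "$((v / 10000)).$((v / 100 % 100)).$((v % 100))"
}

# Print the decoded version of the first KFD topology node that reports
# a non-zero gfx_target_version (zero is assumed to mean a CPU-only node).
# $1 is an optional sysfs root (defaults to /sys), useful for testing.
detect_gfx_version() {
    root=${1:-/sys}
    for props in "$root"/class/kfd/kfd/topology/nodes/*/properties; do
        [ -r "$props" ] || continue
        v=$(awk '$1 == "gfx_target_version" { print $2 }' "$props")
        if [ -n "$v" ] && [ "$v" -ne 0 ]; then
            gfx_version_from_target "$v"
            return 0
        fi
    done
    return 1
}

# Print the first render node whose PCI vendor ID is AMD (0x1002).
# $1 is an optional sysfs root (defaults to /sys).
detect_amd_render_node() {
    root=${1:-/sys}
    for node in "$root"/class/drm/renderD*; do
        [ -r "$node/device/vendor" ] || continue
        if [ "$(cat "$node/device/vendor")" = "0x1002" ]; then
            echo "/dev/dri/$(basename "$node")"
            return 0
        fi
    done
    return 1
}
```

After sourcing the snippet, running detect_gfx_version and detect_amd_render_node on the host gives values to compare against what ov injects, or to pass explicitly through the -e overrides when auto-detection picks the wrong device.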
```yaml
security:
  group_add:
    - keep-groups
```
Uses keep-groups to preserve host supplementary groups (video, render) inside the container. This is the standard approach across all layers -- Podman's keep-groups is mutually exclusive with explicit group names.
RPM (Fedora system repos): rocm-hip-runtime, rocm-opencl, rocm-clinfo, rocm-smi
AMD GPU support requires:

- /dev/kfd device (auto-detected by ov)
- /dev/dri/renderD* render nodes (auto-detected)
- video and render groups (ov udev status to check)
- amdgpu kernel driver loaded

Run ov doctor to verify detection. Run ov udev install to set up device permissions.
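The requirements above can also be checked by hand on the host. This is a sketch under stated assumptions, not what ov doctor does internally: the function name and both optional arguments are hypothetical, the group check uses the current user's `id -nG` output, and a loaded (or built-in) amdgpu driver is assumed to show up as /sys/module/amdgpu.

```shell
#!/bin/sh
# Hypothetical manual check; `ov doctor` remains the supported path.
# $1: optional filesystem root prefix (default: empty, i.e. the real /)
# $2: optional space-separated group list (default: output of `id -nG`)
check_amd_gpu_prereqs() {
    root=${1:-}
    grouplist=${2:-$(id -nG)}
    rc=0

    # ROCm compute device node
    if [ -e "$root/dev/kfd" ]; then
        echo "ok: /dev/kfd"
    else
        echo "missing: /dev/kfd"; rc=1
    fi

    # At least one DRM render node
    found=0
    for node in "$root"/dev/dri/renderD*; do
        if [ -e "$node" ]; then found=1; fi
    done
    if [ "$found" -eq 1 ]; then
        echo "ok: render node"
    else
        echo "missing: /dev/dri/renderD*"; rc=1
    fi

    # Host supplementary groups that keep-groups carries into the container
    for g in video render; do
        case " $grouplist " in
            *" $g "*) echo "ok: group $g" ;;
            *) echo "missing: group $g"; rc=1 ;;
        esac
    done

    # amdgpu kernel driver
    if [ -d "$root/sys/module/amdgpu" ]; then
        echo "ok: amdgpu driver"
    else
        echo "missing: amdgpu driver"; rc=1
    fi

    return "$rc"
}
```

Calling check_amd_gpu_prereqs with no arguments inspects the live host and returns non-zero if any item is missing, which makes it usable as a quick gate in a setup script before ov udev install.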
```yaml
# image.yml -- standalone AMD GPU image
my-amd-app:
  base: fedora
  layers:
    - rocm
    - my-app
```
```shell
# Check AMD GPU detected on host
ov doctor | grep "AMD GPU"

# Verify inside container
ov shell my-amd-app -c "clinfo --list"
ov shell my-amd-app -c "rocm-smi"
ov shell my-amd-app -c "echo \$HSA_OVERRIDE_GFX_VERSION"
```
- /ov-foundation:nvidia -- NVIDIA GPU counterpart (runtime libs + CDI), shares appendAutoDetectedEnv() DRINODE injection
- /ov-foundation:cuda -- NVIDIA CUDA toolkit (stacked on nvidia)
- /ov-foundation:python-ml -- ML Python environment (currently depends on cuda; ROCm equivalent is a future direction)
- /ov-core:doctor -- Host AMD GPU detection (/dev/kfd, render nodes, driver status)
- /ov-core:shell -- Interactive shells receive the same auto-detected HSA_OVERRIDE_GFX_VERSION + DRINODE envs
- /ov-advanced:udev -- Device permission management for /dev/kfd and /dev/dri/renderD*
- /ov-core:config -- Runtime GPU env injection at deployment time (same auto-detect path)
- /ov-core:start -- Runtime GPU env injection at service start time

Not directly used in any current image definition. Available as a standalone layer for AMD GPU support. The NVIDIA base image (/ov-foundation:nvidia) is the currently-shipped GPU image; an AMD counterpart can be composed by substituting this layer.
Use when the user asks about:

- /dev/kfd device access
- HSA_OVERRIDE_GFX_VERSION configuration

Related authoring skills:

- /ov-build:layer -- layer authoring reference (layer.yml schema, task verbs, service declarations)
- /ov-build:eval -- declarative testing (eval: block, ov eval image, ov eval live)