
aldengolab/democratic-csi-nvmeof-attach-failure

Name: democratic-csi-nvmeof-attach-failure
Author: aldengolab

skills/democratic-csi-nvmeof-attach-failure/SKILL.md

npx skillsauth add aldengolab/lorist democratic-csi-nvmeof-attach-failure


democratic-csi NVMe-oF Volume Attach Failure

Problem

Pods that require NVMe-oF PVCs are stuck in Init:0/1. PVCs provision successfully (TrueNAS creates volumes), but attachment at the node fails. The democratic-csi node plugin logs "unable to attach any nvme devices" on every NodeStageVolume call.

Context / Trigger Conditions

Kubernetes events: MountVolume.MountDevice failed ... unable to attach any nvme devices
democratic-csi node plugin logs: handler error - method: NodeStageVolume error: {"message":"unable to attach any nvme devices"}
PVCs show Bound (provisioning OK) but pods never leave Init state
democratic-csi controller logs show successful CreateVolume responses from TrueNAS

Solution

Work through the six steps below in order. Together they cover four distinct root causes:

1. Confirm provisioning vs. attachment split

Check controller logs to confirm volumes were created on TrueNAS:

kubectl logs -n democratic-csi <controller-pod> -c csi-driver --tail=50 | grep -i "CreateVolume\|error"

If CreateVolume succeeded but NodeStageVolume fails, the problem lies on the attach path (hostname resolution, transport reachability, or the transport address), not in the storage API credentials.

2. Check kernel modules

kubectl exec -n democratic-csi <node-pod> -c csi-driver -- cat /proc/modules | grep nvme

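Required modules: nvme_tcp, nvme_fabrics, nvme_core. /proc/modules lists only loaded modules, so a missing entry means the module is either not loaded on the host (try modprobe nvme_tcp there) or not available for the host kernel at all. Modules built directly into the kernel do not appear in this list.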
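The module check above can be made explicit with a small filter. This is a minimal sketch, not part of democratic-csi: the function name missing_nvme_modules is hypothetical, and it assumes its input is in /proc/modules format (module name in the first column).

```shell
# Hypothetical helper (not part of democratic-csi): reads /proc/modules-style
# text on stdin and prints each required NVMe-oF module that is not listed.
missing_nvme_modules() {
  loaded=$(awk '{print $1}')   # first column is the module name
  for mod in nvme_tcp nvme_fabrics nvme_core; do
    # -x matches the whole line, so nvme_core does not match nvme_core_foo
    printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
  done
}
```

Usage: pipe the output of the kubectl exec command from this step into missing_nvme_modules. Empty output means all three modules are loaded.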

3. Test nvme binary hostname resolution

kubectl exec -n democratic-csi <node-pod> -c csi-driver -- \
  nvme discover -t tcp -a <shareHost-value> -s <sharePort>

If output contains "No support for hostname IP address resolution; recompile with libnss support":

The nvme binary in the container is statically compiled and can't resolve hostnames via NSS
Workaround: resolve the hostname from within the cluster and use the IP in shareHost
Resolve hostname: kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup <hostname>

4. Test transport connectivity with IP

kubectl exec -n democratic-csi <node-pod> -c csi-driver -- \
  nvme discover -t tcp -a <resolved-IP> -s <sharePort>

If output is "Connection refused" or "failed to get transport address":

The NVMe-oF TCP port (default: 4420) is not reachable from the cluster
This is a network/firewall/interface binding issue on the storage server

Confirm with a busybox TCP probe. busybox's sh does not support bash's /dev/tcp redirection, so use nc:

kubectl run -it --rm tcp-test --image=busybox --restart=Never -- \
  sh -c "nc -z -w 3 <IP> <port> && echo OPEN || echo REFUSED"
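Steps 3 and 4 hinge on telling two error messages apart. A minimal sketch of that triage follows, using only the error strings quoted in this document. The function name classify_nvme_discover and its three output labels are hypothetical.

```shell
# Hypothetical helper: reads `nvme discover` output on stdin and names the
# failure mode, using only the error strings quoted in steps 3 and 4.
classify_nvme_discover() {
  out=$(cat)
  case "$out" in
    *"recompile with libnss support"*)
      echo "hostname-resolution" ;;     # step 3: retry with an IP address
    *"Connection refused"*|*"failed to get transport address"*)
      echo "transport-unreachable" ;;   # step 4: port not reachable
    *)
      echo "other" ;;                   # success or an error not covered here
  esac
}
```

Usage: append 2>&1 | classify_nvme_discover to either nvme discover command so that messages written to stderr are included.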

5. Fix: NVMe-oF port not reachable

Common cause: the storage server (e.g., TrueNAS) binds the NVMe-oF target to a LAN interface but shareHost in the driver config points to a different IP (e.g., Tailscale overlay IP). The storage controller API (HTTP) works on the overlay but NVMe-oF TCP doesn't.

Fix options:

Change shareHost in the driver config to the LAN/storage-network IP
Or configure the storage server to also bind NVMe-oF on the overlay interface

If the config is managed via ESO (ExternalSecrets), update the backing secret in the secret store, then force a sync:

kubectl annotate externalsecret <name> -n <ns> force-sync=$(date +%s) --overwrite

After updating, restart the democratic-csi node plugin pods to pick up the new config.

6. Check PV volumeAttributes for stale transport address

If steps 1–5 pass (kernel modules present, transport reachable at the correct IP, driver config looks right) but NodeStageVolume still fails, check whether the PV itself stores the wrong transport IP:

kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.transports}'

Diagnostic signal: node plugin logs will show connecting to transport: tcp://<wrong-IP>:4420 where <wrong-IP> differs from the transports in the current driver config. This happens because volumeAttributes are written at provision time from the storage backend API response and take precedence over the current driver config for existing volumes.
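The comparison between the PV's stored transport and the expected storage-network IP can be scripted. This is a sketch under stated assumptions: the transports value is taken to be one or more comma-separated URIs of the form tcp://<host>:<port> with an IPv4 address or hostname (IPv6 literals are not handled), and the function name transports_use_host is hypothetical.

```shell
# Hypothetical helper: succeeds only if every transport URI in $1 points at
# host $2. Assumes comma-separated tcp://host:port URIs (IPv4 or hostname).
transports_use_host() {
  [ -n "$1" ] || return 1          # an empty attribute is treated as a mismatch
  expected=$2
  for uri in $(printf '%s' "$1" | tr ',' ' '); do
    host=${uri#*://}               # drop the scheme
    host=${host%%:*}               # drop the port and anything after it
    [ "$host" = "$expected" ] || return 1
  done
}
```

Usage: transports_use_host "$(kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.transports}')" <expected-IP> || echo "stale transport address in PV"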

Fix: spec.persistentvolumesource is immutable — it cannot be patched in place. See kubernetes-csi-pv-spec-update for the replace --force + finalizer removal procedure.

Root fix: Correct the NVMe-oF port binding on TrueNAS so future volumes are provisioned with the correct storage-network IP. Existing PVs must still be replaced manually.

Verification

# Confirm NodeStageVolume no longer errors
kubectl logs -n democratic-csi <node-pod> -c csi-driver --tail=20 | grep -i "error\|NodeStage"

# Check pod status
kubectl get pods -n <app-namespace>

Pods should transition from Init:0/1 to Running.
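To spot pods that are still stuck after the fix, the pod listing can be filtered. A minimal sketch: the function name stuck_init_pods is hypothetical, and it assumes the default kubectl get pods table with a header row and STATUS in the third column.

```shell
# Hypothetical helper: reads default `kubectl get pods` table output on stdin
# (header row, STATUS in column 3) and prints pods whose status starts with
# "Init:".
stuck_init_pods() {
  awk 'NR > 1 && $3 ~ /^Init:/ {print $1}'
}
```

Usage: kubectl get pods -n <app-namespace> | stuck_init_pods. Empty output means no pod is waiting on an init container.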

Notes

PVC provisioning (CreateVolume via HTTP API) and volume attachment (NodeStageVolume via NVMe-oF TCP) use completely different network paths — one can work while the other fails.
The nvme binary's hostname resolution failure ("recompile with libnss support") is a separate error from transport connectivity failure ("Connection refused"). Always test with IP after seeing the libnss error.
hostPID: true on the node DaemonSet is required for nsenter-based nvme operations if you go that route.
The driver config secret key must be named driver-config-file.yaml (democratic-csi chart requirement).

References

democratic-csi GitHub — configuration reference for nvmeof driver


Debug democratic-csi NVMe-oF volume attachment failures. Use when: (1) Pods are stuck in Init:0/1 and events show "MountVolume.MountDevice failed: unable to attach any nvme devices", (2) democratic-csi NodeStageVolume returns "unable to attach any nvme devices", (3) PVC provisioning succeeds but pods never start because volumes can't be mounted, (4) Node plugin logs show "connecting to transport: tcp://<IP>:4420" where the IP differs from the driver config. Covers four distinct root causes: nvme binary hostname resolution failure (libnss), NVMe-oF TCP port not reachable (wrong interface or service not running), kernel module availability, and stale transport IP stored in PV volumeAttributes at provision time.

tools

Updated Apr 2, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add aldengolab/lorist democratic-csi-nvmeof-attach-failure

Security Scan Results

3 of 9 scanners reported clean. The other six were skipped.

Trivy: Container and dependency vulnerability scanner. Clean (95%)
Semgrep: Static code analysis for vulnerabilities. Clean (95%)
mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
Snyk (dep): Open source security scanning. Skipped (50%)
Socket.dev: Supply chain security analysis. Skipped (50%)
VirusTotal: Multi-engine malware detection. Skipped (50%)
CrowdStrike: Advanced threat intelligence. Skipped (50%)
OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
OWASP Dep-Check: Skipped (50%)

Last scanned: Apr 2, 2026, 5:07 PM (62.7s, 1 file scanned)




Install manually


# Clone the repo
git clone https://github.com/aldengolab/lorist.git

# Copy into Claude Code skills folder (global)
cp -r lorist/skills/democratic-csi-nvmeof-attach-failure ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

aldengolab/lorist

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT