plugins/node-tuning/skills/scripts/SKILL.md
Generate tuned manifests and evaluate node tuning snapshots
Detailed instructions for invoking the helper utilities that back the `/node-tuning` commands:

- `generate_tuned_profile.py` renders Tuned manifests (`tuned.openshift.io/v1`).
- `analyze_node_tuning.py` inspects live nodes or sosreports for tuning gaps.

## Prerequisites

- Python 3 is installed (check with `python3 --version`).
- The helper scripts under `plugins/node-tuning/skills/scripts/` are accessible.
- `oc` CLI when validating or applying manifests.
- `oc` CLI access plus a valid `KUBECONFIG` when capturing `/proc` and `/sys` or a sosreport via `oc debug node/<name>`. The sosreport workflow pulls the `registry.redhat.io/rhel9/support-tools` image (override with `--toolbox-image` or `TOOLBOX_IMAGE`) and requires registry access. HTTP(S) proxy environment variables from the host are forwarded automatically when present, but using a proxy is optional.

## generate_tuned_profile.py

### Collect Inputs

- `--profile-name`: Tuned resource name.
- `--summary`: `[main]` section summary.
- `--include`, `--main-option`, `--variable`, `--sysctl`, `--section` (`SECTION:KEY=VALUE`).
- `--machine-config-label key=value`, `--match-label key[=value]`.
- `--priority` (default 20), `--namespace`, `--output`, `--dry-run`.
- Use `--list-nodes`/`--node-selector` to inspect nodes and `--label-node NODE:KEY[=VALUE]` (plus `--overwrite-labels`) to tag machines.

### Inspect or Label Nodes (optional)
```bash
# List all worker nodes
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py --list-nodes --node-selector "node-role.kubernetes.io/worker" --skip-manifest

# Label a specific node for the worker-hp pool
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --label-node ip-10-0-1-23.ec2.internal:node-role.kubernetes.io/worker-hp= \
  --overwrite-labels \
  --skip-manifest
```
### Render the Manifest

```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name "$PROFILE" \
  --summary "$SUMMARY" \
  --sysctl net.core.netdev_max_backlog=16384 \
  --match-label tuned.openshift.io/custom-net \
  --output .work/node-tuning/$PROFILE/tuned.yaml
```
- Omit `--output` to write `<profile-name>.yaml` in the current directory.
- Add `--dry-run` to print the manifest to stdout.

### Review Output

- Inspect the manifest with `yq`, or open it in an editor for readability.

### Validate and Apply
- Dry-run: `oc apply --dry-run=client -f <manifest>` (or `--dry-run=server` to validate against the API server).
- Apply: `oc apply -f <manifest>`.

### Error Handling

- Invalid inputs raise `ValueError` with descriptive messages.
- The script checks whether target selectors (`--machine-config-label` or `--match-label`) are supplied.

### Examples

Realtime worker profile. The `--section` value is single-quoted so the shell passes `${not_isolated_cores_expanded}` through to Tuned instead of expanding it:

```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name realtime-worker \
  --summary "Realtime tuned profile" \
  --include openshift-node --include realtime \
  --variable isolated_cores=1 \
  --section 'bootloader:cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}' \
  --machine-config-label machineconfiguration.openshift.io/role=worker-rt \
  --priority 25 \
  --output .work/node-tuning/realtime-worker/tuned.yaml
```

Boot-time huge pages profile:

```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name openshift-node-hugepages \
  --summary "Boot time configuration for hugepages" \
  --include openshift-node \
  --section bootloader:cmdline_openshift_node_hugepages="hugepagesz=2M hugepages=50" \
  --machine-config-label machineconfiguration.openshift.io/role=worker-hp \
  --priority 30 \
  --output .work/node-tuning/openshift-node-hugepages/hugepages-tuned-boottime.yaml
```
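To make the option formats and the resulting manifest shape more concrete, here is a minimal Python sketch. It is not the implementation of `generate_tuned_profile.py`: the helpers `parse_key_value`, `parse_match_label`, and `build_tuned_manifest` are hypothetical names introduced for illustration, the default namespace is assumed to be the Node Tuning Operator namespace, and only the `--sysctl` and `--match-label` inputs are covered. The manifest is printed as JSON, which `oc apply -f` also accepts.

```python
import json


def parse_key_value(item):
    """Split 'KEY=VALUE' on the first '='; a missing key or '=' is an error."""
    key, sep, value = item.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {item!r}")
    return key, value


def parse_match_label(item):
    """Turn 'key[=value]' into a Tuned match entry; the value is optional."""
    key, sep, value = item.partition("=")
    if not key:
        raise ValueError(f"expected KEY[=VALUE], got {item!r}")
    entry = {"label": key}
    if sep:
        entry["value"] = value
    return entry


def build_tuned_manifest(name, summary, sysctls, match_labels, priority=20,
                         namespace="openshift-cluster-node-tuning-operator"):
    """Assemble a tuned.openshift.io/v1 Tuned object as a plain dict.

    The namespace default is an assumption, not taken from the script.
    """
    sysctl_lines = [f"{k}={v}" for k, v in map(parse_key_value, sysctls)]
    # Profile data is an INI-style document: [main] first, then [sysctl].
    data = "\n".join(["[main]", f"summary={summary}", "",
                      "[sysctl]", *sysctl_lines]) + "\n"
    return {
        "apiVersion": "tuned.openshift.io/v1",
        "kind": "Tuned",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "profile": [{"name": name, "data": data}],
            "recommend": [{
                "match": [parse_match_label(m) for m in match_labels],
                "priority": priority,
                "profile": name,
            }],
        },
    }


manifest = build_tuned_manifest(
    name="custom-net",
    summary="Raise the NIC backlog",
    sysctls=["net.core.netdev_max_backlog=16384"],
    match_labels=["tuned.openshift.io/custom-net"],
)
print(json.dumps(manifest, indent=2))
```

Splitting on the first `=` only keeps values that themselves contain `=` intact, which matters for bootloader arguments such as `hugepagesz=2M hugepages=50`.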
## analyze_node_tuning.py

Inspect either a live node (`/proc`, `/sys`) or an extracted sosreport snapshot for tuning signals (CPU isolation, IRQ affinity, huge pages, sysctl state, networking counters) and emit actionable recommendations.

```bash
# Analyze the local host and print a Markdown report
python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py --format markdown

# Analyze a cluster node using a specific kubeconfig
python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
  --node worker-rt-0 \
  --kubeconfig ~/.kube/prod \
  --format markdown

# Collect a sosreport from the node with a custom toolbox image
python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
  --node worker-rt-0 \
  --toolbox-image registry.example.com/support-tools:latest \
  --sosreport-arg "--case-id=01234567" \
  --sosreport-output .work/node-tuning/sosreports \
  --format json

# Analyze an extracted sosreport directory
python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
  --sosreport /path/to/sosreport-2025-10-20

# Write the sosreport analysis to a JSON file
python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
  --sosreport /path/to/sosreport \
  --format json --output .work/node-tuning/node-analysis.json
```
- `--node <name>` (with optional `--kubeconfig` / `--oc-binary`) targets a cluster node. By default the helper runs sosreport remotely from inside the RHCOS toolbox container (`registry.redhat.io/rhel9/support-tools`). Override the image with `--toolbox-image`, extend the sosreport command with `--sosreport-arg`, or disable the curated OpenShift flags via `--skip-default-sosreport-flags`. Pass `--no-collect-sosreport` to fall back to the direct `/proc` snapshot mode.
- `--sosreport <dir>` reads archived diagnostics; detection finds embedded `proc/` and `sys/`.
- With neither option, the helper reads the local `/proc` and `/sys`.
- Pass `--proc-root` or `--sys-root` when the layout differs.

The analysis covers `cpuinfo`, kernel cmdline parameters (`isolcpus`, `nohz_full`, `tuned.non_isolcpus`), default IRQ affinities, huge page counters, sysctl values (`net`, `vm`, `kernel`), transparent hugepage settings, netstat/sockstat counters, and `ps` snapshots (when available in the sosreport).

Follow up with `/node-tuning:generate-tuned-profile` to codify desired state.

Missing `proc/` or `sys/` directories trigger descriptive errors.

Example Markdown report:

```text
# Node Tuning Analysis

## System Overview
- Hostname: worker-rt-1
- Kernel: 4.18.0-477.el8
- NUMA nodes: 2
- Kernel cmdline: `BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1`

## CPU & Isolation
- Logical CPUs: 32
- Physical cores: 16 across 2 socket(s)
- SMT detected: yes
- Isolated CPUs: 2-15
...

## Recommended Actions
- Configure net.core.netdev_max_backlog (>=32768) to accommodate bursty NIC traffic.
- Transparent Hugepages are not disabled (`[never]` not selected). Consider setting to `never` for latency-sensitive workloads.
- 4 IRQs overlap isolated CPUs. Relocate interrupt affinities using tuned profiles or irqbalance.
```
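The isolation, IRQ-overlap, and Transparent Hugepages findings in a report like this can be derived from a few small text-parsing steps. The Python sketch below illustrates the idea only; it is not the code in `analyze_node_tuning.py`. The function names (`parse_cpu_list`, `cmdline_param`, `thp_mode`, `irqs_overlapping`) and the sample IRQ data are hypothetical, and the CPU-list parser assumes plain `a-b,c` lists, skipping flag words such as `managed_irq` and not handling stride syntax.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-1,4,8-11' into a set of ints.

    Assumption: non-numeric entries (e.g. isolcpus flags) are skipped.
    """
    cpus = set()
    for part in spec.strip().split(","):
        if not part or not part[0].isdigit():
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cmdline_param(cmdline, name):
    """Return the value of 'name=value' from a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            return value
    return None


def thp_mode(enabled_text):
    """Return the bracketed choice from text like 'always [madvise] never'."""
    for word in enabled_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def irqs_overlapping(irq_affinity, isolated):
    """List IRQs whose affinity (CPU list string) intersects the isolated set."""
    return sorted(irq for irq, cpus in irq_affinity.items()
                  if parse_cpu_list(cpus) & isolated)


# Kernel cmdline taken from the sample report; IRQ data is made up.
cmdline = "BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1"
isolated = parse_cpu_list(cmdline_param(cmdline, "isolcpus") or "")
irq_affinity = {24: "0-1", 25: "2-3", 26: "0,8"}  # hypothetical values

overlapping = irqs_overlapping(irq_affinity, isolated)
print(f"Isolated CPUs: {sorted(isolated)}")
print(f"{len(overlapping)} IRQs overlap isolated CPUs: {overlapping}")

mode = thp_mode("always [madvise] never")
if mode != "never":
    print(f"Transparent Hugepages mode is {mode!r}, not 'never'")
```

On a real node the inputs would come from `/proc/cmdline`, `/proc/irq/<n>/smp_affinity_list`, and `/sys/kernel/mm/transparent_hugepage/enabled`, or from the same paths under a sosreport root.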
Store JSON output under `.work/node-tuning/<host>/analysis.json` for historical tracing.