skills/pxe-grub-persistent-server-pattern/SKILL.md
Design pattern for running a persistent PXE/TFTP server that safely coexists with already-installed nodes. Use when: building PXE infrastructure that should stay always-on, designing automated bare-metal provisioning in GitOps/Kubernetes environments, or any PXE setup where UEFI boot order has network boot first. Eliminates boot loops without requiring UEFI firmware changes.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

    npx skillsauth add aldengolab/lorist pxe-grub-persistent-server-pattern

PXE servers typically require manual lifecycle management -- scale up to provision, scale down after. If left running, installed nodes re-enter the installer on reboot because UEFI boot order has network boot first.
Make the GRUB menu default to an exit entry that returns control to UEFI firmware,
which then boots from the next entry (local disk):

    set timeout=5
    set default=0
    menuentry "Boot from local disk" {
        exit
    }
    menuentry "Install Ubuntu" {
        linux /vmlinuz ...
        initrd /initrd
    }

Installed node reboots

    -> UEFI: PXE boot (first in order)
    -> dnsmasq: responds with BOOTx64.EFI
    -> shim -> GRUB -> shows menu
    -> 5s timeout -> "Boot from local disk" (default)
    -> GRUB exit -> UEFI: next boot entry -> local disk
    -> Ubuntu boots normally

For new nodes, an operator selects "Install Ubuntu" during the 5s window.
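
The safety of this pattern depends on the default entry being the one that exits, so a config could be checked before it is published to the TFTP root. The following is a minimal sketch, not part of the original skill: `parse_menuentries` and `default_entry_exits` are hypothetical helpers, and they assume a flat grub.cfg with a numeric `set default=N`, no submenus, and no nested braces inside entries.

```python
import re


def parse_menuentries(cfg_text):
    """Return a list of (title, body_lines) for each menuentry block.

    Assumption: entries are flat (no submenus, no nested braces).
    """
    pattern = re.compile(r'menuentry\s+"([^"]*)"[^{]*\{([^}]*)\}', re.S)
    entries = []
    for match in pattern.finditer(cfg_text):
        body = [line.strip() for line in match.group(2).splitlines() if line.strip()]
        entries.append((match.group(1), body))
    return entries


def default_entry_exits(cfg_text):
    """True if the entry chosen by `set default=N` contains only `exit`."""
    m = re.search(r'^\s*set\s+default=(\d+)\s*$', cfg_text, re.M)
    # Assumption: a missing `set default` is treated as the first entry.
    index = int(m.group(1)) if m else 0
    entries = parse_menuentries(cfg_text)
    if index >= len(entries):
        return False
    _title, body = entries[index]
    return body == ["exit"]


# Sample mirroring the config in this document (kernel arguments elided there too).
SAMPLE = '''
set timeout=5
set default=0
menuentry "Boot from local disk" {
    exit
}
menuentry "Install Ubuntu" {
    linux /vmlinuz ...
    initrd /initrd
}
'''

if __name__ == "__main__":
    print(default_entry_exits(SAMPLE))
```

A check like this could run in CI for the repository that holds the PXE config, so that a reordered menu which would make "Install Ubuntu" the default is caught before any installed node reboots into it.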
`efibootmgr` modifies UEFI NVRAM to change boot order. In practice:

- `curtin in-target` chroot doesn't mount `/sys/firmware/efi/efivars`

The GRUB exit pattern works at the bootloader level, which is firmware-agnostic.
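
Whether efivars is visible from a given environment can be probed directly. This is a minimal sketch under stated assumptions: `efivars_available` is a hypothetical helper name, the `root` parameter exists only so that a chroot directory can be inspected from outside, and the check treats a missing or empty directory as "not mounted".

```python
import os

# Relative path so it can be joined onto an arbitrary root (e.g. a chroot target).
EFIVARS = "sys/firmware/efi/efivars"


def efivars_available(root="/"):
    """Return True if the efivars directory under `root` exists and has entries.

    Tools that edit the UEFI boot order typically reach NVRAM through this
    directory; in a chroot where efivarfs was not mounted it is missing or empty.
    """
    path = os.path.join(root, EFIVARS)
    try:
        return os.path.isdir(path) and len(os.listdir(path)) > 0
    except OSError:
        # Unreadable directory: treat as unavailable.
        return False


if __name__ == "__main__":
    print(efivars_available())
```

Running this against the install target's root would show whether an NVRAM-based approach has any chance of working there. The GRUB exit entry needs no such check.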
With this pattern, the PXE server can run permanently:

- `replicas: 1` in Helm values (not 0)
- No `kubectl scale` needed before/after provisioning

When things go wrong, read references/troubleshooting.md for symptom-based diagnostics:
http module and auto-select based on MAC status