skills/big-job/SKILL.md
Run long-running shell commands (builds, tests, data pipelines, scraping, training runs, etc.) as detached background jobs via systemd transient units, with journal-based output capture and lifecycle management. Use this skill whenever the user asks you to run something that will take more than a few seconds — large test suites, long builds, batch processing, data migrations, ML training, or any command the user explicitly wants running "in the background". Also use it when you need to run multiple heavy commands in parallel without blocking. Trigger on phrases like "run in background", "long running", "don't wait for it", "kick off a build", "run tests and come back", "start a training run", "batch process", or any task where blocking the conversation would be annoying.
npx skillsauth add ematvey/big-job big-jobInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Run heavy or long-running shell commands as detached systemd transient services. Each job gets its own user unit with journal-based output, exit code tracking, and OOM protection. All units use the bigjob- prefix for easy filtering.
Every job unit is named bigjob-YYYYMMDD-HHMMSS-jobname where:
$(date ...))jobname is a short kebab-case label describing the task (e.g. train, build-frontend, run-tests)Example: bigjob-20260305-143021-train
Use JOBNAME as shorthand for the full unit name in commands below.
By default, protect the machine with a hard cgroup memory cap: reserve 24 GiB for the OS, desktop, Docker, databases, and agent processes, then let the job use the rest. If the job exceeds the cap, systemd/kernel OOM-kills the job's unit instead of letting the whole machine thrash.
MEMORY_RESERVE_GB=24
TOTAL_MEMORY_GB=$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)
MEMORY_MAX_GB=$((TOTAL_MEMORY_GB - MEMORY_RESERVE_GB))
[ "$MEMORY_MAX_GB" -lt 4 ] && MEMORY_MAX_GB=4
systemd-run --user \
--unit=JOBNAME \
--slice=bigjobs.slice \
--remain-after-exit \
-p MemoryAccounting=yes \
-p MemoryMax="${MEMORY_MAX_GB}G" \
-p MemorySwapMax=0 \
-p OOMPolicy=kill \
-p OOMScoreAdjust=500 \
-p StandardOutput=journal \
-p StandardError=journal \
--working-directory=/absolute/path/to/project \
-E PYTHONUNBUFFERED=1 \
-- command arg1 arg2
MemoryMax is a hard cgroup limit set to installed RAM minus MEMORY_RESERVE_GB; on a 122 GiB machine the default cap is about 98 GiBMemorySwapMax=0 prevents the job cgroup from using swap if swap exists; with swap disabled globally, this is still harmless and explicitOOMPolicy=kill makes memory failure kill the unit rather than leave children behindOOMScoreAdjust=500 biases kernel OOM selection toward big jobs over interactive/system processes--slice=bigjobs.slice groups heavy jobs together for easier inspection and future policy--remain-after-exit keeps the unit visible after the command finishes so we can check status and read outputjournalctlAfter starting, tell the user the unit name, then immediately start waiting for the job (see below). Never just start a job and stop — always follow up with the wait command in the same turn.
while [ "$(systemctl --user show -p SubState --value JOBNAME)" = "running" ]; do sleep 2; done; \
SUB=$(systemctl --user show -p SubState --value JOBNAME); \
RESULT=$(systemctl --user show -p Result --value JOBNAME); \
CODE=$(systemctl --user show -p ExecMainStatus --value JOBNAME); \
TAIL=$(journalctl --user -u JOBNAME -o cat --no-pager | tail -80); \
printf 'SubState=%s Result=%s ExecMainStatus=%s\n%s\n' "$SUB" "$RESULT" "$CODE" "$TAIL"
This polls until the command finishes, reports the final state/result/exit code, then dumps the last 80 lines of output. This must always be run as a background Bash call immediately after starting a job. The agent will be notified when the job finishes. If the job exceeds its memory cap, expect a nonzero status and often Result=oom-kill or an exit status such as 137.
systemctl --user show -p SubState -p Result -p ExecMainStatus -p MemoryCurrent -p MemoryPeak JOBNAME
Prints named properties:
SubState: running | exited | failedResult: success | exit-code | oom-kill | other systemd resultExecMainStatus: the exit code (0 = success)MemoryCurrent: current cgroup memory usage when availableMemoryPeak: peak cgroup memory usage when available# Last 50 lines
journalctl --user -u JOBNAME -o cat --no-pager | tail -50
# All output
journalctl --user -u JOBNAME -o cat --no-pager
# Follow (live stream)
journalctl --user -u JOBNAME -o cat -f
# Graceful (SIGTERM)
systemctl --user kill JOBNAME
# Force (SIGKILL)
systemctl --user kill -s KILL JOBNAME
# Stop entirely (also removes the transient unit)
systemctl --user stop JOBNAME
systemctl --user list-units 'bigjob-*' --all --no-pager
Shows all bigjob units with their state (running/exited/failed).
# Stop a specific finished unit (removes it since it's transient)
systemctl --user stop JOBNAME
# Stop all finished (exited) bigjob units
systemctl --user list-units 'bigjob-*' --all --no-pager --plain \
| awk '$4=="exited" {print $1}' \
| xargs -r systemctl --user stop
# Reset any failed units
systemctl --user reset-failed 'bigjob-*'
tail for output — output can be huge; don't dump everything unless neededMEMORY_RESERVE_GB when the desktop, databases, Docker, or vLLM services must stay responsive--working-directory with an absolute pathsystemctl --user list-units 'bigjob-*' to find jobs after restartdevelopment
Maintainer-only workflow for handling GitHub Secret Scanning alerts on OpenClaw. Use when Codex needs to triage, redact, clean up, and resolve secret leakage found in issue comments, issue bodies, PR comments, or other GitHub content.
development
Maintainer workflow for OpenClaw releases, prereleases, changelog release notes, and publish validation. Use when Codex needs to prepare or verify stable or beta release steps, align version naming, assemble release notes, check release auth requirements, or validate publish-time commands and artifacts.
development
Run, watch, debug, and extend OpenClaw QA testing with qa-lab and qa-channel. Use when Codex needs to execute the repo-backed QA suite, inspect live QA artifacts, debug failing scenarios, add new QA scenarios, or explain the OpenClaw QA workflow. Prefer the live OpenAI lane with regular openai/gpt-5.4 in fast mode; do not use gpt-5.4-pro or gpt-5.4-mini unless the user explicitly overrides that policy.
development
End-to-end Parallels smoke, upgrade, and rerun workflow for OpenClaw across macOS, Windows, and Linux guests. Use when Codex needs to run, rerun, debug, or interpret VM-based install, onboarding, gateway smoke tests, latest-release-to-main upgrade checks, fresh snapshot retests, or optional Discord roundtrip verification under Parallels.