skills/hpc-deployment/slurm-job-script-generator/SKILL.md
Generate correct, copy-pasteable SLURM sbatch job scripts and sanity-check HPC resource requests — configure nodes, MPI tasks, OpenMP threads, memory (per-node or per-cpu), GPUs, walltime, partitions, modules, and environment variables, with automatic detection of conflicting directives and oversubscription. Use when preparing a SLURM submission script, deciding between pure MPI and hybrid MPI+OpenMP layouts, standardizing #SBATCH directives across a team, debugging why a job won't launch or gets killed, or setting up GPU-accelerated simulation jobs, even if the user only says "I need to run this on the cluster" or "my job keeps getting killed."
npx skillsauth add HeshamFS/materials-simulation-skills slurm-job-script-generatorInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).
| Input | Description | Example |
|-------|-------------|---------|
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128 or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem or --mem-per-cpu (cluster policy dependent) | 32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |
Does the code use OpenMP / threading?
├── NO → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
and export OMP_NUM_THREADS = cpus-per-task
Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
--mem (per node) or --mem-per-cpu (per CPU), not both.--mem units are integer MB by default, or an integer with suffix K/M/G/T (and --mem=0 commonly means “all memory on node”).| Script | Key Outputs |
|--------|-------------|
| scripts/slurm_script_generator.py | results.script, results.directives, results.derived, results.warnings |
slurm_script_generator.py.job.sbatch.sbatch job.sbatch and monitor with squeue.# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--partition compute \
--nodes 1 \
--ntasks-per-node 8 \
--cpus-per-task 2 \
--mem 16G \
--module gcc/12 \
--module openmpi/4.1 \
-- \
./simulate --config config.json
# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--nodes 1 \
--ntasks 16 \
--cpus-per-task 1 \
--out job.sbatch \
--json \
-- \
/bin/echo hello
User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.
Agent workflow:
python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- -- ./simulate
OMP_NUM_THREADS=2)--cores-per-node.| Error | Cause | Resolution |
|-------|-------|------------|
| time must be HH:MM:SS or D-HH:MM:SS | Bad walltime format | Use 00:30:00 or 1-00:00:00 |
| nodes must be positive | Non-positive nodes | Provide --nodes >= 1 |
| Provide either --mem or --mem-per-cpu, not both | Conflicting memory directives | Choose one memory style |
| Provide a run command after -- | Missing launch command | Add -- ./simulate ... |
--time is validated against strict HH:MM:SS or D-HH:MM:SS format via regex--nodes, --ntasks, --ntasks-per-node, --cpus-per-task, --gpus are validated as positive integers with upper bounds--mem and --mem-per-cpu are validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected--job-name is validated against [a-zA-Z0-9_.-]+ (no shell metacharacters)--partition and --account are validated against safe-character allowlists--module values are validated to prevent shell injection (no ;, |, &, backticks, or $)--out writes the generated sbatch script to a single specified file path#SBATCH directives; it contains no dynamically generated codeslurm_script_generator.py with explicit argument lists; the generated script itself is NOT executed by the agent.sbatch file; writes are scoped to the user's working directoryeval(), exec(), or dynamic code generationshell=True)--) is included verbatim in the generated script but is never executed by the skill itselfmodule load directivesset -euo pipefail for safe shell execution on the clusterreferences/slurm_directives.md - Common #SBATCH directives and mapping tipsdevelopment
Plan verification and validation campaigns for simulation codes using manufactured solutions, canonical benchmark problems, grid/time refinement, uncertainty propagation, and pass/fail acceptance criteria. Use when an agent needs to prove a solver, model, or result is trustworthy rather than only plausible.
testing
Map computational materials tasks onto workflow engines such as atomate2, jobflow, AiiDA, pyiron, or a simple one-off script. Use when deciding how to structure a reproducible campaign, DAG, restart strategy, provenance record, storage layout, or migration path from ad hoc scripts to managed workflows.
development
Plan molecular dynamics post-processing for materials simulations, including RDF, MSD and diffusion, VACF/VDOS, coordination numbers, bond-angle distributions, stress-strain curves, equilibration detection, PBC unwrapping, and trajectory format choices. Use before writing MD analysis scripts or trusting trajectory-derived results.
development
Triage cross-code simulation failures and propose safe retry ladders for nonconvergence, NaN/Inf, exploding energies, unstable timesteps, pressure blow-up, missing potentials, bad pseudopotentials, corrupted output, and incomplete runs. Use when an agent sees a failed or suspicious materials simulation and needs a defensible first response.