HeshamFS/hpc-runtime-doctor

Name: hpc-runtime-doctor
Author: HeshamFS

skills/hpc-deployment/hpc-runtime-doctor/SKILL.md

npx skillsauth add HeshamFS/materials-simulation-skills hpc-runtime-doctor

HPC Runtime Doctor

Goal

Turn cluster symptoms into a resource-layout diagnosis, environment checklist, and safe retry plan.

Requirements

Python 3.10+
No external dependencies
Works on Linux, macOS, and Windows

Inputs to Gather

| Input | Description | Example |
|-------|-------------|---------|
| Scheduler | SLURM, PBS, LSF, local | slurm |
| Nodes/tasks/threads | Runtime layout | 2 nodes, 128 tasks, 2 threads |
| GPUs | GPUs requested | 4 |
| Symptoms | Observed failure | oom,killed,slow-gpu |
| MPI/OpenMP/GPU use | Parallel modes | mpi+openmp+gpu |
| Walltime | Requested time | 12:00:00 |
| Scratch | Whether scratch is used | true |
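The inputs above can be gathered into one record before building a command line. The sketch below is illustrative only: `RuntimeInputs` and `to_args` are hypothetical names, and it maps fields only to the flags that appear in the Workflow command. Walltime and scratch are left out because this page does not show their flags.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeInputs:
    """Hypothetical container for the inputs listed in the table."""
    scheduler: str
    nodes: int
    tasks: int
    cpus_per_task: int  # the "threads" entry in the table
    gpus: int = 0
    symptoms: list[str] = field(default_factory=list)
    uses_mpi: bool = False
    uses_openmp: bool = False
    uses_gpu: bool = False

    def to_args(self) -> list[str]:
        """Build the argument list using only flags shown in the Workflow command."""
        args = [
            "--scheduler", self.scheduler,
            "--nodes", str(self.nodes),
            "--tasks", str(self.tasks),
            "--cpus-per-task", str(self.cpus_per_task),
            "--gpus", str(self.gpus),
        ]
        if self.symptoms:
            # Symptoms are passed as one comma-separated value.
            args += ["--symptoms", ",".join(self.symptoms)]
        for enabled, flag in (
            (self.uses_mpi, "--uses-mpi"),
            (self.uses_openmp, "--uses-openmp"),
            (self.uses_gpu, "--uses-gpu"),
        ):
            if enabled:
                args.append(flag)
        args.append("--json")
        return args
```

Keeping the values in a typed record makes it easier to spot a missing input before anything is run.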

Decision Guidance

Check resource layout before changing physics settings.
Confirm module/compiler/MPI/CUDA consistency before debugging solver behavior.
Treat missing restart files and scratch cleanup as workflow failures, not physics failures.
For GPU jobs, confirm the executable was built with the requested accelerator backend.
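As an illustration of checking resource layout first, the sketch below derives per-node quantities from a submitted layout and flags common mismatches. It is not the bundled script's logic. The function names and the `cores_per_node` parameter are hypothetical, and it assumes the GPU count is the total for the job, which this page does not specify.

```python
def summarize_layout(nodes: int, tasks: int, cpus_per_task: int, gpus: int) -> dict:
    """Derive per-node figures from a layout (assumes `gpus` is the job total)."""
    total_cpus = tasks * cpus_per_task
    return {
        "total_cpus": total_cpus,
        "tasks_per_node": tasks / nodes,
        "cpus_per_node": total_cpus / nodes,
        "tasks_per_gpu": tasks / gpus if gpus else None,
    }


def layout_warnings(summary: dict, cores_per_node: int) -> list[str]:
    """Return human-readable warnings; `cores_per_node` is supplied by the user."""
    warnings = []
    if summary["tasks_per_node"] != int(summary["tasks_per_node"]):
        warnings.append("tasks do not divide evenly across nodes")
    if summary["cpus_per_node"] > cores_per_node:
        warnings.append("requested CPUs per node exceed physical cores")
    if summary["tasks_per_gpu"] is not None and summary["tasks_per_gpu"] > 1:
        warnings.append("multiple tasks share each GPU")
    return warnings


# Layout from the Inputs table: 2 nodes, 128 tasks, 2 threads, 4 GPUs.
summary = summarize_layout(nodes=2, tasks=128, cpus_per_task=2, gpus=4)
print(summary)
print(layout_warnings(summary, cores_per_node=64))
```

With a hypothetical 64-core node, this layout asks for 128 CPUs per node and 32 tasks per GPU. Both are worth ruling out before touching physics settings.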

Script Outputs

scripts/hpc_runtime_doctor.py emits:

resource_layout
diagnoses
environment_checks
retry_plan
scheduler_notes
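A consumer of the report may want to confirm that all five sections are present before acting on it. The sketch below assumes the sections are top-level keys of one parsed JSON object. The value types in the sample are placeholders, not the script's actual output, and `missing_sections` is a hypothetical helper.

```python
EXPECTED_SECTIONS = (
    "resource_layout",
    "diagnoses",
    "environment_checks",
    "retry_plan",
    "scheduler_notes",
)


def missing_sections(report: dict) -> list[str]:
    """Return the expected section names that are absent from a parsed report."""
    return [name for name in EXPECTED_SECTIONS if name not in report]


# Placeholder report: the value shapes here are assumptions for illustration.
sample = {
    "resource_layout": {},
    "diagnoses": [],
    "environment_checks": [],
    "retry_plan": [],
}
print(missing_sections(sample))
```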

Workflow

python3 skills/hpc-deployment/hpc-runtime-doctor/scripts/hpc_runtime_doctor.py \
  --scheduler slurm \
  --nodes 2 \
  --tasks 128 \
  --cpus-per-task 2 \
  --gpus 4 \
  --symptoms oom,slow-gpu \
  --uses-mpi \
  --uses-openmp \
  --uses-gpu \
  --json
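The same command can be driven from Python when the result needs further processing. This is a minimal sketch that assumes `--json` makes the script print a single JSON document to standard output. `run_doctor` is a hypothetical wrapper, not part of the skill.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script path as given in the Workflow command, relative to the repository root.
SCRIPT = Path("skills/hpc-deployment/hpc-runtime-doctor/scripts/hpc_runtime_doctor.py")


def run_doctor(script: Path = SCRIPT) -> dict:
    """Run the diagnosis script with the Workflow arguments and parse its output."""
    cmd = [
        sys.executable, str(script),
        "--scheduler", "slurm",
        "--nodes", "2",
        "--tasks", "128",
        "--cpus-per-task", "2",
        "--gpus", "4",
        "--symptoms", "oom,slow-gpu",
        "--uses-mpi",
        "--uses-openmp",
        "--uses-gpu",
        "--json",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # Surface the exit code and stderr instead of parsing partial output.
        raise RuntimeError(
            f"hpc_runtime_doctor exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    # Assumption: stdout holds exactly one JSON document.
    return json.loads(proc.stdout)
```

Calling `run_doctor()` from the repository root returns the parsed report. A non-zero exit is raised as an error so it is not mistaken for an empty report.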

Error Handling

The script exits with code 2 when a resource count is invalid. Unknown symptoms are preserved as custom items for human review.
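The behavior described above could be sketched as follows. This is an illustration under stated assumptions, not the script's code. It assumes nodes, tasks, and CPUs per task must be at least 1 and GPUs at least 0, and it treats only the three symptoms from the Inputs table example as known. All names are hypothetical.

```python
# Assumption: only the symptoms shown in the Inputs table are treated as known.
KNOWN_SYMPTOMS = {"oom", "killed", "slow-gpu"}


def validate_counts(nodes: int, tasks: int, cpus_per_task: int, gpus: int) -> list[str]:
    """Return one message per invalid resource count (assumed validity rules)."""
    errors = []
    for name, value in (("nodes", nodes), ("tasks", tasks), ("cpus_per_task", cpus_per_task)):
        if value < 1:
            errors.append(f"{name} must be at least 1, got {value}")
    if gpus < 0:
        errors.append(f"gpus must not be negative, got {gpus}")
    return errors


def split_symptoms(raw: str) -> tuple[list[str], list[str]]:
    """Split a comma-separated list into known symptoms and custom items."""
    known, custom = [], []
    for item in raw.split(","):
        item = item.strip().lower()
        if not item:
            continue
        (known if item in KNOWN_SYMPTOMS else custom).append(item)
    return known, custom


def exit_code_for(nodes: int, tasks: int, cpus_per_task: int, gpus: int) -> int:
    """Map validation results to an exit code: 2 on any invalid count, else 0."""
    return 2 if validate_counts(nodes, tasks, cpus_per_task, gpus) else 0
```

Unknown symptoms are kept in their own list, not dropped, so a human reviewer still sees them.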

Limitations

This skill does not query a live scheduler. It diagnoses from the submitted layout and symptoms.

Security

Inputs are scalar CLI values and booleans only.
The script does not execute scheduler commands or inspect environment variables.
The skill uses Bash only to run its bundled script.

References

See references/hpc_runtime_patterns.md for scheduler and runtime diagnosis patterns.

Version History

1.0.0: Initial HPC runtime diagnosis skill.

Diagnose HPC runtime and scheduler problems for materials simulations, including MPI/OpenMP/GPU layout, modules, CUDA/Kokkos hints, scratch paths, walltime, job arrays, restart strategy, scheduler portability, and resource mismatch. Use when jobs fail, run slowly, get killed, or behave differently on a cluster than on a workstation.

39 stars

documentation

Updated May 19, 2026

npx skillsauth add HeshamFS/materials-simulation-skills hpc-runtime-doctor

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 19, 2026, 5:29 AM (55.9s, 5 files scanned)

SKILL.md

name: hpc-runtime-doctor
description: >
allowed-tools: Read, Bash, Write, Grep, Glob
author: HeshamFS
version: 1.0.0
security_tier: high
security_reviewed: true
eval_cases: 3
last_reviewed: 2026-05-18

Related Skills

HeshamFS/benchmark-and-mms-planner

development · Verified · Trusted · Community

Plan verification and validation campaigns for simulation codes using manufactured solutions, canonical benchmark problems, grid/time refinement, uncertainty propagation, and pass/fail acceptance criteria. Use when an agent needs to prove a solver, model, or result is trustworthy rather than only plausible.

39 stars · SKILL.md · Updated May 19, 2026

HeshamFS/workflow-engine-mapper

testing · Verified · Trusted · Community

Map computational materials tasks onto workflow engines such as atomate2, jobflow, AiiDA, pyiron, or a simple one-off script. Use when deciding how to structure a reproducible campaign, DAG, restart strategy, provenance record, storage layout, or migration path from ad hoc scripts to managed workflows.

39 stars · SKILL.md · Updated May 19, 2026

HeshamFS/md-analysis-planner

development · Verified · Trusted · Community

Plan molecular dynamics post-processing for materials simulations, including RDF, MSD and diffusion, VACF/VDOS, coordination numbers, bond-angle distributions, stress-strain curves, equilibration detection, PBC unwrapping, and trajectory format choices. Use before writing MD analysis scripts or trusting trajectory-derived results.

39 stars · SKILL.md · Updated May 19, 2026

HeshamFS/simulation-failure-triage

development · Verified · Trusted · Community

Triage cross-code simulation failures and propose safe retry ladders for nonconvergence, NaN/Inf, exploding energies, unstable timesteps, pressure blow-up, missing potentials, bad pseudopotentials, corrupted output, and incomplete runs. Use when an agent sees a failed or suspicious materials simulation and needs a defensible first response.

39 stars · SKILL.md · Updated May 19, 2026

Install manually

# Clone the repo
git clone https://github.com/HeshamFS/materials-simulation-skills.git

# Create the global Claude Code skills folder if needed, then copy the skill into it
mkdir -p ~/.claude/skills
cp -r materials-simulation-skills/skills/hpc-deployment/hpc-runtime-doctor ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository

HeshamFS/materials-simulation-skills

39 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT