
HeshamFS/benchmark-and-mms-planner

Name: benchmark-and-mms-planner
Author: HeshamFS

skills/verification-validation/benchmark-and-mms-planner/SKILL.md

npx skillsauth add HeshamFS/materials-simulation-skills benchmark-and-mms-planner


Benchmark And MMS Planner

Goal

Design a verification and validation plan before trusting simulation results. The skill helps agents choose manufactured solutions, benchmark cases, refinement protocols, uncertainty checks, and pass/fail criteria.

Requirements

Python 3.10+
No external dependencies
Works on Linux, macOS, and Windows

Inputs to Gather

| Input | Description | Example |
|-------|-------------|---------|
| PDE or model class | Governing family | diffusion, elasticity, phase-field |
| Quantity of interest | Metric to validate | interface velocity, L2 temperature error |
| Dimension | 1, 2, or 3 | 2 |
| Expected order | Formal discretization order | 2 |
| Reference availability | Analytic, benchmark, or none | analytic |
| Risk level | Cost or consequence of wrong result | high |
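One way to gather these inputs before calling the planner is a small validated record. The sketch below is illustrative only: the class name `PlannerInputs`, its field names, and the checks are assumptions made for this example and are not taken from the bundled script.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerInputs:
    """Hypothetical container for the inputs in the table above."""

    model: str             # governing family, e.g. "diffusion"
    quantity: str          # metric to validate, e.g. "L2 temperature error"
    dimension: int         # 1, 2, or 3
    expected_order: float  # formal discretization order
    reference: str         # "analytic", "benchmark", or "none"
    risk: str              # cost or consequence of a wrong result, e.g. "high"

    def __post_init__(self):
        # Reject empty descriptions early so the plan is not built on blanks.
        if not self.model.strip() or not self.quantity.strip():
            raise ValueError("model and quantity must be non-empty strings")
        if self.dimension not in (1, 2, 3):
            raise ValueError(f"dimension must be 1, 2, or 3, got {self.dimension!r}")
        # Assumed rule: the formal order must be finite and positive.
        if not math.isfinite(self.expected_order) or self.expected_order <= 0:
            raise ValueError("expected_order must be a finite positive number")
        if self.reference not in ("analytic", "benchmark", "none"):
            raise ValueError("reference must be 'analytic', 'benchmark', or 'none'")


inputs = PlannerInputs("diffusion", "L2 temperature error", 2, 2, "analytic", "high")
```

Failing fast on an invalid dimension or order mirrors the guidance to stop and correct the model description rather than continue with a bad plan.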

Decision Guidance

Use MMS when code correctness is uncertain and an analytic solution can be injected.
Use canonical benchmarks when physical model validation matters more than code verification.
Use grid/time refinement whenever the result is used for a claim, design decision, or comparison.
Use uncertainty propagation when inputs are calibrated, noisy, or experimentally measured.
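The four rules above can be read as independent conditions that each add one strategy to the plan. A minimal sketch, assuming hypothetical boolean flags and strategy labels that are not part of the skill itself:

```python
def choose_strategies(
    code_correctness_uncertain: bool,
    analytic_solution_injectable: bool,
    model_validation_priority: bool,
    used_for_claim: bool,
    inputs_uncertain: bool,
) -> list[str]:
    """Map the decision guidance to a list of strategy labels (labels are illustrative)."""
    strategies = []
    # MMS needs both doubt about the code and a solution that can be injected.
    if code_correctness_uncertain and analytic_solution_injectable:
        strategies.append("mms")
    # Benchmarks target physical model validation rather than code verification.
    if model_validation_priority:
        strategies.append("canonical_benchmarks")
    # Any claim, design decision, or comparison calls for grid/time refinement.
    if used_for_claim:
        strategies.append("refinement")
    # Calibrated, noisy, or measured inputs call for uncertainty propagation.
    if inputs_uncertain:
        strategies.append("uncertainty_propagation")
    return strategies
```

The rules are not mutually exclusive, so a high-risk study can end up with all four strategies at once.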

Script Outputs

scripts/benchmark_mms_planner.py emits inputs and results with:

verification_strategy
mms_plan
benchmark_cases
refinement_protocol
acceptance_criteria
warnings
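Before acting on a plan, it can help to confirm that every listed field is present. The sketch below assumes the JSON output is an object with top-level `inputs` and `results` keys and that the six fields sit under `results`; that layout is a guess from the description above, and the sample payload is made up for illustration.

```python
import json

EXPECTED_FIELDS = (
    "verification_strategy",
    "mms_plan",
    "benchmark_cases",
    "refinement_protocol",
    "acceptance_criteria",
    "warnings",
)


def missing_fields(payload_text: str) -> list[str]:
    """Return the expected result fields absent from the planner output."""
    payload = json.loads(payload_text)
    # Assumed layout: the six fields live under a top-level "results" object.
    results = payload.get("results", {})
    return [name for name in EXPECTED_FIELDS if name not in results]


# Made-up sample payload, not real script output.
sample = json.dumps({"inputs": {}, "results": {"mms_plan": {}, "warnings": []}})
print(missing_fields(sample))
```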

Workflow

Collect the governing model, quantity of interest, and risk level.
Run benchmark_mms_planner.py --json.
Treat warnings as blockers for high-risk claims.
Convert the returned protocol into tests, simulation runs, or review checklist items.

python3 skills/verification-validation/benchmark-and-mms-planner/scripts/benchmark_mms_planner.py \
  --model diffusion \
  --quantity "L2 error in temperature" \
  --dimension 2 \
  --expected-order 2 \
  --reference analytic \
  --risk high \
  --json
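The same invocation can be driven from Python so that warnings are checked programmatically. This is a sketch under assumptions: the flags are the ones shown in the command above, but the helper names are hypothetical and the location of `warnings` under a `results` key is a guess about the JSON layout.

```python
import json
import subprocess
import sys

SCRIPT = (
    "skills/verification-validation/benchmark-and-mms-planner/"
    "scripts/benchmark_mms_planner.py"
)


def build_command(model, quantity, dimension, expected_order, reference, risk):
    """Assemble the argument list using the flags from the example command."""
    return [
        sys.executable, SCRIPT,
        "--model", model,
        "--quantity", quantity,
        "--dimension", str(dimension),
        "--expected-order", str(expected_order),
        "--reference", reference,
        "--risk", risk,
        "--json",
    ]


def run_planner(**kwargs):
    """Run the script and parse stdout as JSON; raises if the script exits non-zero."""
    proc = subprocess.run(
        build_command(**kwargs), capture_output=True, text=True, check=True
    )
    return json.loads(proc.stdout)


def blocking_warnings(plan, risk):
    """Treat every warning as a blocker when the risk level is high."""
    # Assumed layout: warnings are stored under plan["results"]["warnings"].
    warnings = plan.get("results", {}).get("warnings", [])
    return list(warnings) if risk == "high" else []
```

Passing the arguments as a list rather than a shell string avoids quoting problems with values such as "L2 error in temperature".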

Error Handling

If the dimension or expected order is invalid, stop and correct the model description.
If no reference exists, use conservation and convergence checks but do not call the result validated.
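A convergence check compares the observed order of accuracy against the expected order. The sketch below is an illustration chosen for this page, not part of the skill: it solves the 1D problem -u'' = f with the manufactured solution u(x) = sin(pi x) using second-order central differences, then estimates the order from errors on successively halved grids. The 0.1 tolerance is an assumed acceptance criterion.

```python
import math


def mms_error(n: int) -> float:
    """Max-norm error for -u'' = f on (0, 1), u(0) = u(1) = 0, with n intervals.

    Manufactured solution u = sin(pi x), so f = pi^2 sin(pi x).
    """
    h = 1.0 / n
    x = [i * h for i in range(1, n)]
    rhs = [h * h * math.pi**2 * math.sin(math.pi * xi) for xi in x]
    m = n - 1
    # Thomas algorithm for the tridiagonal system with stencil (-1, 2, -1).
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))


def observed_orders(errors: list[float], ratio: float = 2.0) -> list[float]:
    """Observed order between consecutive grids refined by a constant ratio."""
    return [
        math.log(coarse / fine) / math.log(ratio)
        for coarse, fine in zip(errors, errors[1:])
    ]


errors = [mms_error(n) for n in (16, 32, 64)]
orders = observed_orders(errors)
expected_order = 2.0
# Assumed criterion: every observed order within 0.1 of the expected order.
passed = all(abs(p - expected_order) < 0.1 for p in orders)
```

A passing refinement study of this kind supports code verification only; without a reference for the physical problem, the result should still not be called validated.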

Limitations

This skill plans verification work; it does not run the solver or prove that a physical model is appropriate for an experiment.

Security

Inputs are scalar strings and finite numeric values only.
The script does not execute external solvers.
File writes are not performed.
The skill uses Bash only to run its bundled script.

References

See references/vv_patterns.md for MMS, benchmark, and uncertainty planning notes.

Version History

1.0.0: Initial benchmark and MMS planning skill.


Plan verification and validation campaigns for simulation codes using manufactured solutions, canonical benchmark problems, grid/time refinement, uncertainty propagation, and pass/fail acceptance criteria. Use when an agent needs to prove a solver, model, or result is trustworthy rather than only plausible.

39 stars

development

Updated May 19, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add HeshamFS/materials-simulation-skills benchmark-and-mms-planner

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage (as shown) |
|---------|-------------|--------|-----------------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: May 19, 2026, 5:33 AM | 111.2s | 5 files scanned

SKILL.md

name: benchmark-and-mms-planner
description: >
allowed-tools: Read, Bash, Write, Grep, Glob
author: HeshamFS
version: 1.0.0
security_tier: high
security_reviewed: true
eval_cases: 3
last_reviewed: 2026-05-18


Related Skills

HeshamFS/workflow-engine-mapper

testing

Verified | Trusted | Community

Map computational materials tasks onto workflow engines such as atomate2, jobflow, AiiDA, pyiron, or a simple one-off script. Use when deciding how to structure a reproducible campaign, DAG, restart strategy, provenance record, storage layout, or migration path from ad hoc scripts to managed workflows.

39 stars | SKILL.md | Updated May 19, 2026

HeshamFS/md-analysis-planner

development

Verified | Trusted | Community

Plan molecular dynamics post-processing for materials simulations, including RDF, MSD and diffusion, VACF/VDOS, coordination numbers, bond-angle distributions, stress-strain curves, equilibration detection, PBC unwrapping, and trajectory format choices. Use before writing MD analysis scripts or trusting trajectory-derived results.

39 stars | SKILL.md | Updated May 19, 2026

HeshamFS/simulation-failure-triage

development

Verified | Trusted | Community

Triage cross-code simulation failures and propose safe retry ladders for nonconvergence, NaN/Inf, exploding energies, unstable timesteps, pressure blow-up, missing potentials, bad pseudopotentials, corrupted output, and incomplete runs. Use when an agent sees a failed or suspicious materials simulation and needs a defensible first response.

39 stars | SKILL.md | Updated May 19, 2026

HeshamFS/hpc-runtime-doctor

documentation

Verified | Trusted | Community

Diagnose HPC runtime and scheduler problems for materials simulations, including MPI/OpenMP/GPU layout, modules, CUDA/Kokkos hints, scratch paths, walltime, job arrays, restart strategy, scheduler portability, and resource mismatch. Use when jobs fail, run slowly, get killed, or behave differently on a cluster than on a workstation.

39 stars | SKILL.md | Updated May 19, 2026


Install manually


# Clone the repo
git clone https://github.com/HeshamFS/materials-simulation-skills.git

# Copy into Claude Code skills folder (global)
cp -r materials-simulation-skills/skills/verification-validation/benchmark-and-mms-planner ~/.claude/skills/


Repository

HeshamFS/materials-simulation-skills

39 stars

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT