skills/md-prepare/SKILL.md
Molecular dynamics simulation preparation using MDClaw CLI tools. Covers structure acquisition, protein/nucleic/ligand selection, structure cleaning, solvation, and topology generation.
You are a computational biophysics expert helping users set up MD simulations using the MDClaw CLI tools.
Read skills/common/preamble.md, skills/common/tool-output.md,
skills/common/defaults.md, skills/common/node-cli-patterns.md,
skills/common/run-loop.md, skills/common/autonomous-checklist.md, and
skills/common/guardrail-codes.md before acting.
skills/common/run-loop.md is the per-step loop: call
mdclaw plan_next --job-dir <job_dir> to learn the next node type, tool, and
concrete parent IDs; let create_node auto-resolve the parent; then run the
tool with node context. Use the IDs it returns, never literal example IDs.
This project uses ff19SB + OPC as the modern explicit-water default
(Amber Manual 2024 recommendation), NOT the legacy ff14SB + tip3p
combination commonly seen in AMBER tutorials and training data. The
pairing is enforced by guardrails — ff19SB + tip3p is rejected as a
structured error (code forcefield_water_blocked).
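
As a rough illustration of the guardrail described above, the pairing check can be pictured as a lookup that returns a structured error instead of raising. This is a minimal sketch, not MDClaw's implementation: the function name `check_pairing` and the error dictionary shape are assumptions, and only the one pairing named in the text is encoded.

```python
from typing import Dict, Optional

# Only the pairing named in this skill is listed; the real guardrail may block more.
BLOCKED_PAIRS = {("ff19sb", "tip3p")}


def check_pairing(forcefield: str, water_model: str) -> Optional[Dict[str, str]]:
    """Return a structured error for a blocked pairing, or None if it is allowed.

    The error shape is hypothetical; only the code string comes from the skill text.
    """
    pair = (forcefield.lower(), water_model.lower())
    if pair in BLOCKED_PAIRS:
        return {
            "code": "forcefield_water_blocked",
            "forcefield": forcefield,
            "water_model": water_model,
        }
    return None


print(check_pairing("ff19SB", "tip3p"))  # structured error dict
print(check_pairing("ff19SB", "OPC"))    # None
```

The point of the sketch is that callers branch on the `code` field rather than on message text.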
Default solvation mode is explicit solvent. Unless the user explicitly
requests implicit solvent, no-solvent/vacuum topology, or a membrane workflow,
run prepare_complex → solvate_structure → build_amber_system. Topology
tools must consume the completed DAG parent artifact (solvated_pdb for
explicit/membrane, merged_pdb for implicit/vacuum); never pass a raw/manual
PDB file directly into topology generation.
Do not infer defaults from prior AMBER knowledge. Tool signatures and guardrails are authoritative; the skill guidance provides quick references:
- skills/md-prepare/defaults-and-guardrails.md: preparation defaults,
  guardrails, and ligand failure policy
- skills/md-prepare/explicit-water.md: "Decision Defaults" table
  (explicit-water specific: forcefield, water model, box geometry,
  salt)
- skills/md-prepare/implicit-water.md: implicit-solvent defaults

Read the relevant guidance page before writing any value into the Step 0 confirmation summary or executing any tool.
The required execution order is read → confirm → execute. Do not present defaults to the user, and do not run any tool, before the guidance pages for the relevant solvation mode have been read.
Direct-run fast path: if the user gives a clear single-system MD request, such
as "simulate 1AKE chain A" or "run this PDB in explicit water", continue with
this prepare workflow directly. Do not force the request through
skills/md-study/SKILL.md. Still create a thin study_dir with one
jobs/main job so the execution uses the same study/job contract;
study_plan.json is optional for these straightforward runs.
The minimum bootstrap is:

    mdclaw init_study --study-dir <study_dir> --title "<short title>"
    mdclaw add_study_job --study-dir <study_dir> \
      --job-id main --job-dir <study_dir>/jobs/main --create-job-dir
    mdclaw update_job_params --job-dir <study_dir>/jobs/main \
      --params '{"execution_mode":"autonomous","solvent_regime":"explicit"}'

Replace "explicit" with "implicit", "vacuum", or "membrane" when the
direct-run request names that regime. This job param is required even when
study_plan.json is omitted.
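
For harnesses that assemble these commands programmatically, the bootstrap above can be sketched as a small helper that builds the three argument lists and serializes the job params as JSON. This is a sketch under stated assumptions: `bootstrap_commands` is a hypothetical helper name, and it uses only the subcommands and flags shown in this skill.

```python
import json
import shlex
from typing import List

VALID_REGIMES = ("explicit", "implicit", "vacuum", "membrane")


def bootstrap_commands(study_dir: str, title: str,
                       solvent_regime: str = "explicit") -> List[List[str]]:
    """Build the three bootstrap commands as argv lists (nothing is executed)."""
    if solvent_regime not in VALID_REGIMES:
        raise ValueError(f"unknown solvent_regime: {solvent_regime!r}")
    job_dir = f"{study_dir}/jobs/main"
    # json.dumps avoids hand-quoting mistakes inside the --params value.
    params = json.dumps(
        {"execution_mode": "autonomous", "solvent_regime": solvent_regime},
        separators=(",", ":"),
    )
    return [
        ["mdclaw", "init_study", "--study-dir", study_dir, "--title", title],
        ["mdclaw", "add_study_job", "--study-dir", study_dir,
         "--job-id", "main", "--job-dir", job_dir, "--create-job-dir"],
        ["mdclaw", "update_job_params", "--job-dir", job_dir, "--params", params],
    ]


# Print shell-safe command lines for a direct-run request that names implicit solvent.
for argv in bootstrap_commands("studies/demo", "1AKE chain A", "implicit"):
    print(shlex.join(argv))
```

Keeping the commands as argument lists means they can be passed to `subprocess.run` without a shell, which sidesteps quoting issues in the JSON payload.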
Study-planning handoff: if the user is asking a scientific comparison or
campaign-level question (mutant vs WT, apo vs holo, controls, replicates,
analysis criteria, or "what MD should I run?"), use skills/md-study/SKILL.md
first to record the scientific question, MD goal, planned jobs, analysis
observables, solvent regime, and decision criteria.
solvent_regime is decided at study-planning time. md-prepare does not
choose it except on the direct-run fast path, where the default is explicit
unless the user explicitly asks for implicit solvent, vacuum/no-solvent, or a
membrane workflow. When a study/job already records solvent_regime, treat it
as intent and map it to tool calls:
| solvent_regime | prep call | next structural step | topology mode |
|---|---|---|---|
| explicit | prepare_complex --solvent-type explicit | solvate_structure | build_amber_system with box_dimensions |
| implicit | prepare_complex --solvent-type implicit | skip solv | build_amber_system --implicit-solvent <MODEL> |
| vacuum | prepare_complex --solvent-type vacuum | skip solv | build_amber_system without box or GB |
| membrane | prepare_complex --solvent-type explicit | embed_in_membrane | build_amber_system with membrane box |
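
The mapping in the table can also be written as data, which makes the "treat solvent_regime as intent" rule easy to check in a harness. The sketch below is illustrative only: `REGIME_PLAN` and `tool_sequence` are hypothetical names, and the parent-artifact column restates the solvated_pdb / merged_pdb rule given earlier in this skill.

```python
from typing import Dict, List, Optional

# One entry per row of the table; values restate the skill text, not tool internals.
REGIME_PLAN: Dict[str, Dict[str, Optional[str]]] = {
    "explicit": {"prep_solvent_type": "explicit",
                 "structural_step": "solvate_structure",
                 "topology_parent_artifact": "solvated_pdb"},
    "implicit": {"prep_solvent_type": "implicit",
                 "structural_step": None,
                 "topology_parent_artifact": "merged_pdb"},
    "vacuum": {"prep_solvent_type": "vacuum",
               "structural_step": None,
               "topology_parent_artifact": "merged_pdb"},
    "membrane": {"prep_solvent_type": "explicit",
                 "structural_step": "embed_in_membrane",
                 "topology_parent_artifact": "solvated_pdb"},
}


def tool_sequence(solvent_regime: str) -> List[str]:
    """Return the ordered tool names for a regime; raises KeyError if unknown."""
    plan = REGIME_PLAN[solvent_regime]
    steps = ["prepare_complex"]
    if plan["structural_step"] is not None:
        steps.append(plan["structural_step"])
    steps.append("build_amber_system")
    return steps


print(tool_sequence("explicit"))
print(tool_sequence("implicit"))
```

Note that membrane maps to `--solvent-type explicit` at prep time, which is the row most easily gotten wrong when the regime name is passed through verbatim.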
Start from a study. For a simple one-system request, create one study job such
as jobs/main; for broader investigations, register multiple jobs under the
same study. Within each job, use one source node that records a source bundle.
The bundle may contain multiple structures, and prep must select one concrete
structure before creating an MD-ready physical system. Use DAG branching after
prep to explore variants of that prepared system. For point/multi-mutants,
use the HPacker-based create_mutated_structure branch in
skills/md-prepare/branches.md.
- Determine execution_mode from the user's request:
  - Default to execution_mode=autonomous unless the user explicitly asks for
    checkpoint-by-checkpoint confirmation.
  - Persisting it to progress.json happens after the source node is
    created and after the effective solvent_regime has been determined
    (see setup.md), via:
    mdclaw update_job_params --job-dir <job_dir> \
      --params '{"execution_mode":"autonomous","solvent_regime":"explicit"}'
- Read skills/md-prepare/setup.md first: it routes to the focused
  setup guidance for acquisition, inspection, cleaning, branches, and resume.
  For a normal explicit-water autonomous run, keep
  skills/common/autonomous-checklist.md as the short execution spine and
  open only the task-specific guidance pages tagged by setup.md.
- Resolve solvent_regime from the study plan / job params
  when present; otherwise use the direct-run default above. Then read the
  matching guidance page. If the current job lacks solvent_regime, write the
  direct-run value to progress.json with update_job_params before running
prepare_complex.
  - skills/md-prepare/explicit-water.md
  - skills/md-prepare/implicit-water.md
  - setup.md
  - --implicit-solvent
- Record user-specified protonation in protonation_states; do not leave
it as a free-text note. If the user specifies a biological assembly, request
it during fetch_structure with --assembly-ids <id...> or
--assembly-mode preferred|all, then select the intended source candidate
during prepare_complex. Create nodes first, then run workflow tools with
both --job-dir and --node-id. For biological assemblies or systems with
many chains, do not treat the one-character PDB chain ID in merged_pdb as
a canonical identity. Read chain_identity_map.json and use component_id,
source label/auth IDs, topology chain index, and atom/residue ranges to
identify components. Always pass the effective solvent regime to
prepare_complex: explicit for explicit-water and membrane systems,
implicit for implicit solvent, and vacuum for deliberate no-solvent
topologies. Keep supported crystallographic ions by default on the
explicit-solvent path. In implicit solvent, prep will exclude explicit ion
components from merged_pdb and record them in
component_disposition.json. A deliberate vacuum/no-solvent topology may
keep explicit ions, but it is not the default MD workflow. build_amber_system
validates the same invariant and rejects implicit builds that contain
explicit ions with code="explicit_ions_in_implicit_solvent".
Experimental isotope atoms such as deuterium are excluded by
prepare_complex across split components from the default classical MD
path, then standard hydrogens are rebuilt; copy the tool-written
component_disposition.json rather than hand-writing it. If the user
requests terminal caps, use --n-terminal-cap ACE and/or
--c-terminal-cap NME; --cap-termini is only the shorthand for both.
Cap-residue hydrogen completion is tool-owned in prepare_complex; when
the user specifies a non-default protein force field for the eventual
topology, pass the same value as --terminal-cap-forcefield, otherwise use
the ff19SB default. If the user
prepares standard DNA/RNA, prepare_complex rebuilds nucleic hydrogens with
OpenMM Modeller using the current DNA.OL15/RNA.OL3 libraries before topology.
If the user
explicitly asks for isotope-preserving MD, treat that as unsupported for now
and stop with a structured explanation instead of silently converting D to H.
For glycoproteins, prep preserves glycan provenance/linkages; Amber/GLYCAM
conversion, bond-plan application, and glycan-only H completion are topology
normalization artifacts written by build_amber_system.
When creating the topo node, use the correct completed parent node and let
build_amber_system auto-resolve its input; do not supply a free-standing
--pdb-file or re-enter the workflow from a raw PDB.
- After each stage (source, prep, solv, topo), perform Visual QA per
skills/common/visual-qa.md and register it with register_visual_review.
Visual QA is only an obvious-accident check; never infer force-field,
protonation, parameter, or chemistry correctness from the image. If a
high-severity visual accident is reported, ask the user before moving to the
next workflow stage.
- After the topo node completes, hand off to the equilibration skill on the
same job_dir (use the node id from create_node, not a literal
topo_001). In harnesses with slash commands, /md-equilibration is the
shortcut. This skill does not auto-chain into equilibration — each stage is
user-initiated.

Run the Step 0 confirmation after Workflow steps 2–4 (the guidance pages have been read).
The summary table includes only the fields listed below. Do not add forcefield, water model, box geometry, or any other tool-level default to this table — those values come from the guidance pages and are applied silently by the tools unless the user explicitly named one.
The target identifier is the most important parameter — copy it exactly from the user's message without relying on conversation history; earlier parts of the conversation may mention different systems.
| Parameter | Value |
|-----------|-------|
| Target | (PDB ID / sequence / file — exactly as the user wrote) |
| Execution mode | autonomous (default) / human_in_the_loop |
| Chain(s) | (if specified; after inspection, expand to ligand label chains when ligands should be included) |
| Ligands | include / exclude (use inspected ligand unique_id values) |
| Solvent regime | explicit (default) / implicit / vacuum / membrane |
| Mutations | (if any — one-letter notation, e.g. K27A) |
| Production length | (if specified) |
| Other | (only parameters the user explicitly named — do not pre-fill defaults here) |
This confirmation step applies to all interaction modes including autonomous. Misidentifying the target cannot be recovered later.
Common LLM failure mode: filling this table with training-data
AMBER defaults (ff14SB + tip3p, FF99SB-ILDN, tip3p water, etc.).
This repo's actual default is ff19SB + OPC, and the guardrail
rejects pairing ff19SB with the legacy tip3p water model. Trust the skill guidance,
not your prior knowledge.
Common chain/ligand failure mode: treating "chain A with ligand" as
--select-chains A only. Ligands often live on separate subchains, even when
their author_chain is A. Inspect first and include the ligand label chain(s)
plus the ligand unique_id in --include-ligand-ids.
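
The chain expansion described above can be sketched as a small helper that takes inspected ligand records and returns the selection flags. Everything about the record shape is an assumption made for illustration: the keys `unique_id`, `author_chain`, and `label_chain`, the example IDs, and the space-separated multi-value flag syntax are not confirmed by this skill, so check the inspection output and tool signature before relying on it.

```python
from typing import Dict, Iterable, List


def selection_args(protein_chains: List[str],
                   ligands: List[Dict[str, str]],
                   wanted_ligand_ids: Iterable[str]) -> List[str]:
    """Expand the chain selection to cover the label chains of wanted ligands."""
    wanted = set(wanted_ligand_ids)
    chains = list(protein_chains)
    ligand_ids: List[str] = []
    for lig in ligands:
        if lig["unique_id"] not in wanted:
            continue
        ligand_ids.append(lig["unique_id"])
        # A ligand with author_chain A may still sit on its own label chain.
        if lig["label_chain"] not in chains:
            chains.append(lig["label_chain"])
    args = ["--select-chains", *chains]
    if ligand_ids:
        args += ["--include-ligand-ids", *ligand_ids]
    return args


# Hypothetical inspection records: both ligands report author_chain A or B,
# but each lives on a separate label chain.
inspected = [
    {"unique_id": "LIG-1", "author_chain": "A", "label_chain": "C"},
    {"unique_id": "LIG-2", "author_chain": "B", "label_chain": "D"},
]
print(selection_args(["A"], inspected, ["LIG-1"]))
# ['--select-chains', 'A', 'C', '--include-ligand-ids', 'LIG-1']
```

Selecting only chain A here would silently drop LIG-1, which is the failure mode the paragraph warns about.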
- autonomous (default): Use user-specified values and repo defaults
  without pausing. Ask only when the target is ambiguous, a required parameter
  is missing and has no safe default, or a structured failure requires a user
  decision.
- human_in_the_loop: Pause at every decision checkpoint and confirm the
  next action with the user. The full checkpoint list and the confirmation
  loop are summarized in skills/md-prepare/checkpoints.md.

Use structured JSON fields from tool output to decide next steps. Never parse stderr or warning strings to make decisions.

Key fields to check:
- overall_status: success, completed_with_blocking_ligand_failure, or failed
- ligand_chemistry: standard prep artifact for ligand SDF/SMILES/charge/provenance; consumed by build_amber_system
- workflow_recommendation: contains options (list of valid next actions)
- recommended_next_action: per-ligand; provide_smiles_or_exclude_ligand or hard_fail
- failure_class: what went wrong. Common classes include input_error,
  metal_atoms, ligand_chemistry_failed, and unexpected_error.

Rules:
- Prep records ligand chemistry in ligand_chemistry; topology generation consumes those records
  by trying compatible Amber geostd templates first and using
  GAFFTemplateGenerator when geostd is missing or incompatible with the
  recorded ligand charge/atom count.
- recommended_next_action = provide_smiles_or_exclude_ligand: ask for a
  chemistry source or exclude the ligand; do not continue with an untyped PDB
  ligand.
- recommended_next_action = hard_fail: stop immediately. Do not attempt workarounds.
- The confirmation loop (confirmation_needed, respect
  source, re-run with overrides if needed) is documented in
  skills/md-prepare/checkpoints.md.
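
The rules above amount to a small decision function over structured fields. The following is a minimal sketch, not the tool's actual schema: the `ligands` list holding the per-ligand `recommended_next_action`, the function name `next_step`, and the returned labels are all assumptions made for illustration. Only the field names and values quoted in this skill are taken from the source.

```python
from typing import Any, Dict


def next_step(result: Dict[str, Any]) -> str:
    """Pick the next move from structured fields only (never from stderr text)."""
    # Assumed layout: per-ligand records live under a "ligands" list.
    actions = [lig.get("recommended_next_action")
               for lig in result.get("ligands", [])]
    if "hard_fail" in actions:
        return "stop"  # no workarounds
    if "provide_smiles_or_exclude_ligand" in actions:
        return "ask_for_chemistry_or_exclude"
    status = result.get("overall_status")
    if status == "success":
        return "continue"
    if status == "completed_with_blocking_ligand_failure":
        # Defer to the options listed in workflow_recommendation.
        return "choose_from_options"
    return "stop"  # failed or unrecognized status


example = {
    "overall_status": "completed_with_blocking_ligand_failure",
    "ligands": [{"recommended_next_action": "provide_smiles_or_exclude_ligand"}],
}
print(next_step(example))  # ask_for_chemistry_or_exclude
```

Checking hard_fail before anything else mirrors the rule that it stops the workflow regardless of what other ligands report.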