skills/model-incorporation-code/SKILL.md
Integrate the code of a new ML model into an Ersilia model template repository. Use this skill whenever the user has: (1) a source model repository with the original ML code, (2) an ersilia-model-template repository (already forked/cloned), and (3) optionally a PDF of the scientific article — and needs to wire up the actual model code. The skill handles all coding steps: copying checkpoints with git-lfs tracking, adapting main.py to replace the molecular-weight placeholder with real inference, creating run_columns.csv, updating install.yml with pinned versions, and producing run_input.csv and run_output.csv by actually running the model. Trigger on phrases like "incorporate model code", "fill in the template", "adapt main.py", "add checkpoints", "create run_columns", "create install.yml for ersilia", "generate example files for ersilia model", or any request to wire a source model into the eos-template format.
You integrate a published computational model into an Ersilia model template by reading the source code, understanding the inference pipeline, and producing all the required files. Ersilia supports three model types: annotation, representation, and sampling.
See references/template-structure.md for detailed specs on each file, and
references/main-py-patterns.md for annotated main.py examples by model type.
- --template <template-repo-path> (required): local path to the cloned ersilia-model-template fork
- --source <source-model-repo-path> (required): local path to the cloned source model repository
- --paper <pdf-path> (optional): path to the PDF of the scientific article

If arguments are missing, ask the user to provide them before proceeding.
Read everything before touching a file. The goal is to build a complete mental model of what the source model does and how to run it.
Extract:
This is the most important phase. A thorough understanding of the source code is the foundation for everything that follows — shortcuts here cause every subsequent step to go wrong.
Systematically explore:
- README.md: usage instructions, CLI entry point, example commands
- requirements.txt / setup.py / pyproject.toml / environment.yml: dependencies
- Inference entry points, i.e. functions such as predict(), __call__(), forward()
- Checkpoint files (.pt, .pkl, .h5, .joblib, .ckpt)

Ask the user whenever something is ambiguous or unclear. It is better to clarify early than to make wrong assumptions that affect the whole integration.
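To speed up the exploration above, a small script can list likely checkpoint files and candidate inference functions. This is a minimal sketch using only the Python standard library; the name scan_source_repo is hypothetical, and matching on def predict/__call__/forward is a heuristic that will miss entry points defined in other ways (e.g. CLI scripts). It complements reading the code rather than replacing it.

```python
"""Hypothetical helper: list checkpoint-like files and candidate inference
functions in a cloned source model repository."""
import os
import re
import sys

# Extensions this skill names as typical checkpoint formats.
CHECKPOINT_EXTENSIONS = {".pt", ".pkl", ".h5", ".joblib", ".ckpt"}
# Function definitions that often mark an inference entry point.
ENTRY_POINT_RE = re.compile(r"^\s*def\s+(predict|__call__|forward)\s*\(", re.MULTILINE)


def scan_source_repo(repo_path):
    """Return (checkpoint_files, entry_points) found under repo_path.

    checkpoint_files: sorted relative paths with a checkpoint extension.
    entry_points: sorted (relative_path, function_name) tuples.
    """
    checkpoints, entry_points = [], []
    for root, dirs, files in os.walk(repo_path):
        dirs[:] = [d for d in dirs if d != ".git"]  # skip git internals
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, repo_path)
            if os.path.splitext(name)[1].lower() in CHECKPOINT_EXTENSIONS:
                checkpoints.append(rel)
            elif name.endswith(".py"):
                with open(full, encoding="utf-8", errors="ignore") as fh:
                    for match in ENTRY_POINT_RE.finditer(fh.read()):
                        entry_points.append((rel, match.group(1)))
    return sorted(checkpoints), sorted(entry_points)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_source.py <source-model-repo-path>
    ckpts, funcs = scan_source_repo(sys.argv[1])
    print("Checkpoint files:", *ckpts, sep="\n  ")
    print("Candidate entry points:", *(f"{p}: {f}()" for p, f in funcs), sep="\n  ")
```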
Open the template's model/framework/code/main.py to see the placeholder structure.
Also read install.yml, .gitattributes, and model/framework/run.sh so you understand
the wiring before writing anything.
Copy the checkpoint files from the source repository into <template-repo-path>/model/checkpoints/. If they are not in the repo but linked externally (Zenodo, Figshare, HuggingFace, Google Drive, direct URL), download them using wget, curl, or the appropriate Python client (e.g. huggingface_hub, gdown). Then set up git-lfs tracking for large files:

- Run git lfs install once if not already done.
- Check whether the checkpoint file types are already listed in .gitattributes (e.g. *.pt filter=lfs diff=lfs merge=lfs -text).
- If not, run git lfs track "<pattern>" or manually add the entry.
- Stage the updated .gitattributes with git add .gitattributes.

Open <template-repo-path>/model/framework/code/main.py. Replace the my_model placeholder function (which calculates molecular weight) with a function that:

- Loads the model from ../../checkpoints/ (relative to the script location; do NOT use absolute paths).

All new models must use ersilia_pack_utils for CSV I/O; do not use manual csv.reader / csv.writer. The standard pattern is:
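```python
import sys
import numpy as np
from ersilia_pack_utils.core import read_smiles, write_out

input_file, output_file = sys.argv[1], sys.argv[2]
_, smiles_list = read_smiles(input_file)
# ... run inference, producing `outputs` (one row per SMILES) and `headers` ...
write_out(outputs, headers, output_file, np.float32)
```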
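The rule that checkpoints are loaded from ../../checkpoints/ relative to the script can be implemented by resolving the path from main.py's own location instead of the current working directory, so the model works no matter where run.sh is called from. A minimal sketch, assuming the template layout model/framework/code/main.py and model/checkpoints/; checkpoints_dir is a hypothetical helper name, and the loader and file name in the comments are placeholders.

```python
import os


def checkpoints_dir(script_path):
    """Resolve model/checkpoints/ from the location of main.py.

    main.py lives in model/framework/code/, so the checkpoints folder is
    ../../checkpoints relative to the script's directory.
    """
    code_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.abspath(os.path.join(code_dir, "..", "..", "checkpoints"))


# Inside main.py, pass the script's own path:
# CHECKPOINTS_DIR = checkpoints_dir(__file__)
# model = load_model(os.path.join(CHECKPOINTS_DIR, "model.pt"))  # hypothetical loader and file name
```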
Copy any additional helper .py files from the source model that are needed (e.g.
preprocessing utilities, model class definitions) into
<template-repo-path>/model/framework/code/.
See references/main-py-patterns.md for annotated patterns organised by model type.
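For the checks on main.py (running on a single SMILES and coping with molecules the model cannot process), one option is to wrap per-molecule inference so that a failure produces a placeholder row instead of aborting the run. This sketch assumes the source model exposes some per-molecule function (called predict_one here, which is hypothetical) and uses NaN as the sentinel; models that run in batches would need the same idea applied per batch.

```python
import math
import sys


def predict_all(smiles_list, predict_one, n_outputs, fill_value=math.nan):
    """Run predict_one on each SMILES, keeping exactly one output row per input.

    predict_one stands in for the model's own per-molecule inference and
    must return n_outputs values. Molecules that raise an error, or return
    the wrong number of values, get a row of fill_value so the output stays
    aligned with the input and the run does not abort.
    """
    rows = []
    for smi in smiles_list:
        try:
            values = list(predict_one(smi))
            if len(values) != n_outputs:
                raise ValueError(f"expected {n_outputs} values, got {len(values)}")
        except Exception as exc:
            print(f"Prediction failed for {smi!r}: {exc}", file=sys.stderr)
            values = [fill_value] * n_outputs
        rows.append(values)
    return rows
```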
Important checks before finishing main.py:
- main.py runs without errors on a single SMILES first.
- Invalid SMILES are handled gracefully (e.g. output None or a sentinel value).

Create <template-repo-path>/model/framework/columns/run_columns.csv.
Write it with Python or a plain text editor, without a BOM. The file must have exactly these four columns (no extras):

```csv
name,type,direction,description
```
Rules (details in references/template-structure.md):

- name: for sampling models, smi_ + zero-padded index (padding width = digit count of the maximum index, i.e. total count − 1; e.g. smi_00 for 100 outputs, since the maximum index 99 has 2 digits, and smi_000 for 1000 outputs, since 999 has 3 digits). For representation/featurisation models, feat_ + zero-padded index using the same padding rule (e.g. feat_00 for 100 dims, feat_000 for 512 dims, feat_0000 for 2048 dims). For single-value predictors, a meaningful name like logp or activity_score. Note: many older Ersilia models use dim_ instead of feat_. That is historical; all new incorporations must use feat_.
- type: float, integer, or string, nothing else.
- direction: high or low, the direction of biological activity. high means higher output values correspond to more of the modelled property (e.g. a higher probability score means the molecule is more likely to have that activity). low means lower values correspond to more of the property (e.g. hydration free energy in kcal/mol, where more negative = more solvated). Leave it empty (not the word "none") for sampling models and for representation models with abstract latent dimensions (e.g. neural embeddings like UniMol), where individual dimensions have no interpretable direction. For fingerprint-based representations (e.g. Morgan counts), use high, since a higher value means more of that structural feature is present.

Examples from real Ersilia models:
eos3b5e (annotation, single output):

```csv
name,type,direction,description
mol_weight,float,high,The calculated molecular weight of the molecule in g/mol
```

eos7ike (annotation, multi-output):

```csv
name,type,direction,description
rb,integer,high,Low flexibility (rotatable bonds lower or equal than 5)
glob,integer,high,Low globularity (lower or equal than 0.25)
primary_amine,integer,high,Determines if a molecule has a primary amine
```

eos5axz (representation, first 2 of 2048 dims shown):

```csv
name,type,direction,description
dim_0000,integer,high,Morgan count fingerprint dimension 0 with radius 3 and 2048 bits
dim_0001,integer,high,Morgan count fingerprint dimension 1 with radius 3 and 2048 bits
```

eos2hzy (sampling, first 2 of 100 shown):

```csv
name,type,direction,description
smiles_00,string,,Compound index 0 queried with the PubChem API
smiles_01,string,,Compound index 1 queried with the PubChem API
```

eos6ost (sampling, first 2 of 1000 shown):

```csv
name,type,direction,description
smi_000,string,,Generated compound index 0 using pre-trained LibInvent model
smi_001,string,,Generated compound index 1 using pre-trained LibInvent model
```
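The naming and padding rules for run_columns.csv can be generated rather than typed by hand, which also guarantees a BOM-free file with exactly the four required columns. A minimal sketch with the standard csv module; indexed_names and write_run_columns are hypothetical helper names, and the description callback and text in the usage comment are illustrative only.

```python
import csv


def indexed_names(prefix, n):
    """Zero-padded names, e.g. smi_00..smi_99 (n=100) or feat_000..feat_511 (n=512).

    The padding width is the number of digits in the largest index, n - 1.
    """
    width = len(str(n - 1))
    return [f"{prefix}{i:0{width}d}" for i in range(n)]


def write_run_columns(path, names, col_type, direction, describe):
    """Write run_columns.csv with exactly name,type,direction,description.

    direction is "high", "low", or "" (empty, e.g. for sampling models);
    describe maps a column index to its description string.
    """
    if col_type not in {"float", "integer", "string"}:
        raise ValueError(f"invalid type: {col_type}")
    if direction not in {"high", "low", ""}:
        raise ValueError(f"invalid direction: {direction}")
    # Plain "utf-8" (not "utf-8-sig") writes no BOM; "\n" avoids CRLF line endings.
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["name", "type", "direction", "description"])
        for i, name in enumerate(names):
            writer.writerow([name, col_type, direction, describe(i)])


# Example for a 2048-dim count fingerprint:
# write_run_columns("model/framework/columns/run_columns.csv",
#                   indexed_names("feat_", 2048), "integer", "high",
#                   lambda i: f"Morgan count fingerprint dimension {i}")
```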
Open <template-repo-path>/install.yml and replace the placeholder entries with the
actual dependencies.
Format:
```yaml
python: "3.10"  # match what the source model was tested on; minimum 3.8
commands:
  - ["pip", "torch", "2.0.1"]
  - ["conda", "rdkit", "2023.09.1", "conda-forge"]
  - ["pip", "git+https://github.com/org/repo@<tag-or-commit>"]
  - "some-shell-command --if-needed"
```
Rules:

- Use conda entries for packages best installed via conda (e.g. rdkit, cudatoolkit).
- Use pip entries for PyPI packages.
- … pip install -e .).

See references/template-structure.md for more examples.
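When filling install.yml, exact versions can be read from an environment in which the source model was confirmed to work, then rendered in the list format shown above. A minimal sketch using importlib.metadata and json from the standard library; the helper names are hypothetical, distribution names may differ from import names, and it only covers the three- and four-element pip/conda entries. Git URLs and plain shell-command strings would still be added by hand.

```python
import json
from importlib import metadata


def installed_version(dist_name):
    """Exact version of an installed distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def install_yml_text(python_version, entries):
    """Render install.yml from (installer, package, version[, channel]) tuples.

    Every entry must carry an exact version; unpinned entries raise ValueError.
    """
    lines = [f'python: "{python_version}"', "commands:"]
    for installer, package, version, *channel in entries:
        if installer not in ("pip", "conda"):
            raise ValueError(f"unsupported installer: {installer}")
        if not version:
            raise ValueError(f"{package} is not pinned")
        # json.dumps gives the ["pip", "torch", "2.0.1"] flow-list style.
        lines.append(f"  - {json.dumps([installer, package, version, *channel])}")
    return "\n".join(lines) + "\n"


# e.g. install_yml_text("3.10", [("pip", "torch", installed_version("torch"))])
```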
Create <template-repo-path>/model/framework/examples/run_input.csv with exactly
three SMILES strings. Ersilia models always take SMILES as input.
Fetch 3 random SMILES from the Ersilia maintained inputs file:
```bash
python - <<'EOF'
import urllib.request, csv, random
url = "https://raw.githubusercontent.com/ersilia-os/ersilia-model-hub-maintained-inputs/main/inputs/example.csv"
with urllib.request.urlopen(url) as f:
    rows = list(csv.DictReader(line.decode() for line in f))
sample = random.sample(rows, 3)
print("smiles")
for r in sample:
    print(r["input"])
EOF
```
Write the output to run_input.csv.
To produce run_output.csv, actually run the model:

1. Create and activate a fresh environment:

   ```bash
   conda create -n eos-test python=<version-from-install.yml> -y
   conda activate eos-test
   ```

2. Install the dependencies listed in install.yml in the order they appear.
3. Run the model through run.sh (NOT by calling main.py directly) from the template repo root:

   ```bash
   bash model/framework/run.sh model/framework \
     model/framework/examples/run_input.csv \
     /tmp/run_output.csv
   ```

4. Copy the result to <template-repo-path>/model/framework/examples/run_output.csv.

If the model fails, debug the environment (missing package, wrong path, CUDA issue) before writing the file. DO NOT FABRICATE OUTPUT VALUES.
Before declaring the work done, verify:

- model/checkpoints/ contains all required files; large files are tracked by git-lfs
- .gitattributes is updated if git-lfs tracking was needed
- model/framework/code/main.py runs end-to-end without errors
- model/framework/columns/run_columns.csv has correct headers and follows the naming rules
- install.yml has all dependencies pinned to exact versions
- model/framework/examples/run_input.csv has 3 SMILES rows
- model/framework/examples/run_output.csv was produced by actually running the model
- metadata.yml is consistent with the files produced: the model type (Task field) matches the outputs, and the output column names match what is declared
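Part of the checklist above can be verified automatically. A minimal sketch, assuming the run_output.csv written by run.sh has a single header row whose names are exactly the name column of run_columns.csv (confirm this against your template's run.sh and main.py before relying on it); check_examples is a hypothetical helper that takes the model/framework directory. A BOM in any file shows up as a header mismatch, because it stays attached to the first cell.

```python
import csv
import os


def read_csv_rows(path):
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh))


def check_examples(framework_dir):
    """Return a list of problems found in the example and column files."""
    inputs = read_csv_rows(os.path.join(framework_dir, "examples", "run_input.csv"))
    columns = read_csv_rows(os.path.join(framework_dir, "columns", "run_columns.csv"))
    outputs = read_csv_rows(os.path.join(framework_dir, "examples", "run_output.csv"))
    if not (inputs and columns and outputs):
        return ["one of run_input.csv, run_columns.csv, run_output.csv is empty"]

    problems = []
    n_inputs = len(inputs) - 1  # minus the header row
    if n_inputs != 3:
        problems.append(f"run_input.csv has {n_inputs} SMILES rows, expected 3")
    if columns[0] != ["name", "type", "direction", "description"]:
        problems.append("run_columns.csv header is not name,type,direction,description")
    declared = [row[0] for row in columns[1:]]
    if outputs[0] != declared:
        problems.append("run_output.csv header does not match run_columns.csv names")
    if len(outputs) - 1 != n_inputs:
        problems.append("run_output.csv row count differs from run_input.csv")
    return problems
```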