skills/ersilia-metadata/SKILL.md
Fills in the required metadata fields for an Ersilia Model Hub model, given the original publication as a PDF and the link to the original code repository. Use this skill whenever a user wants to populate, complete, or update a metadata.yml file for an Ersilia model, mentions an Ersilia model contribution, or is working on model metadata for the Ersilia Model Hub. Trigger even if the user just says "fill in the metadata" or "help me with the metadata.yml" in any Ersilia context.
Your job is to fill in specific fields of an Ersilia model's metadata.yml file using information extracted from the original publication (PDF) and the original source code repository.
The user will provide the original publication (PDF), the link to the original code repository, and the metadata.yml file (already partially filled from the model request).

Fill only these fields and leave everything else exactly as it is:
| Field | Type | Accepted values |
|-------|------|-----------------|
| Deployment | list | Local, Online |
| Source | string | Local, Online |
| Source Type | string | External, Internal, Replicated |
| Task | string | Annotation, Representation, Sampling |
| Subtask | string | Property calculation or prediction, Activity prediction, Featurization, Projection, Similarity search, Generation |
| Output | list | Score, Value, Compound, Text |
| Output Dimension | integer | number of output values per input compound |
| Output Consistency | string | Fixed, Variable |
| Interpretation | string | free text |
| Biomedical Area | list | free text (disease areas, application areas) |
| Target Organism | list | scientific names or Any |
| Publication Type | string | Peer reviewed, Preprint, Other |
| Publication Year | integer | year |
Never modify: Identifier, Slug, Status, Title, Description, Tag, Publication, Source Code, License, Contributor, and any auto-populated fields (Incorporation Date, S3, DockerHub, Model Size, etc.)
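The fill-only rule above can be checked mechanically once the file has been edited. The following is a minimal sketch, assuming the metadata before and after editing has already been loaded into plain Python dicts; the function name `unexpected_changes` and the sample identifier and license values are hypothetical and used only for illustration.

```python
# Fields this skill is allowed to fill (taken from the table above).
FILLABLE = {
    "Deployment", "Source", "Source Type", "Task", "Subtask", "Output",
    "Output Dimension", "Output Consistency", "Interpretation",
    "Biomedical Area", "Target Organism", "Publication Type", "Publication Year",
}

_MISSING = object()  # sentinel so added or removed keys also count as changes


def unexpected_changes(before, after):
    """Return sorted names of non-fillable fields that differ between the two dicts."""
    keys = set(before) | set(after)
    return sorted(
        key for key in keys
        if key not in FILLABLE
        and before.get(key, _MISSING) != after.get(key, _MISSING)
    )


# Hypothetical example values, not taken from a real model.
before = {"Identifier": "eos0abc", "Task": "Task 1", "License": "MIT"}
after = {"Identifier": "eos0abc", "Task": "Annotation", "License": "GPL-3.0"}
print(unexpected_changes(before, after))  # ['License']
```

An empty list means only fillable fields were touched; anything else should be reverted before saving.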
Note the Source Code URL and Publication URL — you will need them. Confirm which fields still need filling (they may have template placeholder text like "Biomedical Area 1" or multiple values comma-separated).
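One way to spot fields that still hold template text is a small heuristic like the sketch below. It assumes placeholders follow the pattern suggested by the "Biomedical Area 1" example (field name followed by a number) and that a single-value field listing several comma-separated options is still unfilled. The function name `needs_filling` is hypothetical, and the comma check is only meaningful for fields with a fixed set of accepted values, not for free text such as Interpretation.

```python
import re


def needs_filling(field, value):
    """Heuristic: True if the value looks like template text rather than a real entry."""
    if value is None or value == [] or value == "":
        return True
    items = value if isinstance(value, list) else [value]
    for item in items:
        text = str(item).strip()
        # Placeholder such as "Biomedical Area 1": the field name plus a number.
        if re.fullmatch(re.escape(field) + r"\s+\d+", text):
            return True
        # A single-value field that still lists several comma-separated options.
        if not isinstance(value, list) and "," in text:
            return True
    return False


print(needs_filling("Biomedical Area", ["Biomedical Area 1"]))        # True
print(needs_filling("Source Type", "External, Internal, Replicated"))  # True
print(needs_filling("Task", "Annotation"))                             # False
```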
Use the PDF reading tools to extract:
Use WebFetch on the repository URL (and its README, and key code files if needed) to understand:
Work through the fields systematically:
Deployment + Source
- Deployment: [Local] and Source: Local for the vast majority of models; the model runs in Ersilia's infrastructure.
- Online only if the model posts predictions to an external third-party server/API that Ersilia does not control.
- Deployment is a list; Source is a single string.

Source Type
- External: the model was developed by third-party authors (most models incorporated from published papers).
- Internal: developed by the Ersilia team themselves.
- Replicated: Ersilia re-trained the model following the original authors' methodology.

Task + Subtask
Choose Task first, then the corresponding Subtask:
- Annotation → model assigns a label or score to a molecule
  - Activity prediction if the output is biological/pharmacological activity
  - Property calculation or prediction if the output is a physicochemical or ADMET property
- Representation → model encodes a molecule into a numerical vector or projection
  - Featurization if it produces a fixed-length embedding/descriptor vector
  - Projection if it projects molecules into 2D/3D space
- Sampling → model generates or retrieves molecules
  - Generation if it generates new molecules
  - Similarity search if it retrieves similar molecules from a database

Output
- Score: a probability or likelihood (0–1 range, binary classification output)
- Value: a numerical measurement (IC50, logP, pKa, molecular weight, descriptors, embeddings…)
- Compound: a generated or retrieved molecule (SMILES or InChI)
- Text: natural language output

A model can return more than one output type, e.g. [Score, Value].

Output Dimension
The number of output values produced per input compound. Only count continuous numeric outputs (scores, values); do not count binary class labels separately. So a model with 6 endpoints each returning one probability score has Output Dimension 6, not 12.
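The counting rule can be illustrated with a short sketch. The column descriptions and the "kind" labels here are hypothetical bookkeeping for the example, not metadata.yml fields or anything produced by Ersilia tooling.

```python
def output_dimension(columns):
    """Count continuous numeric outputs (scores, values); skip class labels."""
    return sum(1 for col in columns if col["kind"] in {"score", "value"})


# Six endpoints, each returning one probability plus a derived binary label.
columns = []
for i in range(1, 7):
    columns.append({"name": f"endpoint_{i}_prob", "kind": "score"})
    columns.append({"name": f"endpoint_{i}_label", "kind": "label"})

print(output_dimension(columns))  # 6
```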
This is often explicit in the paper ("6 endpoints", "512-dimensional vector"). If not stated directly:
- If the dimension depends on a user setting (e.g., n_components is a user parameter), ask the user; do not guess a default from examples in the docs

Output Consistency
- Fixed: the model always returns the same output for the same input (most QSAR models, classifiers, regression models)
- Variable: the model is stochastic and may return different outputs on repeated runs (generative models, models with dropout at inference, sampling-based methods)

Interpretation
Write one short sentence describing what the output means and how to read it. Keep it under ~20 words.
Good examples:
- Higher score indicates greater predicted probability of anti-malarial activity.
- 100 features encoding molecular structure from a pretrained MACAW autoencoder.
- Predicted probability of AMES mutagenicity; values closer to 1 indicate higher risk.

Biomedical Area
List the relevant therapeutic or research areas. Use specific disease names or application areas rather than generic terms. Examples: Malaria, Tuberculosis, ADMET, COVID-19, Solubility, Toxicity. Use Any only if the model is truly domain-agnostic (e.g., a general-purpose molecular featurizer with no disease focus).
Target Organism
Use full scientific names where applicable:
- Any if the model is not organism-specific

Publication Type
- Peer reviewed; bioRxiv/ChemRxiv/arXiv links → Preprint
- Other only in exceptional cases (thesis, technical report)

Publication Year
Extract the year of publication from the paper or publication URL.
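The accepted values and the Task → Subtask pairing described above can be collected into a small consistency check. This is a sketch under the assumption that the filled fields are available as a Python dict; `check_metadata` and the example values are hypothetical, and free-text fields (Interpretation, Biomedical Area, Target Organism) are deliberately not checked.

```python
SUBTASKS = {
    "Annotation": {"Activity prediction", "Property calculation or prediction"},
    "Representation": {"Featurization", "Projection"},
    "Sampling": {"Generation", "Similarity search"},
}
CHOICES = {  # single-string fields
    "Source": {"Local", "Online"},
    "Source Type": {"External", "Internal", "Replicated"},
    "Task": set(SUBTASKS),
    "Output Consistency": {"Fixed", "Variable"},
    "Publication Type": {"Peer reviewed", "Preprint", "Other"},
}
LIST_CHOICES = {  # list fields
    "Deployment": {"Local", "Online"},
    "Output": {"Score", "Value", "Compound", "Text"},
}


def check_metadata(meta):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    for field, allowed in CHOICES.items():
        value = meta.get(field)
        if not (isinstance(value, str) and value in allowed):
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    for field, allowed in LIST_CHOICES.items():
        value = meta.get(field)
        ok = isinstance(value, list) and value and all(
            isinstance(v, str) and v in allowed for v in value
        )
        if not ok:
            problems.append(f"{field}: expected a non-empty list from {sorted(allowed)}")
    task, subtask = meta.get("Task"), meta.get("Subtask")
    allowed_subtasks = SUBTASKS.get(task, set()) if isinstance(task, str) else set()
    if not (isinstance(subtask, str) and subtask in allowed_subtasks):
        problems.append(f"Subtask: {subtask!r} does not belong to Task {task!r}")
    for field in ("Output Dimension", "Publication Year"):
        value = meta.get(field)
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{field}: expected an integer")
    return problems


# Hypothetical filled fields with a deliberately mismatched Subtask.
example = {
    "Deployment": ["Local"], "Source": "Local", "Source Type": "External",
    "Task": "Annotation", "Subtask": "Featurization",
    "Output": ["Score"], "Output Dimension": 6, "Output Consistency": "Fixed",
    "Publication Type": "Preprint", "Publication Year": 2021,
}
print(check_metadata(example))
# ["Subtask: 'Featurization' does not belong to Task 'Annotation'"]
```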
Edit the file in place, replacing only the fields listed above. Keep the YAML formatting consistent with the rest of the file (use list syntax for list fields, plain string for string fields, integer for integer fields). Do not add quotes unless the original file uses them for that field.
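As an illustration of the three shapes mentioned above, the sketch below renders a list field, a string field, and an integer field as YAML text. The two-space list indent is an assumption and should be matched to whatever the existing file uses; `render_field` is a hypothetical helper, and it does not handle values that would need quoting (for example free text containing a colon followed by a space).

```python
def render_field(name, value, indent="  "):
    """Render one field: block list for lists, plain scalar for strings and integers."""
    if isinstance(value, list):
        lines = [f"{name}:"] + [f"{indent}- {item}" for item in value]
        return "\n".join(lines)
    return f"{name}: {value}"


print(render_field("Output", ["Score", "Value"]))
print(render_field("Source", "Local"))
print(render_field("Output Dimension", 6))
```

With the assumed indent this prints `Output:` followed by one `- item` line per list entry, then `Source: Local` and `Output Dimension: 6` as unquoted scalars.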
Print a brief summary of the fields you filled in and the values chosen. If any field required a judgment call or the evidence was ambiguous, say so clearly and invite the user to verify.
If you've read the paper and the repository and still cannot confidently determine a value (especially Output Dimension), state clearly what you found and what is unclear, and ask the user to clarify. Do not guess or leave placeholder text.