skills/pharma/pharma-ml-tools/SKILL.md
Pharmaceutical machine-learning workflow guide for library profiling, molecular featurization, benchmark dataset fetch, medicinal-chemistry filtering, and optional pose-generation handoff. Use when the user asks for datamol, molfeat, PyTDC, medchem, compound-library triage, dataset preparation, or chemistry-ML baselines beyond simple descriptor calculation.
Use this skill when the user asks for compound-library profiling, chemistry ML feature generation, medicinal-chemistry screening, or benchmark dataset preparation.
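Typical triggers:

- molfeat for downstream ML
- PyTDC

Check which modules are available before running any template:

```bash
which python3 || true
python3 - <<'PY'
# Report which optional chemistry-ML modules are importable.
mods = ["pandas", "numpy", "datamol", "molfeat", "medchem"]
for name in mods:
    try:
        __import__(name)
        print(f"{name}: ok")
    except Exception as exc:
        print(f"{name}: missing ({exc})")

# PyTDC is imported under the module name "tdc".
try:
    import tdc
    print("PyTDC: ok")
except Exception as exc:
    print(f"PyTDC: missing ({exc})")
PY
```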
If a requested module is missing, say so explicitly. Do not claim the screen, featurization, or dataset pull completed.
Bundled templates:

- `templates/datamol_library_profile.py`
- `templates/molfeat_featurize.py`
- `templates/pytdc_dataset_fetch.py`
- `templates/medchem_screen.py`

When to run each:

- Run `datamol_library_profile.py` before building models so duplicates, invalid structures, and scaffold concentration are visible.
- Run `medchem_screen.py` before large docking or QSAR jobs to flag problematic chemotypes.
- Use `molfeat_featurize.py` when the user needs model-ready features rather than only descriptor summaries.
- Use `pytdc_dataset_fetch.py` when the user needs reproducible public benchmark datasets rather than ad hoc CSV collection.
- Write outputs under `./pharma_ml/`.

```bash
python3 templates/datamol_library_profile.py \
  --input libraries/kinase_hits.csv \
  --smiles-column smiles \
  --id-column compound_id \
  --output pharma_ml/kinase_hits_profile.csv \
  --summary pharma_ml/kinase_hits_profile.json
```
Use this first for:
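The hygiene checks behind library profiling can be approximated without a chemistry toolkit. The sketch below is a minimal, standard-library-only illustration, not the bundled datamol template: `profile_library` and `scaffold_concentration` are hypothetical helpers, the `smiles` and `compound_id` column names mirror the example command, and scaffold keys are assumed to be precomputed elsewhere (for example Murcko scaffolds from RDKit). Because it compares raw SMILES strings, it misses duplicates written as different but equivalent SMILES; catching those needs structure-aware canonicalization.

```python
import csv
import io
import json
from collections import Counter


def profile_library(rows, smiles_column="smiles", id_column="compound_id"):
    """Hypothetical helper: basic hygiene counts for CSV rows (dicts).

    Compares raw SMILES strings only; no parsing or canonicalization.
    """
    ids = [(r.get(id_column) or "").strip() for r in rows]
    smiles = [(r.get(smiles_column) or "").strip() for r in rows]
    id_counts = Counter(i for i in ids if i)
    smiles_counts = Counter(s for s in smiles if s)
    return {
        "n_rows": len(rows),
        "n_missing_smiles": sum(1 for s in smiles if not s),
        "n_missing_ids": sum(1 for i in ids if not i),
        "duplicate_ids": sorted(i for i, c in id_counts.items() if c > 1),
        # Rows beyond the first copy of each exact SMILES string.
        "n_duplicate_smiles_rows": sum(c - 1 for c in smiles_counts.values() if c > 1),
    }


def scaffold_concentration(scaffolds, top_n=3):
    """Fraction of compounds whose precomputed scaffold is among the top_n most common."""
    keys = [s for s in scaffolds if s]
    if not keys:
        return 0.0
    covered = sum(count for _, count in Counter(keys).most_common(top_n))
    return covered / len(keys)


# Toy input: one exact duplicate SMILES, one missing SMILES, one repeated ID.
demo_csv = io.StringIO(
    "compound_id,smiles\n"
    "C1,CCO\n"
    "C2,CCO\n"
    "C3,\n"
    "C3,c1ccccc1\n"
)
rows = list(csv.DictReader(demo_csv))
summary = profile_library(rows)
print(json.dumps(summary, indent=2))
```

A high `scaffold_concentration` value means a few chemotypes dominate the library, which is worth reporting before any model is trained on it.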
```bash
python3 templates/molfeat_featurize.py \
  --input libraries/kinase_hits.csv \
  --smiles-column smiles \
  --id-column compound_id \
  --featurizer ecfp \
  --output pharma_ml/kinase_hits_ecfp.csv \
  --summary pharma_ml/kinase_hits_ecfp.json
```
Supported baseline featurizers in the bundled template:

- `ecfp`
- `maccs`
- `rdkit2d`

Use this for local QSAR, ranking, clustering, or embedding handoff.
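Ranking and clustering on fingerprint features usually start from a pairwise similarity. A minimal sketch, assuming the featurized output has already been loaded so that each compound ID maps to a list of 0/1 fingerprint bits (the exact column layout of the template's CSV is not documented here, so check it first): Tanimoto similarity is the number of shared on-bits divided by the number of bits on in either vector. `tanimoto` and `rank_by_similarity` are hypothetical names, and the measure applies to binary fingerprints such as `ecfp` or `maccs`, not to the continuous `rdkit2d` descriptors.

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length 0/1 bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as dissimilar here (a convention choice).
    return both / either if either else 0.0


def rank_by_similarity(
    query: Sequence[int], library: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (compound_id, similarity) pairs, most similar first."""
    scored = [(cid, tanimoto(query, bits)) for cid, bits in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 5-bit fingerprints; real ECFP vectors are much longer (commonly 1024 or 2048 bits).
query = [1, 1, 0, 0, 1]
library = {
    "A": [1, 1, 0, 0, 1],
    "B": [1, 0, 0, 0, 1],
    "C": [0, 0, 1, 1, 0],
}
for cid, sim in rank_by_similarity(query, library):
    print(f"{cid}: {sim:.3f}")
```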
```bash
python3 templates/pytdc_dataset_fetch.py \
  --task adme \
  --dataset Caco2_Wang \
  --split-method scaffold \
  --out-dir pharma_ml/caco2_wang
```
Good use cases:
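The `--split-method scaffold` option in the fetch command refers to splitting by molecular scaffold: compounds that share a scaffold land in the same partition, so test-set chemotypes are unseen during training and the evaluation is harder than a random split. The sketch below is not PyTDC's implementation; it only illustrates one common greedy variant (largest scaffold groups fill the training set first), assuming scaffold keys are already computed. `scaffold_split` is a hypothetical helper.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def scaffold_split(
    ids: Sequence[str],
    scaffolds: Sequence[str],
    frac: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Tuple[List[str], List[str], List[str]]:
    """Greedy scaffold split: whole scaffold groups go to train, then valid, then test."""
    if len(ids) != len(scaffolds):
        raise ValueError("ids and scaffolds must have the same length")
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must sum to 1")
    groups = defaultdict(list)
    for cid, scaffold in zip(ids, scaffolds):
        groups[scaffold].append(cid)
    # Largest groups first; ties broken by scaffold key so the split is deterministic.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    n = len(ids)
    train_cap = frac[0] * n
    valid_cap = (frac[0] + frac[1]) * n
    train, valid, test = [], [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(train) + len(valid) + len(members) <= valid_cap:
            valid.extend(members)
        else:
            test.extend(members)
    return train, valid, test


ids = [f"m{i}" for i in range(1, 11)]
scaffolds = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
train, valid, test = scaffold_split(ids, scaffolds)
print(len(train), valid, test)
```

Because groups are never split, the realized fractions only approximate `frac`, especially when one scaffold dominates the dataset.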
```bash
python3 templates/medchem_screen.py \
  --input libraries/kinase_hits.csv \
  --smiles-column smiles \
  --id-column compound_id \
  --output pharma_ml/kinase_hits_medchem.csv \
  --summary pharma_ml/kinase_hits_medchem.json
```
Use this for:
Treat these filters as prioritization heuristics, not hard truth.
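One way to keep filters as prioritization heuristics is to annotate and reorder compounds instead of deleting them. The sketch below uses Lipinski's rule of five as a stand-in rule set (it is not the medchem package's API or the bundled template's logic), assumes molecular weight, cLogP, and H-bond donor/acceptor counts were computed upstream (for example with RDKit descriptors), and follows the original rule's convention of raising an alert only when two or more limits are exceeded. `Compound`, `ro5_violations`, and `annotate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Lipinski rule-of-five upper limits; a value above the limit is a violation.
RO5_LIMITS = {"mw": 500.0, "logp": 5.0, "hbd": 5, "hba": 10}


@dataclass
class Compound:
    compound_id: str
    mw: float    # molecular weight, Da
    logp: float  # calculated logP
    hbd: int     # H-bond donors
    hba: int     # H-bond acceptors


def ro5_violations(c: Compound) -> List[str]:
    """Names of the properties that exceed their rule-of-five limit."""
    return [name for name, limit in RO5_LIMITS.items() if getattr(c, name) > limit]


def annotate(compounds: List[Compound]) -> List[Dict]:
    """Flag compounds without dropping any; fewest violations sort first."""
    rows = []
    for c in compounds:
        violations = ro5_violations(c)
        rows.append({
            "compound_id": c.compound_id,
            "ro5_violations": violations,
            "ro5_alert": len(violations) >= 2,
        })
    return sorted(rows, key=lambda r: (len(r["ro5_violations"]), r["compound_id"]))


compounds = [
    Compound("K1", 420.5, 3.2, 2, 6),
    Compound("K2", 612.7, 5.8, 3, 9),
    Compound("K3", 505.0, 4.1, 1, 7),
]
for row in annotate(compounds):
    print(row)
```

Every input compound stays in the output, so a flagged chemotype can still be reviewed rather than silently discarded.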
If the user asks for diffusion docking or deep pose generation, acknowledge that this runtime already includes docking-tools for Vina-style workflows, but DiffDock-class workflows require a heavier environment with PyTorch Geometric, model weights, and usually GPU acceleration. Do not pretend that support is bundled unless the environment is confirmed.
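Confirming the environment can itself be scripted. A minimal, standard-library-only sketch: it checks whether `torch` and `torch_geometric` are importable, whether PyTorch reports a CUDA device, and whether a local weights directory exists. `check_pose_generation_env` and the `models/diffdock` path are hypothetical, and a real DiffDock-class install may need further packages that this does not check.

```python
import importlib
import importlib.util
from pathlib import Path
from typing import Dict, Optional


def check_pose_generation_env(weights_dir: Optional[str] = None) -> Dict[str, bool]:
    """Report whether the heavy dependencies for DiffDock-class runs look present."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "torch_geometric": importlib.util.find_spec("torch_geometric") is not None,
        "cuda": False,
        # weights_dir is a hypothetical local path; nothing is downloaded here.
        "weights": bool(weights_dir) and Path(weights_dir).is_dir(),
    }
    if report["torch"]:
        try:
            torch = importlib.import_module("torch")
            report["cuda"] = bool(torch.cuda.is_available())
        except Exception:
            # An installed-but-broken torch counts as unusable.
            report["torch"] = False
    # GPU is reported but not required, since it is usually (not always) needed.
    report["ready"] = report["torch"] and report["torch_geometric"] and report["weights"]
    return report


print(check_pose_generation_env("models/diffdock"))
```

If `ready` is false, say which item is missing instead of describing pose generation as available.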
Good answers should mention:
For public APIs such as PubChem, ChEMBL, openFDA, ClinicalTrials.gov, or OpenAlex, activate pharma-db-tools.
For RDKit descriptors, ADMET heuristics, DrugBank, QSAR, or structure-aware affinity, activate chem-tools.
For docking and pose-level workflows, activate docking-tools.