chemoinformatics/pharmacophore-modeling/SKILL.md
Builds and applies 3D pharmacophore models using RDKit Pharm3D, the apo2ph4 receptor-based workflow (Heider et al 2022/2023 J Chem Inf Model 63:147-158), Pharmer / Pharmit (search), and PharmacoForge (diffusion-based generation, Flynn et al 2025 Front Bioinform), covering ligand-based pharmacophore (from active set alignment) and receptor-based pharmacophore (from binding pocket geometry). Explicit handling of feature types, geometric tolerances, partial matching, and pharmacophore-based virtual screening. Use when identifying scaffold-hopping candidates, building shape-and-feature search queries, or transferring SAR across chemotypes.
npx skillsauth add GPTomics/bioSkills bio-pharmacophore-modelingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Reference examples tested with: RDKit 2024.09+, pharmer / pharmit (web service), PharmIT 1.1+, plip 2.4+ (interaction analysis).
Before using code patterns, verify installed versions match. If versions differ:
pip show rdkit then help(rdkit.Chem.Pharm3D) to check signaturesIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Build 3D pharmacophore queries that capture the essential interaction features of a ligand-target binding event. A pharmacophore is the spatial arrangement of pharmacophore features (donor, acceptor, hydrophobe, aromatic, charged) sufficient for activity, abstracted from any specific chemotype. Used for scaffold-hopping (find compounds with different scaffold but matching pharmacophore), virtual screening (faster than docking), and cross-target SAR transfer. Modern best practice: derive pharmacophore from co-crystal structure if available (receptor-based; apo2ph4 workflow of Heider et al 2022/2023 J Chem Inf Model 63:147-158) or align actives if no crystal (ligand-based). Diffusion-based generation (PharmacoForge, Flynn et al 2025 Front Bioinform) lets pharmacophore drive de novo design.
For 2D scaffold-based searches, see chemoinformatics/scaffold-analysis. For 3D shape similarity, see chemoinformatics/shape-similarity. For protein-ligand interaction analysis, see chemoinformatics/virtual-screening.
| Feature | RDKit code | Definition | Geometric tolerance | |---------|------------|------------|----------------------| | H-bond donor | D | -OH, -NH | 1.0-1.5 Å | | H-bond acceptor | A | sp2 O / N (lone pair) | 1.0-1.5 Å | | Hydrophobe | H | sp3 C / aromatic ring centroid | 1.5-2.0 Å | | Aromatic ring | R | Aromatic ring centroid + normal | 1.0-1.5 Å | | Positive ionizable | P | -NH3+, -NR3+ | 1.0-1.5 Å | | Negative ionizable | N | -COO-, -SO3- | 1.0-1.5 Å | | Halogen | X | Cl, Br, I (halogen bond donor) | 1.0-1.5 Å | | Metal coordination | M | sp/sp2 N/O near metal | 0.5-1.0 Å |
Tolerances are pharmacophore-feature distance windows in the search. Tighter tolerances = fewer hits but more specific.
| Method | Origin | Use case | Fails when |
|--------|--------|----------|------------|
| Ligand-based (LBP) | Catalyst, MOE, RDKit Pharm3D | Multiple actives, no crystal | <3 actives; flexible actives |
| Receptor-based (RBP) | apo2ph4, LigandScout | Co-crystal available | Apo structure (use AlphaFold3 or Boltz) |
| Common pharmacophore | Pharm3D EmbedPharmacophore | Consensus from active set | Diverse actives confound alignment |
| Diffusion-based (PharmacoForge) | Flynn et al 2025 Front Bioinform | De novo generation with pharmacophore prior | Pretrained model required |
| Active learning pharmacophore | Catalyst variant | Iterative refinement | Custom; not standard |
| Scenario | Method | Tools |
|----------|--------|-------|
| Co-crystal structure available | Receptor-based | apo2ph4 + Pharmer/Pharmit |
| Multiple active compounds, no crystal | Ligand-based common pharmacophore | RDKit Pharm3D EmbedPharmacophore |
| Single active compound | Single-conformer pharmacophore | RDKit Pharm3D from bioactive conformer |
| Scaffold hopping prospective | Receptor-based + shape filter | apo2ph4 + ROCS |
| Cross-target SAR transfer | Common pharmacophore across targets | Manual + LigandScout |
| De novo design with pharmacophore | PharmacoForge | Diffusion-based generation |
| Library pre-filtering | Pharmacophore screen | Pharmit search |
Goal: Extract a common pharmacophore from a set of bioactive compounds.
Approach: Align actives by maximum-common-substructure or shape; extract conserved features; derive a pharmacophore signature.
from rdkit import Chem
from rdkit.Chem import AllChem, ChemicalFeatures
from rdkit.Chem.Pharm3D import Pharmacophore
from rdkit.RDPaths import RDDataDir
import os
fdef_file = os.path.join(RDDataDir, 'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdef_file)
active_smiles = ['CC(C)c1ccc(C(=O)NCc2ccccn2)cc1', 'CCC(C)c1ccc(C(=O)NCc2ccccn2)cc1']
active_mols = [Chem.AddHs(Chem.MolFromSmiles(s)) for s in active_smiles]
for m in active_mols:
AllChem.EmbedMolecule(m, AllChem.ETKDGv3())
AllChem.MMFFOptimizeMolecule(m)
# Extract pharmacophore features (family/type/atom-ids/3D-pos) per molecule
feature_lists = [factory.GetFeaturesForMol(m) for m in active_mols]
# Build a target pharmacophore from one active's features; the bounds matrix
# encodes per-feature-pair distance ranges across all conformer/active variability.
# Below uses 2 features (Aromatic + Donor) with 2.5-3.5 A and 5.0-6.0 A bands.
feature_types = ['Aromatic', 'Donor']
# bounds is symmetric 2x2 distance-range matrix; off-diagonal = (lo, hi)
import numpy as np
bounds_matrix = np.array([[0.0, 5.0], [3.5, 0.0]]) # upper / lower bounds
pharmacophore = Pharmacophore.Pharmacophore(feature_types)
pharmacophore.setLowerBound(0, 1, 3.5)
pharmacophore.setUpperBound(0, 1, 5.0)
# To search a target molecule for matches, use Pharm3D.EmbedLib.EmbedPharmacophore
# (see chemoinformatics/conformer-generation for 3D embedding fundamentals)
BaseFeatures.fdef (RDKit-shipped) defines feature SMARTS. For drug-like pharmacophores, this is the standard starting point.
Goal: Derive a pharmacophore from a protein binding-pocket structure without requiring a bound ligand.
Approach: Identify hot-spots (donor / acceptor / hydrophobe regions) from protein geometry; assemble into a pharmacophore. The apo2ph4 workflow of Heider J, Kilian J, Garifulina A, Hering S, Langer T, Seidel T (2022/2023 J Chem Inf Model 63:147-158) describes the conceptual pipeline; the example below is illustrative — verify the exact CLI invocation against the published code/release before running.
# Conceptual apo2ph4-style workflow; flags shown are illustrative.
apo2ph4 -pdb receptor.pdb -site_residues 'A:100,A:101,A:104,A:108' \
-output pharmacophore.ph4
apo2ph4 outputs a .ph4-style pharmacophore compatible with Phase, MOE, and Pharmer.
When a co-crystal ligand is available, derive pharmacophore directly from the ligand binding pose: each ligand feature in contact with a complementary protein residue is part of the pharmacophore.
from plip.basic import config
from plip.structure.preparation import PDBComplex
mol_complex = PDBComplex()
mol_complex.load_pdb('complex.pdb')
mol_complex.analyze()
for site in mol_complex.interaction_sets.values():
for interaction in site.all_itypes:
# H-bond donor / acceptor / pi-stacking / hydrophobic / salt-bridge
feature_type = interaction.type
feature_atom_coords = interaction.ligatom.coords
PLIP outputs interaction types per ligand atom; each interaction maps to a pharmacophore feature.
For library screening, Pharmer/Pharmit are the standard tools.
# Pharmer command line (offline alternative)
pharmer search -q pharmacophore.ph4 -dbdir zinc_db -out hits.sdf
Or use Pharmit web service (https://pharmit.csb.pitt.edu) for browser-based or REST search.
import requests
url = 'https://pharmit.csb.pitt.edu/...' # specific endpoint
response = requests.post(url, json={'pharmacophore': pharmacophore_dict})
hits = response.json()
Pharmacophore screen is orders of magnitude faster than docking; typical 100M-compound search in minutes.
Evaluate a pharmacophore by:
def pharmacophore_enrichment(query_pharmacophore, actives, inactives):
n_active_match = sum(matches_pharmacophore(a, query_pharmacophore) for a in actives)
n_inactive_match = sum(matches_pharmacophore(d, query_pharmacophore) for d in inactives)
enrichment = (n_active_match / len(actives)) / (n_inactive_match / len(inactives))
return enrichment
A good pharmacophore: enrichment >= 5x relative to random.
PharmacoForge (Flynn, Shah, Dunn et al 2025 Front Bioinform) generates molecules conditioned on a pharmacophore query using a diffusion model:
# Pseudo-code; depends on PharmacoForge installation
# from pharmacoforge import PharmacophoreDiffusionGenerator
# gen = PharmacophoreDiffusionGenerator()
# generated = gen.generate(pharmacophore_query, n_molecules=100)
Trade-off: PharmacoForge produces de novo molecules that satisfy pharmacophore; lower drug-likeness than REINVENT but better novelty.
| Method | Captures | Best for | |--------|----------|----------| | ECFP4 Tanimoto | Local atom environments | Lead optimization (same series) | | FCFP4 Tanimoto | Pharmacophore-equivalent atoms | Loose similarity in series | | Shape similarity (ROCS) | 3D shape volume | Scaffold hopping by shape | | Pharmacophore | Discrete features in space | Scaffold hopping with feature specificity | | Combined (Tanimoto + shape) | Multi-objective | Production VS |
Pharmacophore is more interpretable than shape: a hit explains why it matched (donor at position X, hydrophobe at position Y).
Trigger: Active set spans multiple scaffolds with different bound conformations.
Mechanism: No common pharmacophore exists; algorithm forces non-consensus features.
Symptom: Pharmacophore matches no actives in retrospective.
Fix: Cluster actives by scaffold first; derive per-cluster pharmacophore.
Trigger: Protein in apo form (no bound ligand).
Mechanism: Side-chain rotamers differ between apo and holo; "binding site" geometry is wrong.
Symptom: Pharmacophore inferred from apo doesn't match holo experimental data.
Fix: Use AlphaFold3 / Boltz-1 to predict holo conformation; derive pharmacophore from predicted holo.
Trigger: Active aligned to its first generated conformer, not bioactive conformer.
Mechanism: Crystal structure not available; generated conformer may not be the bound one.
Symptom: Pharmacophore inconsistent across runs (different starting conformer chosen).
Fix: Use conformer ensemble; align all to common scaffold; choose conformer most consistent with other actives.
Trigger: Default geometric tolerance < 0.5 Å.
Mechanism: Real bioactive conformers have flexibility; rigid pharmacophore filters most molecules out.
Symptom: Search returns zero hits.
Fix: Use tolerance 1.0-1.5 Å for drug-like; up to 2 Å for flexible peptide-like.
Trigger: Bioisostere replacement (e.g., -COOH replaced by tetrazole).
Mechanism: Tetrazole functions as acid bioisostere but RDKit features may not classify identically.
Symptom: Known bioisosteric active not found.
Fix: Use ChemAxon-style bioisosteric feature equivalence; or pharmacophore feature class expansion (acid generic vs -COOH specific).
Trigger: Bridging water between ligand donor and protein acceptor.
Mechanism: PLIP doesn't include explicit water.
Symptom: Pharmacophore missing critical H-bond feature.
Fix: Add water-mediated interaction manually based on crystal water positions; or use PoseView for full interaction view.
| Aspect | Ligand-based | Receptor-based | |--------|--------------|----------------| | Data needed | Multiple actives | Co-crystal or predicted holo | | Bias | Toward known chemotype | None | | Hit set | Similar to known actives | More diverse | | Discovery potential | Limited | Higher | | Pharmacophore confidence | Higher (validated against multiple actives) | Lower (single ligand) |
For prospective scaffold-hopping, receptor-based is preferred. For ranking analogs, ligand-based is sufficient.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Pharm3D.EmbedPharmacophore fails | Bounds matrix infeasible | Loosen tolerances; increase max_attempts |
| Pharmacophore matches everything | Too few features | Add features; tighten tolerances |
| Pharmacophore matches nothing | Too many features or tight bounds | Reduce feature count; loosen tolerances |
| BaseFeatures.fdef not found | RDKit installation issue | Check from rdkit.RDPaths import RDDataDir |
| Pharmacophore-conformer mismatch | Wrong conformer used | Use bioactive conformer from crystal |
| Pharmit search timeout | Library too large | Pre-filter by 2D fingerprint Tanimoto |
| apo2ph4 produces empty | No druggable hot-spots | Lower thresholds; use shape-only filter |
Chem.Pharm3D framework -- documentation at rdkit.org (the unconfirmed "Stiefl 2021" citation has been removed pending verification).tools
--- name: bio-phasing-imputation-foundations description: Frames the phasing/imputation pipeline before any tool runs: phasing and imputation are one Li-Stephens copying HMM (recombination is the transition, mutation the emission, the genetic map and Ne set the rates), imputation's honest output is a dosage with a self-estimated quality (INFO/R2/DR2) not a hard genotype, and the stages are ordered and each fails silently (QC, align build and strand to the panel, phase, impute per chromosome, fil
tools
Chooses the enrichment generation before any tool runs, mapping the input shape to a method class - a pre-selected gene list plus a background to over-representation analysis (ORA, hypergeometric), a ranked statistic for all genes to gene set enrichment (GSEA), a signed signaling topology to pathway-topology (SPIA) - then making the null explicit (competitive vs self-contained, gene vs subject sampling) and running a trustworthiness checklist (testable-gene universe, FDR, redundancy collapse, leading-edge check, version reporting). Covers why every clusterProfiler GSEA is the inter-gene-correlation-uncorrected competitive null, why the background not the gene list decides ORA significance, and why no method is universally best. Use when deciding ORA vs GSEA vs topology, which gene-set DB, whether a result is trustworthy, or which null a tool computes. For ORA see go-enrichment, GSEA see gsea, databases kegg-pathways/reactome-pathways/wikipathways; the ranking comes from differential-expression/de-results.
testing
End-to-end GWAS workflow from VCF to association results. Covers PLINK QC, population structure correction, and association testing for case-control or quantitative traits. Use when running genome-wide association studies.
development
Orchestrates the full path from differential expression results to redundancy-collapsed functional enrichment: choose ORA vs GSEA, convert gene IDs per method, run enrichGO/enrichKEGG/enrichPathway/enrichWP or gseGO/gseKEGG (clusterProfiler, ReactomePA, rWikiPathways), and visualize. Routes the ORA-vs-GSEA generation fork and the null/universe/reproducibility theory to pathway-analysis/enrichment-foundations. Use when a DESeq2/edgeR/limma result must become enriched GO terms, KEGG/Reactome/WikiPathways pathways, or a GSEA leading edge; when deciding whether a ranking exists for all genes (GSEA, named decreasing vector) or only a pre-selected list (ORA plus a defensible background universe); or when assembling DE-to-pathway end to end. The DE list and ranking statistic come from differential-expression/de-results; per-method nuance lives in the pathway-analysis skills.