plugin/skills/tooluniverse-drug-target-validation/SKILL.md
Quantitative drug-target validation pipeline. Scores druggability, selectivity, safety profile, ADMET feasibility, and structural tractability with a composite Target Validation Score (0-100) and GO/NO-GO recommendation. Use for go/no-go decisions on a target before committing to medicinal chemistry, for target prioritization across a list, and for target-deselection rationale.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf): `npx skillsauth add mims-harvard/tooluniverse tooluniverse-drug-target-validation`
Validate drug target hypotheses using multi-dimensional computational evidence before committing to wet-lab work. Produces a quantitative Target Validation Score (0-100) with priority tier classification and GO/NO-GO recommendation.
A valid drug target must pass 4 gates in order. Failing an early gate makes later gates irrelevant:
Do not proceed to Phase 3 (Chemical Matter) before completing Phase 1 (Disease Association). Gate 1 failures should prompt a NO-GO or pivot recommendation.
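The gate ordering is short-circuit evaluation: check gates in order and stop at the first failure. A minimal sketch, assuming each gate has already been reduced to a pass/fail boolean; the labels "Gate 1" to "Gate 4" are placeholders, not the skill's official gate names.

```python
from typing import Optional, Sequence, Tuple


def first_failed_gate(gates: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """Return the name of the first failing gate, or None if every gate passes.

    Gates are checked strictly in order; once one fails, later gates are not
    consulted because their results would be irrelevant.
    """
    for name, passed in gates:
        if not passed:
            return name
    return None


# Placeholder gate labels and outcomes; substitute the real gates and results.
gates = [("Gate 1", True), ("Gate 2", False), ("Gate 3", True), ("Gate 4", True)]
failed = first_failed_gate(gates)
if failed is None:
    print("All gates passed")
else:
    print(f"Stopped at {failed}: consider NO-GO or pivot")
```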
LOOK UP, DON'T GUESS: Never assume a target is druggable based on its protein family alone, never assume expression is low in a tissue without checking GTEx or HPA, and never assume there are no competitors without searching ClinicalTrials.gov.
RUN THE ML MODELS, DON'T SKIP THEM: When deep-learning predictors are available (ADMET-AI, ESMFold, AlphaFold, DoGSite, DynaMut2, DeepGO), run them even when database lookups or experimental data already cover the same property. The ML predictions provide an orthogonal, mechanistically-grounded estimate that's a first-class output of this skill — not a fallback. A target-validation report missing ML predictions is incomplete regardless of how much database evidence is present.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when users ask about:
Not for (use other skills): general target biology (tooluniverse-target-research), drug compound profiling (tooluniverse-drug-research), variant interpretation (tooluniverse-variant-interpretation), disease research (tooluniverse-disease-research).
| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| target | Yes | Gene symbol, protein name, or UniProt ID | EGFR, P00533 |
| disease | No | Disease/indication for context | Non-small cell lung cancer |
| modality | No | Preferred therapeutic modality | small molecule, antibody, PROTAC |
Total: 0-100 points across 5 dimensions (details in SCORING_CRITERIA.md):
| Dimension | Max | Sub-dimensions |
|-----------|-----|----------------|
| Disease Association | 30 | Genetic (10) + Literature (10) + Pathway (10) |
| Druggability | 25 | Structure (10) + Chemical matter (10) + Target class (5) |
| Safety Profile | 20 | Expression (5) + Genetic validation (10) + ADRs (5) |
| Clinical Precedent | 15 | Based on highest clinical stage achieved |
| Validation Evidence | 10 | Functional studies (5) + Disease models (5) |
Priority Tiers: 80-100 = Tier 1 (GO) | 60-79 = Tier 2 (CONDITIONAL GO) | 40-59 = Tier 3 (CAUTION) | 0-39 = Tier 4 (NO-GO)
Evidence Grades: T1 (clinical proof) > T2 (functional studies) > T3 (associations) > T4 (predictions)
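Because the total is a plain sum of capped dimension scores, the tier assignment is easy to compute and audit in Python. A minimal sketch, assuming the per-dimension points have already been assigned according to SCORING_CRITERIA.md; the dictionary keys are illustrative names, not a ToolUniverse schema, and the example scores are made up.

```python
# Maximum points per dimension, from the scoring table above.
DIMENSION_MAX = {
    "disease_association": 30,
    "druggability": 25,
    "safety_profile": 20,
    "clinical_precedent": 15,
    "validation_evidence": 10,
}

# (lower bound, label) pairs, checked from the highest tier down.
TIERS = [
    (80, "Tier 1 (GO)"),
    (60, "Tier 2 (CONDITIONAL GO)"),
    (40, "Tier 3 (CAUTION)"),
    (0, "Tier 4 (NO-GO)"),
]


def target_validation_score(scores: dict) -> tuple:
    """Validate the five dimension scores, sum them, and map the total to a tier."""
    missing = DIMENSION_MAX.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim, value in scores.items():
        cap = DIMENSION_MAX.get(dim)
        if cap is None:
            raise ValueError(f"unknown dimension: {dim}")
        if not 0 <= value <= cap:
            raise ValueError(f"{dim}={value} outside 0-{cap}")
    total = sum(scores.values())
    # Every non-negative total clears the 0 floor, so next() always finds a tier.
    tier = next(label for floor, label in TIERS if total >= floor)
    return total, tier


# Illustrative scores only.
total, tier = target_validation_score({
    "disease_association": 24, "druggability": 20, "safety_profile": 14,
    "clinical_precedent": 15, "validation_evidence": 8,
})
print(total, tier)
```

Rejecting out-of-range values, rather than clipping them, keeps a scoring mistake from silently shifting a target into a different tier.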
Resolve target to ALL identifiers before any analysis.
Steps:
1. MyGene_query_genes - Get initial IDs (Ensembl, UniProt, Entrez)
2. ensembl_lookup_gene - Get versioned Ensembl ID (species="homo_sapiens" REQUIRED)
3. ensembl_get_xrefs - Cross-references (HGNC, etc.)
4. OpenTargets_get_target_id_description_by_name - Verify OT target
5. ChEMBL_search_targets - Get ChEMBL target ID
6. UniProt_get_function_by_accession - Function summary (returns list of strings)
7. UniProt_get_alternative_names_by_accession - Collision detection

Output: Table of verified identifiers (Gene Symbol, Ensembl, UniProt, Entrez, ChEMBL, HGNC) plus protein function and target class.
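A quick format check on the resolved identifier table catches mis-parsed tool output, such as a gene symbol landing in the UniProt column. A minimal sketch, assuming a human target (ENSG prefix) and a flat record with hypothetical keys; the patterns encode standard accession formats (the UniProt one follows UniProt's published accession regex) and check only format, not correctness. The EGFR values are for illustration.

```python
import re

# Accession formats for the identifier table (human genes assumed for Ensembl).
ID_PATTERNS = {
    "ensembl": re.compile(r"ENSG\d{11}(?:\.\d+)?"),
    "uniprot": re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"),
    "entrez": re.compile(r"\d+"),
    "chembl": re.compile(r"CHEMBL\d+"),
    "hgnc": re.compile(r"HGNC:\d+"),
}


def check_identifiers(ids: dict) -> list:
    """Return a list of problems in a resolved-identifier record (empty = OK)."""
    problems = []
    for key, pattern in ID_PATTERNS.items():
        value = ids.get(key)
        if not value:
            problems.append(f"{key}: missing")
        elif not pattern.fullmatch(str(value)):
            problems.append(f"{key}: unexpected format {value!r}")
    return problems


def unversioned(ensembl_id: str) -> str:
    """Drop the '.N' version suffix; Open Targets uses unversioned Ensembl IDs."""
    return ensembl_id.split(".", 1)[0]


record = {"symbol": "EGFR", "ensembl": "ENSG00000146648", "uniprot": "P00533",
          "entrez": "1956", "chembl": "CHEMBL203", "hgnc": "HGNC:3236"}
print(check_identifiers(record))  # → []
```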
Quantify target-disease association from genetic, literature, and pathway evidence.
Key tools:
- OpenTargets_get_diseases_phenotypes_by_target_ensembl - Disease associations
- OpenTargets_target_disease_evidence - Detailed evidence (needs efoId + ensemblId)
- OpenTargets_get_evidence_by_datasource - Evidence by data source
- gwas_get_snps_for_gene / gwas_search_studies - GWAS evidence
- gnomad_get_gene_constraints - Genetic constraint (pLI, LOEUF)
- PubMed_search_articles - Literature (returns plain list of dicts)
- OpenTargets_get_publications_by_target_ensemblID - OT publications (uses entityId)

Assess whether the target is amenable to therapeutic intervention.
Key tools:
- OpenTargets_get_target_tractability_by_ensemblID - Tractability (SM, AB, PR, OC)
- OpenTargets_get_target_classes_by_ensemblID - Target classification
- Pharos_get_target - TDL: Tclin > Tchem > Tbio > Tdark
- DGIdb_get_gene_druggability - Druggability categories
- alphafold_get_prediction (param: qualifier) / alphafold_get_summary
- ProteinsPlus_predict_binding_sites - Pocket detection
- OpenTargets_get_chemical_probes_by_target_ensemblID - Chemical probes
- OpenTargets_get_target_enabling_packages_by_ensemblID - TEPs
- TCDB_get_transporter - For SLC/ABC transporter targets: TC classification, family, PDB structures (param: uniprot_accession)
- TCDB_search_by_substrate - Find transporters by substrate (param: substrate_name)

Identify existing chemical starting points for target validation.
Key tools:
- ChEMBL_search_targets + ChEMBL_get_target_activities - Bioactivity data (note: target_chembl_id__exact with double underscore)
- BindingDB_get_ligands_by_uniprot - Binding data (affinity in nM)
- PubChem_search_assays_by_target_gene + PubChem_get_assay_active_compounds - HTS data
- OpenTargets_get_associated_drugs_by_target_ensemblID - Known drugs (size REQUIRED)
- ChEMBL_search_mechanisms - Drug mechanisms
- DGIdb_get_gene_info - Drug-gene interactions

For each lead / approved compound identified above, run all ten ADMET-AI Chemprop-GNN endpoints. This is a required deliverable of the skill, not optional:
| Endpoint | Tool |
|---|---|
| Physicochemical (MW, logP, HBA/HBD, TPSA) | ADMETAI_predict_physicochemical_properties |
| Toxicity (AMES, DILI, LD50, carcinogens, skin sensitizers, ClinTox) | ADMETAI_predict_toxicity |
| BBB penetrance | ADMETAI_predict_BBB_penetrance |
| CYP interactions (1A2, 2C9, 2C19, 2D6, 3A4) | ADMETAI_predict_CYP_interactions |
| Bioavailability (HIA, PAMPA, Caco-2, F20/F30) | ADMETAI_predict_bioavailability |
| Clearance & distribution (hepatocyte, microsome, VDss, PPB) | ADMETAI_predict_clearance_distribution |
| Nuclear receptor activity (NR-AR, NR-AhR, NR-Aromatase, NR-ER, NR-PPAR-γ) | ADMETAI_predict_nuclear_receptor_activity |
| Stress response (SR-ARE, SR-ATAD5, SR-HSE, SR-MMP, SR-p53) | ADMETAI_predict_stress_response |
| Solubility, lipophilicity, hydration | ADMETAI_predict_solubility_lipophilicity_hydration |
| Metabolism (CYP-mediated) | ADMETAI_predict_CYP_interactions (same call as the CYP interactions row) |
Required output: ADMET head-to-head table. When two or more candidate drugs exist (approved or late-stage), produce a side-by-side comparison table with one row per endpoint, each drug's value in its own column, and a "Winner" column flagging which drug performs better (is safer) on that endpoint. This table is the primary visual of the report and must not be abbreviated or summarized into prose.
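One way to build that table programmatically is to pair each endpoint with the direction in which it is "better" and let the code pick the winner per row. A minimal sketch: the endpoint keys, drug names, and values below are placeholders rather than actual ADMET-AI output fields, and the better-direction assigned to each endpoint is an assumption to review before use.

```python
LOWER_IS_BETTER = "lower"
HIGHER_IS_BETTER = "higher"


def head_to_head(results: dict, directions: dict) -> str:
    """Render a markdown table: one row per endpoint, one column per drug,
    plus a Winner column chosen by the endpoint's preferred direction."""
    drugs = list(results)
    lines = [
        "| Endpoint | " + " | ".join(drugs) + " | Winner |",
        "|---|" + "---|" * len(drugs) + "---|",
    ]
    for ep, direction in directions.items():
        values = [results[d].get(ep) for d in drugs]
        if any(v is None for v in values):
            winner = "n/a"  # cannot compare when a prediction is missing
        else:
            pick = min if direction == LOWER_IS_BETTER else max
            best = pick(values)
            winners = [d for d, v in zip(drugs, values) if v == best]
            winner = winners[0] if len(winners) == 1 else "tie"
        cells = ["" if v is None else f"{v:.2f}" for v in values]
        lines.append(f"| {ep} | " + " | ".join(cells) + f" | {winner} |")
    return "\n".join(lines)


# Placeholder predictions for two hypothetical drugs.
results = {
    "Drug A": {"DILI": 0.62, "ClinTox": 0.10, "HIA": 0.95},
    "Drug B": {"DILI": 0.41, "ClinTox": 0.10, "HIA": 0.88},
}
directions = {"DILI": LOWER_IS_BETTER, "ClinTox": LOWER_IS_BETTER, "HIA": HIGHER_IS_BETTER}
print(head_to_head(results, directions))
```

Keeping the direction map explicit makes each "Winner" call traceable, which matters when reviewers question why one drug was flagged as safer.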
ADMET-AI fallback (IMPORTANT): If MCP calls to ADMETAI_predict_* fail, return empty, or time out, run them via Bash and the Python SDK instead:
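```python
from tooluniverse import ToolUniverse

tu = ToolUniverse()
tu.load_tools()

# Example inputs only: replace with the SMILES of the candidate drugs being compared.
SMILES_DRUG_A = "CC(=O)Oc1ccccc1C(=O)O"       # aspirin (placeholder)
SMILES_DRUG_B = "CC(C)Cc1ccc(cc1)C(C)C(=O)O"  # ibuprofen (placeholder)

# Nine tool calls cover all ten endpoint rows (the CYP call covers both CYP rows).
endpoints = ['physicochemical_properties', 'toxicity', 'BBB_penetrance', 'CYP_interactions',
             'bioavailability', 'clearance_distribution', 'nuclear_receptor_activity',
             'stress_response', 'solubility_lipophilicity_hydration']

for endpoint in endpoints:
    r = tu.run_one_function({'name': f'ADMETAI_predict_{endpoint}',
                             'arguments': {'smiles_list': [SMILES_DRUG_A, SMILES_DRUG_B]}})
    print(f'{endpoint}: {r}')
```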
This SDK path bypasses the CLI subprocess and avoids segfault issues with torch. Always try MCP first; use this fallback if MCP returns no data.
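The "MCP first, SDK on failure" rule can be captured in a small wrapper that treats errors, timeouts, and empty results the same way. A minimal sketch: `primary` and `fallback` are any zero-argument callables you supply (for example, one wrapping the MCP call and one wrapping the SDK loop), and the 120-second default timeout is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def call_with_fallback(primary, fallback, timeout_s=120.0):
    """Try `primary`; if it raises, times out, or returns an empty result,
    call `fallback` instead. Returns (source_label, result)."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        result = pool.submit(primary).result(timeout=timeout_s)
        if result:  # None, {}, [] and "" all count as "no data"
            return "primary", result
    except Exception:
        pass  # error or timeout: fall through to the fallback
    finally:
        # Don't block on a hung primary call; its thread keeps running in the background.
        pool.shutdown(wait=False)
    return "fallback", fallback()


def mcp_down():
    raise RuntimeError("MCP unavailable")  # stand-in for a failing MCP call


print(call_with_fallback(mcp_down, lambda: {"source": "sdk"}))
```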
Assess clinical validation from approved drugs and clinical trials.
Key tools:
- FDA_get_mechanism_of_action_by_drug_name / FDA_get_indications_by_drug_name
- drugbank_get_targets_by_drug_name_or_drugbank_id (ALL params required: query, case_sensitive, exact_match, limit)
- search_clinical_trials (query_term REQUIRED)
- OpenTargets_get_drug_warnings_by_chemblId / OpenTargets_get_drug_adverse_events_by_chemblId

Identify safety risks from expression, genetics, and known adverse events.
Key tools:
- OpenTargets_get_target_safety_profile_by_ensemblID - Safety liabilities
- GTEx_get_median_gene_expression - Tissue expression (operation="median" REQUIRED)
- HPA_search_genes_by_query / HPA_get_comprehensive_gene_details_by_ensembl_id
- OpenTargets_get_biological_mouse_models_by_ensemblID - KO phenotypes
- FDA_get_adverse_reactions_by_drug_name / FDA_get_boxed_warning_info_by_drug_name
- OpenTargets_get_target_homologues_by_ensemblID - Paralog risks

Critical tissues to check: heart, liver, kidney, brain, bone marrow.
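The critical-tissue check can be made explicit by separating "expressed above a threshold" from "not measured", so a missing tissue is never mistaken for low expression. A minimal sketch, assuming expression has been collected into a {tissue name: median TPM} dict; the 10 TPM threshold and the example values are assumptions. GTEx has no bone marrow samples, so that tissue is only covered by sources such as HPA.

```python
# Keyword(s) used to match each critical tissue against source tissue names.
CRITICAL_TISSUES = {
    "heart": ("heart",),
    "liver": ("liver",),
    "kidney": ("kidney",),
    "brain": ("brain",),
    "bone marrow": ("bone marrow",),
}


def critical_tissue_flags(tpm_by_tissue: dict, threshold_tpm: float = 10.0):
    """Return ({critical tissue: max TPM at/above threshold}, [unmeasured tissues])."""
    flags, unmeasured = {}, []
    for organ, keywords in CRITICAL_TISSUES.items():
        hits = [tpm for tissue, tpm in tpm_by_tissue.items()
                if any(k in tissue.lower() for k in keywords)]
        if not hits:
            unmeasured.append(organ)  # no data: check another source, don't assume low
        elif max(hits) >= threshold_tpm:
            flags[organ] = max(hits)
    return flags, unmeasured


# Hypothetical GTEx-style medians (TPM) for illustration.
example = {"Heart - Left Ventricle": 3.2, "Liver": 25.0, "Kidney - Cortex": 12.5,
           "Brain - Cortex": 0.8, "Whole Blood": 1.1}
print(critical_tissue_flags(example))
```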
Understand the target's role in biological networks and disease pathways.
Key tools:
- Reactome_map_uniprot_to_pathways (param: id, NOT uniprot_id)
- STRING_get_protein_interactions (param: protein_ids as array, species=9606)
- intact_get_interactions - Experimental PPI
- OpenTargets_get_target_gene_ontology_by_ensemblID - GO terms
- STRING_functional_enrichment - Enrichment analysis

Assess: pathway redundancy, compensation risk, feedback loops.
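For the redundancy question, one rough heuristic is to compare the high-confidence interaction partners of the target and a paralog: heavy overlap hints that the paralog could compensate. A minimal sketch, assuming edges shaped like STRING REST API network records (preferredName_A, preferredName_B, combined score in 0-1); the ToolUniverse wrapper's output may be shaped differently, the Jaccard overlap is a heuristic added here rather than part of the skill, and the gene names are hypothetical.

```python
HIGH_CONFIDENCE = 0.7  # STRING's "high confidence" combined-score cutoff


def high_conf_partners(edges, protein, min_score=HIGH_CONFIDENCE):
    """Partners of `protein` among STRING-style edges with score >= min_score."""
    partners = set()
    for e in edges:
        if e["score"] < min_score:
            continue
        a, b = e["preferredName_A"], e["preferredName_B"]
        if a == protein:
            partners.add(b)
        elif b == protein:
            partners.add(a)
    return partners


def partner_overlap(p1: set, p2: set) -> float:
    """Jaccard overlap of two partner sets (a crude redundancy signal)."""
    union = p1 | p2
    return len(p1 & p2) / len(union) if union else 0.0


# Hypothetical network: TGT is the target, PAR a paralog.
edges = [
    {"preferredName_A": "TGT", "preferredName_B": "P1", "score": 0.95},
    {"preferredName_A": "TGT", "preferredName_B": "P2", "score": 0.45},
    {"preferredName_A": "P3", "preferredName_B": "TGT", "score": 0.80},
    {"preferredName_A": "PAR", "preferredName_B": "P1", "score": 0.90},
    {"preferredName_A": "PAR", "preferredName_B": "P4", "score": 0.75},
]
print(partner_overlap(high_conf_partners(edges, "TGT"), high_conf_partners(edges, "PAR")))
```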
Assess existing functional validation data.
Key tools:
- DepMap_get_gene_dependencies - Essentiality (score < -0.5 = essential)
- PubMed_search_articles - Search for CRISPR/siRNA/knockout studies
- CTD_get_gene_diseases - Gene-disease associations

Leverage structural biology for druggability and mechanism understanding. ALWAYS run both the deep-learning predictors (ESMFold, DoGSite) AND retrieve experimental structures, even when high-resolution PDB entries already exist. The ML models give an independent pLDDT/druggability score that is a required output of this phase.
Required tool calls (every run):
- ESMFold_predict_structure - Meta ESM-2 language-model structure prediction from the UniProt sequence. Report: model pLDDT, worst-residue confidence, RMSD vs. reference PDB if available.
- alphafold_get_prediction / alphafold_get_summary - DeepMind AlphaFold model + per-residue pLDDT.
- ProteinsPlus_predict_binding_sites - DoGSite deep-learning pocket scoring. Report: top 3 pockets with volume, druggability score, residue composition.

Supporting tools:
- UniProt_get_entry_by_accession - Extract PDB cross-references
- get_protein_metadata_by_pdb_id / pdbe_get_entry_summary / pdbe_get_entry_quality
- InterPro_get_protein_domains / InterPro_get_domain_details - Domain architecture

Comprehensive collision-aware literature analysis.
Steps:

1. Search "{gene_symbol}"[Title] in PubMed; if more than 20% of hits are off-topic, add the filter AND (protein OR gene OR receptor).
2. Add the review[pt] filter in PubMed to retrieve review articles.
3. openalex_search_works - Impact data
4. EuropePMC_search_articles

Synthesize all phases into actionable output:
| Model | Architecture | Contributed |
|---|---|---|
| AlphaFold | DeepMind iterative SE(3)-equivariant Transformer | Full-length 3D model; per-residue pLDDT 91.5 |
| ESMFold | Meta ESM-2 protein language model | Sequence→structure baseline; confidence vs. AlphaFold |
| DoGSite3 | CNN pocket scorer (ProteinsPlus) | Top-3 druggable pockets with volume and drug-score |
| ADMET-AI | Chemprop GNN ensemble (TDC) | 10 endpoints for sotorasib / adagrasib (table above) |
| DynaMut2 | Graph-based mutation stability predictor | ΔΔG for G12C vs. WT |
| DeepGO | Hierarchical GO-term classifier | Molecular-function predictions |
Only list models actually called during the run. This section makes the ML content first-class for a scientific or investor audience.
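One way to enforce "only list models actually called" is to derive the table rows from the run's tool-call log instead of writing them by hand. A minimal sketch, assuming the log is a list of tool names; the prefix-to-model mapping covers only tools named in this skill (DynaMut2 and DeepGO tool names are not given here, so they are omitted), and the architecture/contribution text is supplied by the caller.

```python
# Tool-name prefix -> model name, for the ML tools named in this skill.
MODEL_BY_TOOL_PREFIX = {
    "alphafold_": "AlphaFold",
    "ESMFold_": "ESMFold",
    "ProteinsPlus_predict_binding_sites": "DoGSite3",
    "ADMETAI_predict_": "ADMET-AI",
}


def models_called(tool_calls):
    """Distinct models behind the tools actually called, in first-call order."""
    seen = []
    for tool in tool_calls:
        for prefix, model in MODEL_BY_TOOL_PREFIX.items():
            if tool.startswith(prefix) and model not in seen:
                seen.append(model)
    return seen


def ml_models_table(tool_calls, details):
    """Markdown table with one row per model called; details maps model -> (architecture, contributed)."""
    rows = ["| Model | Architecture | Contributed |", "|---|---|---|"]
    for model in models_called(tool_calls):
        arch, contributed = details.get(model, ("", ""))
        rows.append(f"| {model} | {arch} | {contributed} |")
    return "\n".join(rows)


# Hypothetical call log: ESMFold was not run, so it must not appear.
calls = ["MyGene_query_genes", "alphafold_get_summary",
         "ADMETAI_predict_toxicity", "ADMETAI_predict_BBB_penetrance"]
print(ml_models_table(calls, {"ADMET-AI": ("Chemprop GNN ensemble (TDC)", "2 endpoints")}))
```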
Create file: [TARGET]_[DISEASE]_validation_report.md
Use the full template from REPORT_TEMPLATE.md. Key sections:
Complete the Completeness Checklist (in REPORT_TEMPLATE.md) before finalizing to verify all phases were covered, all scores justified, and negative results documented.
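The report file name follows the [TARGET]_[DISEASE]_validation_report.md pattern; target and disease strings often contain spaces or punctuation, so a small helper keeps the name filesystem-safe. A minimal sketch: replacing runs of characters other than letters, digits, and hyphens with underscores, and dropping the disease part when none was given (disease is optional per the parameter table), are assumptions rather than rules from REPORT_TEMPLATE.md.

```python
import re
from pathlib import Path
from typing import Optional


def report_path(target: str, disease: Optional[str] = None, out_dir: str = ".") -> Path:
    """Build [TARGET]_[DISEASE]_validation_report.md with filesystem-safe parts."""
    def safe(part: str) -> str:
        # Collapse runs of anything other than letters, digits, or '-' into '_'.
        return re.sub(r"[^A-Za-z0-9-]+", "_", part.strip()).strip("_")

    parts = [safe(target)] + ([safe(disease)] if disease else [])
    return Path(out_dir) / ("_".join(parts) + "_validation_report.md")


print(report_path("EGFR", "Non-small cell lung cancer").name)
```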