skills/drug-target-validation/SKILL.md
ToolUniverse workflow — Drug Target Validation
npx skillsauth add lamm-mit/scienceclaw drug-target-validationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Validate drug target hypotheses using multi-dimensional computational evidence before committing to wet-lab work. Produces a quantitative Target Validation Score (0-100) with priority tier classification and GO/NO-GO recommendation.
KEY PRINCIPLES:
Apply when users:
NOT for (use other skills instead):
tooluniverse-target-researchtooluniverse-drug-researchtooluniverse-variant-interpretationtooluniverse-disease-research| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| target | Yes | Gene symbol, protein name, or UniProt ID | EGFR, P00533, Epidermal growth factor receptor |
| disease | No | Disease/indication for context | Non-small cell lung cancer, Pancreatic cancer |
| modality | No | Preferred therapeutic modality | small molecule, antibody, protein therapeutic, PROTAC |
Disease Association (0-30 points):
Druggability (0-25 points):
Safety Profile (0-20 points):
Clinical Precedent (0-15 points):
Validation Evidence (0-10 points):
| Score | Tier | Recommendation | |-------|------|----------------| | 80-100 | Tier 1 | Highly validated - proceed with confidence | | 60-79 | Tier 2 | Good target - needs focused validation | | 40-59 | Tier 3 | Moderate risk - significant validation needed | | 0-39 | Tier 4 | High risk - consider alternatives |
| Tier | Symbol | Criteria | Examples | |------|--------|----------|----------| | T1 | [T1] | Direct mechanistic, human clinical proof | FDA-approved drug, crystal structure with mechanism, patient mutation | | T2 | [T2] | Functional studies, model organism | siRNA phenotype, mouse KO, biochemical assay, CRISPR screen | | T3 | [T3] | Association, screen hits, computational | GWAS hit, DepMap essentiality, expression correlation | | T4 | [T4] | Mention, review, text-mined, predicted | Review article, database annotation, AlphaFold prediction |
Objective: Resolve target to ALL needed identifiers before any analysis.
# Step 1: Determine input type and get initial identifiers
# If gene symbol (e.g., "EGFR"):
mygene = tu.tools.MyGene_query_genes(query="EGFR", species="human", fields="symbol,name,ensembl.gene,uniprot.Swiss-Prot,entrezgene")
# Extract: ensembl_id, uniprot_id, entrez_id, symbol, name
# If UniProt ID (e.g., "P00533"):
uniprot = tu.tools.UniProt_get_entry_by_accession(accession="P00533")
# Extract: gene names, Ensembl xrefs, function
# Step 2: Resolve Ensembl ID and get versioned ID for GTEx
ensembl = tu.tools.ensembl_lookup_gene(gene_id=ensembl_id, species="homo_sapiens")
# CRITICAL: species parameter is REQUIRED
# CRITICAL: Response is wrapped in {status, data, url, content_type} - access via ensembl['data']
ensembl_data = ensembl.get('data', ensembl) if isinstance(ensembl, dict) else ensembl
# Extract: version for versioned_id (e.g., "ENSG00000146648.18")
# Step 3: Get Ensembl cross-references
xrefs = tu.tools.ensembl_get_xrefs(id=ensembl_id)
# Extract: HGNC, UniProt, EntrezGene mappings
# Step 4: Get OpenTargets target info
ot_target = tu.tools.OpenTargets_get_target_id_description_by_name(targetName="EGFR")
# Verify ensemblId matches
# Step 5: Get ChEMBL target ID
chembl_targets = tu.tools.ChEMBL_search_targets(pref_name__contains="EGFR", organism="Homo sapiens", limit=5)
# Extract: target_chembl_id for later use
# Step 6: Get UniProt function summary
function_info = tu.tools.UniProt_get_function_by_accession(accession=uniprot_id)
# Returns list of strings (NOT dict)
# Step 7: Get alternative names for collision detection
alt_names = tu.tools.UniProt_get_alternative_names_by_accession(accession=uniprot_id)
## 1. Target Identity
| Database | Identifier | Verified |
|----------|-----------|----------|
| Gene Symbol | EGFR | Yes |
| Full Name | Epidermal growth factor receptor | Yes |
| Ensembl | ENSG00000146648 | Yes |
| Ensembl (versioned) | ENSG00000146648.18 | Yes |
| UniProt | P00533 | Yes |
| Entrez Gene | 1956 | Yes |
| ChEMBL | CHEMBL203 | Yes |
| HGNC | HGNC:3236 | Yes |
**Protein Function**: [from UniProt_get_function_by_accession]
**Subcellular Location**: [from UniProt_get_subcellular_location_by_accession]
**Target Class**: [from OpenTargets_get_target_classes_by_ensemblID]
| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| ensembl_lookup_gene | id | gene_id (+ species="homo_sapiens" REQUIRED) |
| Reactome_map_uniprot_to_pathways | uniprot_id | id |
| ensembl_get_xrefs | gene_id | id |
| GTEx_get_median_gene_expression | gencode_id only | gencode_id + operation="median" |
| OpenTargets_* | ensemblID (uppercase) | ensemblId (camelCase) |
| OpenTargets_get_publications_* | ensemblId | entityId |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId only | ensemblId + size (REQUIRED) |
| MyGene_query_genes | q | query |
| PubMed_search_articles | returns {articles: [...]} | returns plain list of dicts |
| UniProt_get_function_by_accession | returns dict | returns list of strings |
| HPA_get_rna_expression_by_source | ensembl_id | gene_name + source_type + source_name (ALL required) |
| alphafold_get_prediction | uniprot_accession | qualifier |
| drugbank_get_safety_* | simple params | query, case_sensitive, exact_match, limit (ALL required) |
Objective: Quantify the strength of target-disease association from genetic, literature, and pathway evidence.
# Get ALL disease associations for target
diseases = tu.tools.OpenTargets_get_diseases_phenotypes_by_target_ensembl(ensemblId=ensembl_id)
# If specific disease provided, get detailed evidence
if disease_name:
disease_info = tu.tools.OpenTargets_get_disease_id_description_by_name(diseaseName=disease_name)
efo_id = disease_info.get('id') # e.g., "EFO_0003060"
evidence = tu.tools.OpenTargets_target_disease_evidence(
efoId=efo_id, ensemblId=ensembl_id
)
# Get evidence by data source for detailed breakdown
datasource_evidence = tu.tools.OpenTargets_get_evidence_by_datasource(
efoId=efo_id, ensemblId=ensembl_id,
datasourceIds=["ot_genetics_portal", "eva", "gene2phenotype", "genomics_england", "uniprot_literature"],
size=100
)
# GWAS associations for target gene
gwas_snps = tu.tools.gwas_get_snps_for_gene(mapped_gene=gene_symbol, size=50)
# If specific disease, search for trait-specific associations
if disease_name:
gwas_studies = tu.tools.gwas_search_studies(query=disease_name, size=20)
# Genetic constraint - intolerance to loss of function
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# Extract: pLI, LOEUF, missense_z, pRec
# High pLI (>0.9) = highly intolerant to LoF = likely essential
# PubMed for target-disease association
articles = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND "{disease_name}" AND (target OR therapeutic OR inhibitor)',
limit=50
)
# PubMed_search_articles returns a plain list of dicts
# OpenTargets publications
pubs = tu.tools.OpenTargets_get_publications_by_target_ensemblID(entityId=ensembl_id)
Genetic Evidence (0-10):
- GWAS hits for specific disease: +3 per significant locus (max 6)
- Rare variant evidence (ClinVar pathogenic): +2
- Somatic mutations in disease: +2
- pLI > 0.9 (essential gene): +2
Literature Evidence (0-10):
- >100 publications on target+disease: 10
- 50-100 publications: 7
- 10-50 publications: 5
- 1-10 publications: 3
- 0 publications: 0
Pathway Evidence (0-10):
- OpenTargets overall score > 0.8: 10
- Score 0.5-0.8: 7
- Score 0.2-0.5: 4
- Score < 0.2: 1
Objective: Assess whether the target is amenable to therapeutic intervention.
# Tractability assessment across modalities
tractability = tu.tools.OpenTargets_get_target_tractability_by_ensemblID(ensemblId=ensembl_id)
# Returns: label, modality (SM, AB, PR, OC), value (boolean/score)
# Modalities: Small Molecule, Antibody, PROTAC, Other Clinical
# Target classification (kinase, GPCR, ion channel, etc.)
target_classes = tu.tools.OpenTargets_get_target_classes_by_ensemblID(ensemblId=ensembl_id)
# Pharos target development level
pharos = tu.tools.Pharos_get_target(gene=gene_symbol)
# TDL: Tclin (approved drug) > Tchem (compounds) > Tbio (biology) > Tdark (unknown)
# DGIdb druggability categories
druggability = tu.tools.DGIdb_get_gene_druggability(genes=[gene_symbol])
# PDB structures available
if uniprot_id:
uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
# Extract PDB cross-references from entry
# AlphaFold prediction
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_summary = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
# For top PDB structures, analyze binding pockets
# ProteinsPlus DoGSiteScorer for pocket detection
for pdb_id in top_pdb_ids[:3]:
pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=pdb_id)
# Returns predicted druggable pockets with scores
# Chemical probes (validated tool compounds)
probes = tu.tools.OpenTargets_get_chemical_probes_by_target_ensemblID(ensemblId=ensembl_id)
# Target Enabling Packages (TEPs)
teps = tu.tools.OpenTargets_get_target_enabling_packages_by_ensemblID(ensemblId=ensembl_id)
Structural Tractability (0-10):
- High-res co-crystal structure with ligand: 10
- PDB structure available, pockets detected: 7
- AlphaFold only, confident pocket prediction: 5
- AlphaFold low confidence / no structure: 2
- No structural data: 0
Chemical Matter (0-10):
- Known drug-like compounds (IC50 < 100nM): 10
- Tool compounds (IC50 < 1uM): 7
- HTS hits only (IC50 > 1uM): 4
- No known ligands: 0
Target Class Bonus (0-5):
- Validated druggable family (kinase, GPCR, nuclear receptor): 5
- Enzyme, ion channel: 4
- Protein-protein interaction, transporter: 2
- Novel/unknown class: 0
Objective: Identify existing chemical starting points for target validation.
# Search for ChEMBL target
chembl_targets = tu.tools.ChEMBL_search_targets(
pref_name__contains=gene_symbol, organism="Homo sapiens", limit=10
)
# Get activities for best matching target
target_chembl_id = chembl_targets[0]['target_chembl_id']
activities = tu.tools.ChEMBL_get_target_activities(
target_chembl_id__exact=target_chembl_id, limit=100
)
# Parse: compound IDs, pChEMBL values, activity types (IC50, Ki, Kd)
# Filter: potent compounds (pChEMBL >= 6.0 = IC50 <= 1uM)
# Experimental binding data
ligands = tu.tools.BindingDB_get_ligands_by_uniprot(
uniprot=uniprot_id, affinity_cutoff=10000 # nM
)
# Returns: SMILES, affinity_type (Ki/IC50/Kd), affinity value, PMID
# HTS screening data
assays = tu.tools.PubChem_search_assays_by_target_gene(gene_symbol=gene_symbol)
# Get details for top assays
for aid in assay_ids[:5]:
summary = tu.tools.PubChem_get_assay_summary(aid=str(aid))
targets = tu.tools.PubChem_get_assay_targets(aid=str(aid))
actives = tu.tools.PubChem_get_assay_active_compounds(aid=str(aid))
# OpenTargets known drugs
drugs = tu.tools.OpenTargets_get_associated_drugs_by_target_ensemblID(
ensemblId=ensembl_id, size=25
)
# ChEMBL drug mechanisms
drug_mechanisms = tu.tools.ChEMBL_search_mechanisms(
target_chembl_id=target_chembl_id, limit=50
)
# Drug interaction databases
dgidb = tu.tools.DGIdb_get_gene_info(genes=[gene_symbol])
### 4. Known Modulators & Chemical Matter
#### 4.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Indication | Source |
|------|-----------|-----------|-------|------------|--------|
| Erlotinib | CHEMBL553 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
| Gefitinib | CHEMBL939 | Inhibitor | 4 | NSCLC | [T1] OpenTargets |
#### 4.2 ChEMBL Bioactivity Summary
**Total Activities**: 12,456 datapoints across 2,341 assays
**Most Potent Compound**: CHEMBL413456 (IC50 = 0.3 nM) [T1]
**Chemical Series**: 8 distinct scaffolds with pChEMBL >= 7.0
**Selectivity Data**: Available for 45 compounds (kinase panel)
#### 4.3 BindingDB Ligands
**Total Ligands**: 856 with measured affinity
**Best Affinity**: 0.1 nM (Ki)
**Affinity Distribution**: <1nM: 23, 1-10nM: 89, 10-100nM: 234, 100nM-1uM: 510
#### 4.4 Chemical Probes
| Probe | Source | Potency | Selectivity | Use |
|-------|--------|---------|-------------|-----|
| SGC-1234 | SGC | IC50=5nM | >100x | In vitro |
Objective: Assess clinical validation from approved drugs and clinical trials.
# FDA label information
fda_moa = tu.tools.FDA_get_mechanism_of_action_by_drug_name(drug_name=gene_symbol)
fda_indications = tu.tools.FDA_get_indications_by_drug_name(drug_name=known_drug_name)
# DrugBank pharmacology
drugbank_targets = tu.tools.drugbank_get_targets_by_drug_name_or_drugbank_id(
query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)
# DrugBank safety info
drugbank_safety = tu.tools.drugbank_get_safety_by_drug_name_or_drugbank_id(
query=known_drug_name, case_sensitive=False, exact_match=False, limit=10
)
# Active clinical trials targeting this protein
trials = tu.tools.search_clinical_trials(
query_term=gene_symbol,
intervention=gene_symbol,
pageSize=50
)
# If specific disease context
if disease_name:
disease_trials = tu.tools.search_clinical_trials(
query_term=gene_symbol,
condition=disease_name,
pageSize=50
)
# Drug warnings and withdrawals
for drug_chembl_id in known_drug_ids:
warnings = tu.tools.OpenTargets_get_drug_warnings_by_chemblId(chemblId=drug_chembl_id)
adverse = tu.tools.OpenTargets_get_drug_adverse_events_by_chemblId(chemblId=drug_chembl_id)
Clinical Precedent (0-15):
- FDA-approved drug for SAME disease: 15
- FDA-approved drug for DIFFERENT disease: 12
- Phase 3 clinical trial: 10
- Phase 2 clinical trial: 7
- Phase 1 clinical trial: 5
- Preclinical compounds only: 3
- No clinical development: 0
Adjustment factors:
- Failed clinical program for safety: -3
- Drug withdrawal: -5
- Multiple approved drugs (validated class): +2
Objective: Identify safety risks from expression, genetics, and known adverse events.
safety = tu.tools.OpenTargets_get_target_safety_profile_by_ensemblID(ensemblId=ensembl_id)
# Returns: safety liabilities, adverse effects, experimental toxicity
# GTEx tissue expression (identifies essential organ expression)
gtex = tu.tools.GTEx_get_median_gene_expression(
operation="median", gencode_id=ensembl_versioned_id
)
# If empty, try unversioned ID
# HPA expression
# NOTE: HPA_get_rna_expression_by_source requires gene_name, source_type, source_name
hpa = tu.tools.HPA_search_genes_by_query(search_query=gene_symbol)
hpa_details = tu.tools.HPA_get_comprehensive_gene_details_by_ensembl_id(ensembl_id=ensembl_id)
# Check expression in safety-critical tissues
# Heart, liver, kidney, brain, bone marrow = high risk if target is expressed
# Mouse model phenotypes
mouse_models = tu.tools.OpenTargets_get_biological_mouse_models_by_ensemblID(ensemblId=ensembl_id)
# Genetic constraint (proxy for essentiality)
constraints = tu.tools.gnomad_get_gene_constraints(gene_symbol=gene_symbol)
# High pLI = essential gene = potential safety concern
# For known drugs targeting this protein
for drug_name in known_drug_names:
fda_adr = tu.tools.FDA_get_adverse_reactions_by_drug_name(drug_name=drug_name)
fda_warnings = tu.tools.FDA_get_warnings_and_cautions_by_drug_name(drug_name=drug_name)
fda_boxed = tu.tools.FDA_get_boxed_warning_info_by_drug_name(drug_name=drug_name)
fda_contraindications = tu.tools.FDA_get_contraindications_by_drug_name(drug_name=drug_name)
# Paralogs (close family members that might be hit)
homologs = tu.tools.OpenTargets_get_target_homologues_by_ensemblID(ensemblId=ensembl_id)
# Paralogs with high sequence identity = selectivity challenge
Tissue Expression Selectivity (0-5):
- Target restricted to disease tissue: 5
- Low expression in heart/liver/kidney/brain: 4
- Moderate expression in 1-2 critical tissues: 2
- High expression in multiple critical tissues: 0
Genetic Validation (0-10):
- Mouse KO viable, no severe phenotype: 10
- Mouse KO viable with mild phenotype: 7
- Mouse KO has concerning phenotype: 3
- Mouse KO lethal: 0
- No KO data, low pLI (<0.5): 5
- No KO data, high pLI (>0.9): 2
Known Adverse Events (0-5):
- No known safety signals: 5
- Mild, manageable ADRs: 3
- Serious ADRs reported: 1
- Black box warning or drug withdrawal: 0
Objective: Understand the target's role in biological networks and disease pathways.
# Map target to pathways
pathways = tu.tools.Reactome_map_uniprot_to_pathways(id=uniprot_id)
# Get pathway details for top pathways
for pathway in top_pathways[:5]:
detail = tu.tools.Reactome_get_pathway(id=pathway['stId'])
reactions = tu.tools.Reactome_get_pathway_reactions(id=pathway['stId'])
# STRING network
string_ppi = tu.tools.STRING_get_protein_interactions(
protein_ids=[gene_symbol], species=9606, confidence_score=0.7
)
# Higher confidence = more reliable
# IntAct interactions (experimental)
intact_ppi = tu.tools.intact_get_interactions(identifier=uniprot_id)
# OpenTargets interactions
ot_ppi = tu.tools.OpenTargets_get_target_interactions_by_ensemblID(ensemblId=ensembl_id)
# GO annotations
go_terms = tu.tools.OpenTargets_get_target_gene_ontology_by_ensemblID(ensemblId=ensembl_id)
# Direct GO query
go_annotations = tu.tools.GO_get_annotations_for_gene(gene_id=gene_symbol)
# STRING functional enrichment of interaction partners
enrichment = tu.tools.STRING_functional_enrichment(
protein_ids=[gene_symbol], species=9606
)
### 7. Pathway Context & Network Analysis
#### 7.1 Key Pathways
| Pathway | Reactome ID | Relevance to Disease | Evidence |
|---------|-------------|---------------------|----------|
| EGFR signaling | R-HSA-177929 | Driver pathway in NSCLC | [T1] |
| RAS-RAF-MEK-ERK | R-HSA-5673001 | Downstream effector | [T1] |
| PI3K-AKT signaling | R-HSA-2219528 | Resistance mechanism | [T2] |
#### 7.2 Protein-Protein Interactions
**Total Interactors**: 45 (STRING confidence > 0.7)
**Key Interactors**: GRB2, SHC1, PLCG1, PIK3CA, STAT3
#### 7.3 Pathway Redundancy Assessment
**Compensation Risk**: MODERATE
- Parallel pathways: HER2, HER3 can compensate
- Feedback loops: RAS activation bypasses EGFR
- Downstream convergence: MEK/ERK shared with other RTKs
Objective: Assess existing functional validation data.
# Gene essentiality in cancer cell lines
deps = tu.tools.DepMap_get_gene_dependencies(gene_symbol=gene_symbol)
# Negative scores = essential (cells die upon KO)
# Score < -0.5: moderately essential
# Score < -1.0: strongly essential
# Search for functional studies
validation_papers = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND (CRISPR OR siRNA OR knockdown OR knockout OR "loss of function") AND "{disease_name}"',
limit=30
)
# Search for biomarker studies
biomarker_papers = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND (biomarker OR "target engagement" OR "pharmacodynamic")',
limit=20
)
# Mouse phenotypes from OpenTargets (already retrieved in Phase 5)
# Reuse mouse_models data
# CTD gene-disease associations (complementary)
ctd_diseases = tu.tools.CTD_get_gene_diseases(input_terms=gene_symbol)
Functional Studies (0-5):
- CRISPR KO shows disease-relevant phenotype: 5
- siRNA knockdown shows phenotype: 4
- Biochemical assay validates mechanism: 3
- Overexpression study only: 2
- No functional data: 0
Disease Models (0-5):
- Patient-derived xenograft (PDX) response: 5
- Genetically engineered mouse model: 4
- Cell line model: 3
- In silico model only: 1
- No model data: 0
Objective: Leverage structural biology for druggability and mechanism understanding.
# Get PDB entries from UniProt cross-references
uniprot_entry = tu.tools.UniProt_get_entry_by_accession(accession=uniprot_id)
# Parse: uniProtKBCrossReferences where database == "PDB"
# Get details for each PDB
for pdb_id in pdb_ids[:10]:
metadata = tu.tools.get_protein_metadata_by_pdb_id(pdb_id=pdb_id)
quality = tu.tools.pdbe_get_entry_quality(pdb_id=pdb_id)
summary = tu.tools.pdbe_get_entry_summary(pdb_id=pdb_id)
experiment = tu.tools.pdbe_get_entry_experiment(pdb_id=pdb_id)
molecules = tu.tools.pdbe_get_entry_molecules(pdb_id=pdb_id)
alphafold = tu.tools.alphafold_get_prediction(qualifier=uniprot_id)
alphafold_info = tu.tools.alphafold_get_summary(qualifier=uniprot_id)
# Check pLDDT scores for confidence
# ProteinsPlus DoGSiteScorer for best PDB structure
pockets = tu.tools.ProteinsPlus_predict_binding_sites(pdb_id=best_pdb_id)
# Returns: pocket locations, druggability scores, volume, surface
# Interaction diagram for co-crystal structures
if has_ligand:
diagram = tu.tools.ProteinsPlus_generate_interaction_diagram(pdb_id=pdb_id)
# InterPro domains
domains = tu.tools.InterPro_get_protein_domains(uniprot_accession=uniprot_id)
# Domain details for key domains
for domain in domains[:5]:
detail = tu.tools.InterPro_get_domain_details(entry_id=domain['accession'])
Objective: Comprehensive literature analysis with collision-aware search.
# Detect naming collisions before literature search
test_results = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}"[Title]', limit=20
)
# PubMed returns plain list of dicts
# Check if >20% of results are off-topic (no biology terms)
# If collision detected, add filters: AND (protein OR gene OR receptor OR kinase)
# Total publications
total = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND (protein OR gene)', limit=1
)
# Check total_count field
# Recent publications (5-year trend)
recent = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND (protein OR gene) AND ("2021"[PDAT] : "2026"[PDAT])',
limit=50
)
# Drug-focused publications
drug_pubs = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND (drug OR therapeutic OR inhibitor OR antibody)',
limit=30
)
# EuropePMC for broader coverage
epmc = tu.tools.EuropePMC_search_articles(
query=f'"{gene_symbol}" AND drug target',
limit=30
)
# Reviews for target overview
reviews = tu.tools.PubMed_search_articles(
query=f'"{gene_symbol}" AND drug target AND review[pt]',
limit=10
)
# OpenAlex for citation metrics
openalex_works = tu.tools.openalex_search_works(
query=f'{gene_symbol} drug target', limit=20
)
Objective: Generate actionable recommendations based on all evidence.
This phase synthesizes all previous phases into:
def calculate_validation_score(phase_results):
"""
Calculate Target Validation Score (0-100).
Components:
- Disease Association: 0-30
- Druggability: 0-25
- Safety: 0-20
- Clinical Precedent: 0-15
- Validation Evidence: 0-10
"""
score = {
'disease_genetic': 0, # 0-10
'disease_literature': 0, # 0-10
'disease_pathway': 0, # 0-10
'drug_structural': 0, # 0-10
'drug_chemical': 0, # 0-10
'drug_class': 0, # 0-5
'safety_expression': 0, # 0-5
'safety_genetic': 0, # 0-10
'safety_adverse': 0, # 0-5
'clinical': 0, # 0-15
'validation_functional': 0, # 0-5
'validation_models': 0, # 0-5
}
# ... scoring logic from each phase ...
total = sum(score.values())
if total >= 80:
tier = "Tier 1"
recommendation = "GO - Highly validated target"
elif total >= 60:
tier = "Tier 2"
recommendation = "CONDITIONAL GO - Needs focused validation"
elif total >= 40:
tier = "Tier 3"
recommendation = "CAUTION - Significant validation needed"
else:
tier = "Tier 4"
recommendation = "NO-GO - Consider alternatives"
return total, tier, recommendation, score
File: [TARGET]_[DISEASE]_validation_report.md
# Drug Target Validation Report: [TARGET]
**Target**: [Gene Symbol] ([Full Name])
**Disease Context**: [Disease Name] (if provided)
**Modality**: [Small molecule / Antibody / etc.] (if specified)
**Generated**: [Date]
**Status**: In Progress
---
## Executive Summary
**Target Validation Score**: [XX/100]
**Priority Tier**: [Tier X] - [Description]
**Recommendation**: [GO / CONDITIONAL GO / CAUTION / NO-GO]
**Key Findings**:
- [1-sentence disease association strength with evidence grade]
- [1-sentence druggability assessment]
- [1-sentence safety profile]
- [1-sentence clinical precedent]
**Critical Risks**:
- [Top risk 1]
- [Top risk 2]
---
## Validation Scorecard
| Dimension | Score | Max | Assessment | Key Evidence |
|-----------|-------|-----|------------|--------------|
| **Disease Association** | | 30 | | |
| - Genetic evidence | | 10 | | |
| - Literature evidence | | 10 | | |
| - Pathway evidence | | 10 | | |
| **Druggability** | | 25 | | |
| - Structural tractability | | 10 | | |
| - Chemical matter | | 10 | | |
| - Target class | | 5 | | |
| **Safety Profile** | | 20 | | |
| - Expression selectivity | | 5 | | |
| - Genetic validation | | 10 | | |
| - Known ADRs | | 5 | | |
| **Clinical Precedent** | | 15 | | |
| **Validation Evidence** | | 10 | | |
| - Functional studies | | 5 | | |
| - Disease models | | 5 | | |
| **TOTAL** | **XX** | **100** | **[Tier]** | |
---
## 1. Target Identity
[Researching...]
## 2. Disease Association Evidence
### 2.1 OpenTargets Disease Associations
[Researching...]
### 2.2 GWAS Genetic Evidence
[Researching...]
### 2.3 Constraint Scores (gnomAD)
[Researching...]
### 2.4 Literature Evidence
[Researching...]
## 3. Druggability Assessment
### 3.1 Tractability (OpenTargets)
[Researching...]
### 3.2 Target Classification
[Researching...]
### 3.3 Structural Tractability
[Researching...]
### 3.4 Chemical Probes & Enabling Packages
[Researching...]
## 4. Known Modulators & Chemical Matter
### 4.1 Approved/Clinical Drugs
[Researching...]
### 4.2 ChEMBL Bioactivity
[Researching...]
### 4.3 BindingDB Ligands
[Researching...]
### 4.4 PubChem Bioassays
[Researching...]
### 4.5 Chemical Probes
[Researching...]
## 5. Clinical Precedent
### 5.1 FDA-Approved Drugs
[Researching...]
### 5.2 Clinical Trial Landscape
[Researching...]
### 5.3 Failed Programs & Lessons
[Researching...]
## 6. Safety & Toxicity Profile
### 6.1 OpenTargets Safety Liabilities
[Researching...]
### 6.2 Expression in Critical Tissues
[Researching...]
### 6.3 Knockout Phenotypes
[Researching...]
### 6.4 Known Adverse Events
[Researching...]
### 6.5 Paralog & Off-Target Risks
[Researching...]
## 7. Pathway Context & Network Analysis
### 7.1 Biological Pathways
[Researching...]
### 7.2 Protein-Protein Interactions
[Researching...]
### 7.3 Functional Enrichment
[Researching...]
### 7.4 Pathway Redundancy Assessment
[Researching...]
## 8. Validation Evidence
### 8.1 Target Essentiality (DepMap)
[Researching...]
### 8.2 Functional Studies
[Researching...]
### 8.3 Animal Models
[Researching...]
### 8.4 Biomarker Potential
[Researching...]
## 9. Structural Insights
### 9.1 Experimental Structures (PDB)
[Researching...]
### 9.2 AlphaFold Prediction
[Researching...]
### 9.3 Binding Pocket Analysis
[Researching...]
### 9.4 Domain Architecture
[Researching...]
## 10. Literature Landscape
### 10.1 Publication Metrics
[Researching...]
### 10.2 Key Publications
[Researching...]
### 10.3 Research Trend
[Researching...]
## 11. Validation Roadmap
### 11.1 Recommended Validation Experiments
[Researching...]
### 11.2 Tool Compounds for Testing
[Researching...]
### 11.3 Biomarker Strategy
[Researching...]
### 11.4 Clinical Biomarker Candidates
[Researching...]
### 11.5 Disease Models to Test
[Researching...]
## 12. Risk Assessment
### 12.1 Key Risks
[Researching...]
### 12.2 Mitigation Strategies
[Researching...]
### 12.3 Competitive Landscape
[Researching...]
## 13. Completeness Checklist
[To be populated post-audit...]
## 14. Data Sources & Methodology
[Will be populated as research progresses...]
Before finalizing, verify:
## 13. Completeness Checklist
### Phase Coverage
- [ ] Phase 0: Target disambiguation (all IDs resolved)
- [ ] Phase 1: Disease association (OT + GWAS + gnomAD + literature)
- [ ] Phase 2: Druggability (tractability + class + structure + probes)
- [ ] Phase 3: Chemical matter (ChEMBL + BindingDB + PubChem + drugs)
- [ ] Phase 4: Clinical precedent (FDA + trials + failures)
- [ ] Phase 5: Safety (OT safety + expression + KO + ADRs + paralogs)
- [ ] Phase 6: Pathway context (Reactome + STRING + GO)
- [ ] Phase 7: Validation evidence (DepMap + literature + models)
- [ ] Phase 8: Structural insights (PDB + AlphaFold + pockets + domains)
- [ ] Phase 9: Literature (collision-aware + metrics + key papers)
- [ ] Phase 10: Validation roadmap (score + recommendations)
### Data Quality
- [ ] All scores justified with specific data
- [ ] Evidence grades (T1-T4) assigned to key claims
- [ ] Negative results documented (not left blank)
- [ ] Failed tools with fallbacks documented
- [ ] Source citations for all data points
### Scoring
- [ ] All 12 score components calculated
- [ ] Total score summed correctly
- [ ] Priority tier assigned
- [ ] GO/NO-GO recommendation justified
| Primary Tool | Fallback 1 | Fallback 2 | If All Fail |
|--------------|------------|------------|-------------|
| OpenTargets_get_diseases_phenotypes_* | CTD_get_gene_diseases | PubMed search | Note in report |
| GTEx_get_median_gene_expression (versioned) | GTEx (unversioned) | HPA_search_genes_by_query | Document gap |
| ChEMBL_get_target_activities | BindingDB_get_ligands_by_uniprot | DGIdb_get_gene_info | Note in report |
| gnomad_get_gene_constraints | OpenTargets_get_target_constraint_info_* | - | Note as unavailable |
| Reactome_map_uniprot_to_pathways | OpenTargets_get_target_gene_ontology_* | - | Use GO only |
| STRING_get_protein_interactions | intact_get_interactions | OpenTargets interactions | Note in report |
| ProteinsPlus_predict_binding_sites | alphafold_get_prediction | Literature pockets | Note as limited |
| Tool | Parameters | Notes |
|------|-----------|-------|
| ensembl_lookup_gene | gene_id, species | species="homo_sapiens" REQUIRED; response wrapped in {status, data, url, content_type} |
| OpenTargets_get_*_by_ensemblID | ensemblId | camelCase, NOT ensemblID |
| OpenTargets_get_publications_by_target_ensemblID | entityId | NOT ensemblId |
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | size is REQUIRED |
| OpenTargets_target_disease_evidence | efoId, ensemblId | Both REQUIRED |
| GTEx_get_median_gene_expression | operation, gencode_id | operation="median" REQUIRED |
| HPA_get_rna_expression_by_source | gene_name, source_type, source_name | ALL 3 required |
| PubMed_search_articles | query, limit | Returns plain list, NOT {articles:[]} |
| UniProt_get_function_by_accession | accession | Returns list of strings |
| alphafold_get_prediction | qualifier | NOT uniprot_accession |
| drugbank_get_safety_* | query, case_sensitive, exact_match, limit | ALL required |
| STRING_get_protein_interactions | protein_ids, species | protein_ids is array; species=9606 |
| Reactome_map_uniprot_to_pathways | id | NOT uniprot_id |
| ChEMBL_get_target_activities | target_chembl_id__exact | Note double underscore |
| search_clinical_trials | query_term | REQUIRED parameter |
| gnomad_get_gene_constraints | gene_symbol | NOT gene_id |
| DepMap_get_gene_dependencies | gene_symbol | NOT gene_id |
| BindingDB_get_ligands_by_uniprot | uniprot, affinity_cutoff | affinity in nM |
| Pharos_get_target | gene or uniprot | Both optional but need one |
tools
Onboard and manage Paperclip AI for research-paper knowledge and agent orchestration
development
Perform AI-powered web searches with real-time information using Perplexity models via LiteLLM and OpenRouter. This skill should be used when conducting web searches for current information, finding recent scientific literature, getting grounded answers with source citations, or accessing information beyond the model knowledge cutoff. Provides access to multiple Perplexity models including Sonar Pro, Sonar Pro Search (advanced agentic search), and Sonar Reasoning Pro through a single OpenRouter API key.
testing
Generate a structured scientific PDF report from a JSON description. Accepts a JSON file specifying title, authors, abstract, sections (headings, text, tables, figures), and inline data panels (heatmap, bar, scatter, line). Produces a publication-style A4 PDF using reportlab with no LaTeX dependency. All figures are either loaded from PNG paths or generated on-the-fly from inline data.
development
Execute arbitrary Python code and return stdout. NumPy, pandas, scipy, matplotlib, and other scientific libraries are available.