skills/structural-biology-drug-discovery/opentargets-database/SKILL.md
Query Open Targets GraphQL API for target-disease associations, evidence, drug links, safety. Search targets by gene, diseases by EFO ID; scores from 20+ sources, drug mechanisms, tractability. For ChEMBL use chembl-database-bioactivity; for trials use clinicaltrials-database-search.
npx skillsauth add jaechang-hits/sciagent-skills opentargets-databaseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Open Targets Platform integrates evidence from genetics, genomics, literature, and drug databases to systematically score target-disease associations for 60,000+ targets and 20,000+ diseases/phenotypes. The public GraphQL API (no authentication required) provides access to association scores, evidence from 20+ data sources (GWAS, ClinVar, ChEMBL, drugs, pathways, mouse models, expression), and detailed drug-target-disease triangles.
chembl-database-bioactivity; for clinical trial details use clinicaltrials-database-searchrequestspip install requests
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
# Top disease associations for BRCA1
query = """
query TargetDiseases($ensgId: String!) {
target(ensemblId: $ensgId) {
id
approvedSymbol
associatedDiseases(page: {index: 0, size: 5}) {
rows {
disease { id name }
score
}
}
}
}
"""
data = ot_query(query, {"ensgId": "ENSG00000012048"})
target = data["target"]
print(f"Target: {target['approvedSymbol']}")
for row in target["associatedDiseases"]["rows"]:
print(f" {row['disease']['name']}: {row['score']:.3f}")
Search for a target and retrieve basic metadata (Ensembl ID, biotype, description).
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
# Search by gene symbol
query = """
query SearchTarget($sym: String!) {
search(queryString: $sym, entityNames: ["target"]) {
hits {
id
name
entity
object {
... on Target {
approvedSymbol
approvedName
biotype
functionDescriptions
}
}
}
}
}
"""
data = ot_query(query, {"sym": "BRCA1"})
for hit in data["search"]["hits"][:3]:
obj = hit.get("object", {})
print(f"ID: {hit['id']} | {obj.get('approvedSymbol')} | {obj.get('biotype')}")
descs = obj.get("functionDescriptions", [])
if descs:
print(f" Function: {descs[0][:120]}")
# Direct lookup by Ensembl ID
query2 = """
query Target($ensgId: String!) {
target(ensemblId: $ensgId) {
id approvedSymbol approvedName biotype
tractability { label modality value }
}
}
"""
data2 = ot_query(query2, {"ensgId": "ENSG00000141510"}) # TP53
t = data2["target"]
print(f"\n{t['approvedSymbol']} ({t['id']}): {t['biotype']}")
print("Tractability:")
for tr in t.get("tractability", [])[:5]:
print(f" {tr['modality']} | {tr['label']}: {tr['value']}")
Retrieve association scores for a target across all associated diseases.
import requests, pandas as pd
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query Associations($ensgId: String!, $size: Int!) {
target(ensemblId: $ensgId) {
approvedSymbol
associatedDiseases(page: {index: 0, size: $size}, orderByScore: "score") {
count
rows {
disease { id name therapeuticAreas { name } }
score
datatypeScores { id score }
}
}
}
}
"""
data = ot_query(query, {"ensgId": "ENSG00000012048", "size": 20})
target = data["target"]
assoc = target["associatedDiseases"]
print(f"{target['approvedSymbol']}: {assoc['count']} associated diseases")
rows = []
for r in assoc["rows"]:
scores = {d["id"]: d["score"] for d in r.get("datatypeScores", [])}
rows.append({
"disease": r["disease"]["name"],
"disease_id": r["disease"]["id"],
"overall_score": round(r["score"], 4),
"genetics": round(scores.get("genetic_association", 0), 3),
"drugs": round(scores.get("known_drug", 0), 3),
"literature": round(scores.get("literature", 0), 3),
})
df = pd.DataFrame(rows)
print(df.head(10).to_string(index=False))
Given a disease, retrieve all associated targets ranked by score.
import requests, pandas as pd
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query DiseaseTargets($efoId: String!, $size: Int!) {
disease(efoId: $efoId) {
id name
associatedTargets(page: {index: 0, size: $size}, orderByScore: "score") {
count
rows {
target { id approvedSymbol biotype }
score
datatypeScores { id score }
}
}
}
}
"""
# EFO_0000305 = breast carcinoma
data = ot_query(query, {"efoId": "EFO_0000305", "size": 10})
disease = data["disease"]
print(f"Disease: {disease['name']}")
print(f"Total associated targets: {disease['associatedTargets']['count']}")
for row in disease["associatedTargets"]["rows"][:5]:
t = row["target"]
print(f" {t['approvedSymbol']:12s} score={row['score']:.3f} biotype={t['biotype']}")
Retrieve approved and investigational drugs, their mechanism, and clinical phase.
import requests, pandas as pd
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query KnownDrugs($ensgId: String!) {
target(ensemblId: $ensgId) {
approvedSymbol
drugAndClinicalCandidates {
count
rows {
maxClinicalStage
drug { id name drugType maximumClinicalStage mechanismsOfAction { rows { mechanismOfAction } } }
diseases { disease { id name } }
}
}
}
}
"""
data = ot_query(query, {"ensgId": "ENSG00000146648"}) # EGFR
target = data["target"]
drugs_data = target["drugAndClinicalCandidates"]
print(f"{target['approvedSymbol']}: {drugs_data['count']} drug-indication pairs")
rows = []
for r in drugs_data["rows"]:
drug = r["drug"]
moa = drug.get("mechanismsOfAction") or {}
moa_first = (moa.get("rows") or [{}])[0].get("mechanismOfAction")
first_disease = (r.get("diseases") or [{}])[0].get("disease") or {}
rows.append({
"drug": drug["name"],
"type": drug["drugType"],
"maxClinicalStage": r["maxClinicalStage"],
"approved": drug["maximumClinicalStage"] == "PHASE_4",
"indication": first_disease.get("name", "n/a"),
"mechanism": moa_first,
})
df = pd.DataFrame(rows).drop_duplicates(subset=["drug", "indication"])
print(df.head(10).to_string(index=False))
Retrieve detailed evidence records (GWAS, ClinVar, literature) for a target-disease pair.
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query Evidence($ensgId: String!, $efoId: String!) {
disease(efoId: $efoId) {
evidences(
ensemblIds: [$ensgId]
enableIndirect: true
size: 10
datasourceIds: ["gwas_catalog", "clinvar", "chembl"]
) {
count
rows {
datasourceId
score
variantRsId
studyId
publicationYear
clinicalSignificances
}
}
}
}
"""
data = ot_query(query, {"ensgId": "ENSG00000012048", "efoId": "EFO_0000305"})
evidences = data["disease"]["evidences"]
print(f"Evidence records: {evidences['count']}")
for ev in evidences["rows"][:5]:
print(f" Source: {ev['datasourceId']:20s} | Score: {ev['score']:.3f}")
Retrieve known adverse events and safety data for a target.
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query Safety($ensgId: String!) {
target(ensemblId: $ensgId) {
approvedSymbol
safetyLiabilities {
event
effects { direction dosing }
biosamples { tissueLabel cellLabel }
datasource
}
}
}
"""
data = ot_query(query, {"ensgId": "ENSG00000146648"}) # EGFR
target = data["target"]
print(f"Safety liabilities for {target['approvedSymbol']}:")
for s in target.get("safetyLiabilities", [])[:5]:
print(f" Event: {s['event']}")
# 2025 schema: `datasource` is a scalar literature-citation string,
# not the legacy `datasources[{name, pmid}]` list of objects
print(f" Source: {s.get('datasource', 'n/a')}")
for eff in s.get("effects", []) or []:
print(f" Effect: direction={eff.get('direction')} dosing={eff.get('dosing')}")
Open Targets uses harmonic sum aggregation to combine evidence from multiple data sources into a 0–1 association score. Subscores include: genetic_association, somatic_mutation, known_drug, affected_pathway, literature, RNA_expression, animal_model, and others. Higher scores indicate more and stronger evidence.
Open Targets uses Experimental Factor Ontology (EFO) identifiers for diseases (e.g., EFO_0000305 for breast carcinoma). Search by disease name using the search query to find EFO IDs before querying associations.
Goal: Given a disease, rank all associated targets by overall score and export with evidence breakdown.
import requests, pandas as pd, time
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
def disease_search(name):
q = 'query S($q:String!){search(queryString:$q,entityNames:["disease"]){hits{id name}}}'
data = ot_query(q, {"q": name})
return [(h["id"], h["name"]) for h in data["search"]["hits"][:3]]
def get_top_targets(efo_id, n=50):
q = """
query($efoId:String!,$size:Int!){
disease(efoId:$efoId){
name
associatedTargets(page:{index:0,size:$size},orderByScore:"score"){
count
rows{
target{id approvedSymbol biotype}
score
datatypeScores { id score }
}
}
}
}"""
data = ot_query(q, {"efoId": efo_id, "size": n})
disease = data["disease"]
rows = []
for row in disease["associatedTargets"]["rows"]:
t = row["target"]
scores = {d["id"]: round(d["score"], 3) for d in row.get("datatypeScores", [])}
rows.append({
"target": t["approvedSymbol"],
"ensembl_id": t["id"],
"biotype": t["biotype"],
"overall_score": round(row["score"], 4),
**scores
})
return disease["name"], pd.DataFrame(rows)
# Step 1: Find EFO ID for disease
candidates = disease_search("non-small cell lung carcinoma")
print("Disease candidates:", candidates)
# Step 2: Get top targets
disease_name, df = get_top_targets("EFO_0003060", n=50)
df.to_csv("target_prioritization.csv", index=False)
print(f"\nTop targets for {disease_name}:")
cols = [c for c in ["target", "overall_score", "genetic_association", "known_drug", "literature", "rna_expression", "somatic_mutation"] if c in df.columns]
print(df[cols].head(10).to_string(index=False))
Goal: For a target, retrieve all drugs and their associated indications and phases.
import requests, pandas as pd
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
def ot_query(gql, variables=None):
r = requests.post(OT_URL, json={"query": gql, "variables": variables or {}})
r.raise_for_status()
return r.json()["data"]
query = """
query($ensgId:String!){
target(ensemblId:$ensgId){
approvedSymbol
drugAndClinicalCandidates{
count
rows{
maxClinicalStage
drug{id name drugType maximumClinicalStage mechanismsOfAction { rows { mechanismOfAction } } }
diseases { disease { id name } }
}
}
}
}"""
targets = {
"EGFR": "ENSG00000146648",
"ERBB2": "ENSG00000141736",
}
all_rows = []
for sym, ensg in targets.items():
data = ot_query(query, {"ensgId": ensg})
for row in data["target"]["drugAndClinicalCandidates"]["rows"]:
drug = row["drug"]
moa = drug.get("mechanismsOfAction") or {}
moa_first = (moa.get("rows") or [{}])[0].get("mechanismOfAction")
first_disease = (row.get("diseases") or [{}])[0].get("disease") or {}
all_rows.append({
"target": sym,
"drug": drug["name"],
"drug_type": drug["drugType"],
"maxClinicalStage": row["maxClinicalStage"],
"approved": drug["maximumClinicalStage"] == "PHASE_4",
"indication": first_disease.get("name", "n/a"),
"mechanism": moa_first,
})
df = pd.DataFrame(all_rows)
df.to_csv("drug_target_matrix.csv", index=False)
print(df.head(10).to_string(index=False))
| Parameter | Module | Default | Range / Options | Effect |
|-----------|--------|---------|-----------------|--------|
| page.size | Associations | 10 | 1–10000 | Records per page |
| page.index | Associations | 0 | 0–N | Page index for pagination |
| orderByScore | Associations | "score" | "score", component IDs | Sort associations by score |
| datasourceIds | Evidence | all sources | list of datasource IDs | Filter evidence by source |
| enableIndirect | Evidence | false | true/false | Include child disease evidence |
| entityNames | Search | all | ["target"], ["disease"] | Filter search entity type |
Use EFO IDs for diseases: Disease names vary; always use the search query to get the canonical EFO ID before running association queries to avoid name-matching issues.
Paginate for full result sets: Default page size is 10; use page.size: 10000 for complete results, but be aware this can return large payloads.
Filter by datatypeScores: For genetic target validation, filter on genetic_association subscore > 0.1; for drug repurposing, prioritize known_drug subscore.
Use enableIndirect: true in evidence queries to include evidence for disease subtypes (child terms in EFO hierarchy).
Cache GraphQL responses: Open Targets data updates quarterly; cache responses during analysis to avoid redundant API calls.
When to use: Resolve a disease name to the EFO ID needed for association queries.
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
query = """
query($q: String!) {
search(queryString: $q, entityNames: ["disease"]) {
hits { id name score }
}
}"""
r = requests.post(OT_URL, json={"query": query, "variables": {"q": "breast cancer"}})
for hit in r.json()["data"]["search"]["hits"][:5]:
print(f"{hit['id']}: {hit['name']} (score={hit['score']:.3f})")
When to use: Assess whether a target is tractable for small molecules, antibodies, or PROTACs.
import requests
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
query = """
query($ensgId: String!) {
target(ensemblId: $ensgId) {
approvedSymbol
tractability { label modality value }
}
}"""
r = requests.post(OT_URL, json={"query": query, "variables": {"ensgId": "ENSG00000141510"}})
t = r.json()["data"]["target"]
print(f"Tractability for {t['approvedSymbol']}:")
for tr in t.get("tractability", []):
if tr["value"]:
print(f" [{tr['modality']}] {tr['label']}")
When to use: Find all approved drugs for a disease with phase 4 evidence.
import requests, pandas as pd
OT_URL = "https://api.platform.opentargets.org/api/v4/graphql"
query = """
query($efoId: String!) {
disease(efoId: $efoId) {
name
drugAndClinicalCandidates { count rows {
maxClinicalStage
drug { name maximumClinicalStage drugType }
}}
}
}"""
r = requests.post(OT_URL, json={"query": query, "variables": {"efoId": "EFO_0000305"}})
data = r.json()["data"]["disease"]
approved = [row for row in data["drugAndClinicalCandidates"]["rows"] if row["drug"]["maximumClinicalStage"] == "PHASE_4"]
print(f"Approved drugs for {data['name']}: {len(approved)}")
for row in approved[:5]:
print(f" {row['drug']['name']} ({row['drug']['drugType']}) maxStage={row['maxClinicalStage']}")
| Problem | Cause | Solution |
|---------|-------|----------|
| HTTP 400 with GraphQL error | Malformed query or invalid field name | Check query against GraphQL schema at https://api.platform.opentargets.org/api/v4/graphql |
| Empty rows in associations | EFO ID not recognized | Use search query to find correct EFO ID |
| Target not found | Gene symbol vs Ensembl ID mismatch | Use search query first to resolve Ensembl ID |
| Slow query for large result set | page.size too large | Cap at 500 rows; paginate with multiple requests |
| Missing tractability data | Target not assessed | Not all targets have tractability; check tractability field is non-null |
| drugAndClinicalCandidates empty | No drug-target evidence in ChEMBL | Use chembl-database-bioactivity for preclinical compound activity |
chembl-database-bioactivity — Bioactivity IC50/Ki data for compounds against targetsclinicaltrials-database-search — Detailed clinical trial information for drugs found via Open Targetsensembl-database — Ensembl IDs and variant annotations needed as input to Open Targets queriesstring-database-ppi — Protein-protein interaction networks to contextualize target biologytools
Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.
tools
smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock-blind-docking; for the Python-bindings/Vinardo workflow use autodock-vina-docking.
development
mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.
development
Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.