# reactome-database
Query Reactome pathways via REST: pathway queries, entity lookup, keyword search, gene list enrichment, hierarchy, cross-refs. Content + Analysis services. Python wrapper: reactome2py. For KEGG use kegg-database; for PPIs use string-database-ppi.
Reactome is an open-source, curated database of biological pathways and reactions for 16+ species. It provides two REST APIs: the Content Service for querying pathway data, entities, and hierarchy, and the Analysis Service for gene/protein list enrichment and expression data overlay. Most endpoints return JSON by default (a few, such as the database version, return plain text), and none require authentication.
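- For KEGG pathways, use kegg-database instead.
- For protein-protein interactions, use string-database-ppi instead.
- Optional Python wrapper: reactome2py (`pip install reactome2py`).
- Required for the examples below: `pip install requests`.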
API constraints:

- Use `time.sleep(0.5)` between batch requests to be respectful.
- Content Service base URL: `https://reactome.org/ContentService`
- Analysis Service base URL: `https://reactome.org/AnalysisService`

## Quick Start

```python
import time

import requests

CONTENT = "https://reactome.org/ContentService"
ANALYSIS = "https://reactome.org/AnalysisService"


def reactome_get(base, path, params=None):
    """Generic Reactome REST API caller.

    Returns parsed JSON, or the raw text for non-JSON responses.
    Raises requests.HTTPError on 4xx/5xx status codes.
    """
    resp = requests.get(f"{base}{path}", params=params, timeout=30)
    resp.raise_for_status()
    try:
        return resp.json()
    except ValueError:
        return resp.text


# Check database version
version = reactome_get(CONTENT, "/data/database/version")
print(f"Reactome version: {version}")

# Query a pathway (R-HSA-69620 = Cell Cycle Checkpoints)
pathway = reactome_get(CONTENT, "/data/query/R-HSA-69620")
print(f"Pathway: {pathway['displayName']}")
print(f"Species: {pathway['speciesName']}")
time.sleep(0.5)

# Search for pathways
results = reactome_get(CONTENT, "/search/query", params={"query": "apoptosis", "types": "Pathway"})
print(f"Found {results['found']} results for 'apoptosis'")
```
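You can also keep the courtesy delay in one place instead of repeating `time.sleep(0.5)` calls. The sketch below is a hypothetical helper, not part of any Reactome client. It assumes a single-threaded script and enforces a minimum interval between consecutive calls using only the standard library, so it can wrap `reactome_get` or any other request function.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum delay between consecutive calls."""

    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep just long enough that calls are at least min_interval seconds apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

    def call(self, func, *args, **kwargs):
        """Wait if needed, then return func(*args, **kwargs)."""
        self.wait()
        return func(*args, **kwargs)


throttle = Throttle(min_interval=0.5)
# Usage with the Quick Start helper (network call, so left commented out):
# pathway = throttle.call(reactome_get, CONTENT, "/data/query/R-HSA-69620")
```

The first call never sleeps. Each later call sleeps only for whatever part of the interval has not already elapsed.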
## Module 1: Pathway and Entity Queries

Retrieve detailed information about pathways, reactions, and biological entities by stable ID. These examples use the `reactome_get` helper from Quick Start.

```python
# Query pathway by stable ID
pathway = reactome_get(CONTENT, "/data/query/R-HSA-69620")
print(f"Name: {pathway['displayName']}")
print(f"Stable ID: {pathway['stId']}, Species: {pathway['speciesName']}")
print(f"Schema class: {pathway['schemaClass']}")  # Pathway, TopLevelPathway, etc.
time.sleep(0.5)

# Get participating physical entities in a pathway
entities = reactome_get(CONTENT, f"/data/participants/{pathway['stId']}")
print(f"\nParticipating entities: {len(entities)}")
for e in entities[:3]:
    print(f"  {e['displayName']} ({e['schemaClass']})")
time.sleep(0.5)

# Get participating molecules with reference entities (UniProt, ChEBI, etc.)
refs = reactome_get(CONTENT, f"/data/participants/{pathway['stId']}/referenceEntities")
print(f"\nReference entities: {len(refs)}")
for r in refs[:3]:
    print(f"  {r['displayName']} ({r.get('databaseName', 'N/A')}:{r.get('identifier', 'N/A')})")
```
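The `referenceEntities` response is a flat list, so you can summarize it by source database with `collections.Counter`. This sketch assumes each record has the `databaseName` and `identifier` keys used in the loop above. The sample records are hypothetical stand-ins for an API response, not real Reactome output.

```python
from collections import Counter, defaultdict


def summarize_by_database(refs):
    """Count reference entities per source database (e.g. UniProt, ChEBI)."""
    return Counter(r.get("databaseName", "unknown") for r in refs)


def identifiers_by_database(refs):
    """Group identifiers per database, keeping first-seen order and dropping duplicates."""
    grouped = defaultdict(list)
    for r in refs:
        db = r.get("databaseName", "unknown")
        ident = r.get("identifier")
        if ident is not None and ident not in grouped[db]:
            grouped[db].append(ident)
    return dict(grouped)


# Hypothetical records shaped like the referenceEntities output
sample_refs = [
    {"databaseName": "UniProt", "identifier": "P04637"},
    {"databaseName": "UniProt", "identifier": "P38398"},
    {"databaseName": "ChEBI", "identifier": "15377"},
    {"databaseName": "UniProt", "identifier": "P04637"},
]
print(summarize_by_database(sample_refs))
print(identifiers_by_database(sample_refs))
```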
## Module 2: Keyword Search

Search across Reactome by keyword with faceted filtering.

```python
# Keyword search filtered to Pathways
results = reactome_get(CONTENT, "/search/query", params={
    "query": "cell cycle",
    "types": "Pathway",
    "species": "Homo sapiens",
    "cluster": "true",
})
print(f"Total found: {results['found']}")
for entry in results.get("results", [])[:1]:
    for e in entry.get("entries", [])[:5]:
        print(f"  {e['stId']}: {e['name']}")
time.sleep(0.5)

# Search for proteins
proteins = reactome_get(CONTENT, "/search/query", params={
    "query": "TP53", "types": "Protein", "species": "Homo sapiens",
})
print(f"\nTP53 protein entries: {proteins['found']}")
time.sleep(0.5)

# Suggest (autocomplete)
suggestions = reactome_get(CONTENT, "/search/suggest", params={"query": "apopt"})
print(f"Suggestions: {suggestions}")
```
Searchable types: Pathway, Reaction, Protein, Complex, SmallMolecule, Gene, DNA, RNA, Drug, ReferenceEntity
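With `cluster=true`, search hits are nested in one group per type (`results`, then `entries`). Entry names may also contain HTML highlighting markup around the matched term. This minimal sketch flattens the groups and strips tags with the standard-library `html.parser`. The sample payload is hypothetical and mirrors only the keys used above (`results`, `entries`, `stId`, `name`), plus an assumed `typeName` group label.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content and drop all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(text):
    """Return text with HTML tags removed and entities such as &amp; decoded."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def flatten_search(results):
    """Return (group, stId, plain-text name) tuples from a clustered search response."""
    rows = []
    for group in results.get("results", []):
        label = group.get("typeName", "?")  # assumed group label key
        for entry in group.get("entries", []):
            rows.append((label, entry.get("stId"), strip_tags(entry.get("name", ""))))
    return rows


# Hypothetical response fragment
sample = {
    "found": 2,
    "results": [{
        "typeName": "Pathway",
        "entries": [
            {"stId": "EXAMPLE-1", "name": '<span class="highlighting">Cell</span> Cycle'},
            {"stId": "EXAMPLE-2", "name": "Other"},
        ],
    }],
}
for row in flatten_search(sample):
    print(row)
```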
## Module 3: Gene List Enrichment

Submit a gene/protein list for over-representation analysis against Reactome pathways.

```python
import time

import requests

ANALYSIS = "https://reactome.org/AnalysisService"

# Gene list (newline-separated identifiers: UniProt, HGNC symbols, Ensembl, etc.)
gene_list = "TP53\nBRCA1\nBRCA2\nATM\nCHEK2\nCDK2\nRB1\nMDM2\nCDKN1A\nBAX"

# Submit for enrichment (POST with a plain-text body)
resp = requests.post(
    f"{ANALYSIS}/identifiers/",
    headers={"Content-Type": "text/plain"},
    data=gene_list,
    params={"pageSize": 10, "page": 1, "sortBy": "ENTITIES_FDR", "order": "ASC"},
    timeout=60,
)
resp.raise_for_status()
result = resp.json()

print(f"Analysis token: {result['summary']['token']}")
print(f"Pathways found: {result['pathwaysFound']}")
print(f"Identifiers not found: {result['identifiersNotFound']}")
print("\nTop enriched pathways:")
for p in result["pathways"][:5]:
    print(f"  {p['stId']}: {p['name']}")
    print(f"    FDR: {p['entities']['fdr']:.2e}, "
          f"Found: {p['entities']['found']}/{p['entities']['total']}")
time.sleep(0.5)
```
Analysis accepts: newline-separated identifiers, or tab-separated with expression values (for expression overlay). Supported IDs include UniProt, HGNC symbols, Ensembl, NCBI Gene, ChEBI, miRBase, KEGG, and more.
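Because the POST body is plain text, you can build it from a Python list. This minimal sketch uses a hypothetical function name. It trims whitespace, drops blank entries, and removes duplicates while keeping the original order, so the submitted list is exactly the one you intend to test.

```python
def build_identifier_payload(identifiers):
    """Join identifiers into the newline-separated text body used by /identifiers/."""
    seen = set()
    cleaned = []
    for ident in identifiers:
        ident = ident.strip()
        if ident and ident not in seen:
            seen.add(ident)
            cleaned.append(ident)
    return "\n".join(cleaned)


payload = build_identifier_payload([" TP53", "BRCA1", "", "TP53", "ATM\n"])
print(repr(payload))
```

Pass the returned string as `data=` to the `requests.post` call shown above.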
## Module 4: Analysis Results by Token

Retrieve previously computed analysis results by token and apply filters.

```python
import time

import requests

ANALYSIS = "https://reactome.org/AnalysisService"

# Re-fetch results by token from a previous analysis.
# Placeholder value: use result["summary"]["token"] from Module 3.
token = "MjAyNTA2MTcxMDA3MzRfMQ%3D%3D"

# Get results with filtering
results = requests.get(f"{ANALYSIS}/token/{token}", params={
    "pageSize": 20,
    "page": 1,
    "sortBy": "ENTITIES_FDR",
    "species": "Homo sapiens",
    "resource": "TOTAL",  # TOTAL, UNIPROT, ENSEMBL, CHEBI, etc.
}, timeout=60)
results.raise_for_status()
data = results.json()
print(f"Token: {data['summary']['token']}")
print(f"Pathways: {data['pathwaysFound']}")
time.sleep(0.5)

# Get identifiers found in a specific pathway
top = data["pathways"][0]
pathway_detail = requests.get(f"{ANALYSIS}/token/{token}/found/all/{top['stId']}", timeout=60)
pathway_detail.raise_for_status()
found = pathway_detail.json()
print(f"\nIdentifiers found in {top['name']}:")
for entity in found.get("entities", [])[:5]:
    # Each mapsTo item groups mapped IDs by resource,
    # e.g. {"resource": "UNIPROT", "identifiers": ["P04637"]}
    maps_to = [i for m in entity.get("mapsTo", []) for i in m.get("identifiers", [])]
    print(f"  {entity['id']} -> {maps_to}")
```
Token persistence: analysis tokens are valid for several hours. Share tokens to let collaborators view the same results without re-running. Filter by resource (TOTAL, UNIPROT, ENSEMBL, CHEBI, etc.) and species.
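To reuse tokens across sessions, you can record each token with its creation time in a small JSON file. This is a sketch under stated assumptions. The file location and label are your choice. The `max_age_hours` default of 3 is an arbitrary placeholder, since this document only says tokens last "several hours". The server remains the final authority: an expired token fails on `GET /token/{token}`.

```python
import json
import tempfile
import time
from pathlib import Path


def save_token(path, label, token):
    """Store `token` under `label` together with its creation time (Unix seconds)."""
    path = Path(path)
    store = json.loads(path.read_text()) if path.exists() else {}
    store[label] = {"token": token, "created": time.time()}
    path.write_text(json.dumps(store, indent=2))


def load_token(path, label, max_age_hours=3.0):
    """Return the stored token if younger than `max_age_hours`, otherwise None."""
    path = Path(path)
    if not path.exists():
        return None
    entry = json.loads(path.read_text()).get(label)
    if entry is None:
        return None
    age_hours = (time.time() - entry["created"]) / 3600
    return entry["token"] if age_hours < max_age_hours else None


# Demo in a throwaway directory; in practice, save result["summary"]["token"]
with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "reactome_tokens.json"
    save_token(store_file, "dna_damage_genes", "EXAMPLE_TOKEN")
    print(load_token(store_file, "dna_damage_genes"))
```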
## Module 5: Pathway Hierarchy

Navigate the Reactome pathway hierarchy from top-level pathways down to reactions.

```python
# Top-level pathways for human (9606 = NCBI taxonomy ID)
top = reactome_get(CONTENT, "/data/pathways/top/9606")
print(f"Top-level human pathways: {len(top)}")
for p in top[:5]:
    print(f"  {p['stId']}: {p['displayName']}")
time.sleep(0.5)

# Get contained events (sub-pathways and reactions)
events = reactome_get(CONTENT, "/data/pathway/R-HSA-69620/containedEvents")
print(f"\nContained events in Cell Cycle Checkpoints: {len(events)}")
for e in events[:5]:
    print(f"  {e['stId']}: {e['displayName']} ({e['schemaClass']})")
time.sleep(0.5)

# Get the full ancestor chain(s) for a pathway
ancestors = reactome_get(CONTENT, "/data/event/R-HSA-69620/ancestors")
print("\nAncestors of Cell Cycle Checkpoints:")
for chain in ancestors:
    names = [a["displayName"] for a in chain]
    print(f"  {' > '.join(names)}")
```
Species identifiers: use NCBI taxonomy IDs (9606=human, 10090=mouse, 10116=rat) or species names.
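Stable IDs encode the species in their middle field (`R-HSA-` for human), so a local check can catch a wrong species prefix before it produces a 404. This sketch assumes the `R-{code}-{number}` pattern used throughout this document, plus an optional `.version` suffix. The species-code table is a hypothetical, deliberately partial mapping. It covers only the three species named above: HSA (human), MMU (mouse), and RNO (rat).

```python
import re

# Partial, illustrative mapping: stable-ID species code -> (species name, NCBI taxonomy ID)
SPECIES_CODES = {
    "HSA": ("Homo sapiens", 9606),
    "MMU": ("Mus musculus", 10090),
    "RNO": ("Rattus norvegicus", 10116),
}

STABLE_ID = re.compile(r"^R-([A-Z]{3})-(\d+)(?:\.(\d+))?$")


def parse_stable_id(st_id):
    """Split a stable ID into (species code, number, version or None).

    Raises ValueError if the string does not look like a Reactome stable ID.
    """
    match = STABLE_ID.match(st_id.strip())
    if match is None:
        raise ValueError(f"Not a Reactome stable ID: {st_id!r}")
    code, number, version = match.groups()
    return code, int(number), int(version) if version else None


def taxon_for(st_id):
    """Return the NCBI taxonomy ID for a stable ID, or None if its code is unmapped."""
    code, _, _ = parse_stable_id(st_id)
    entry = SPECIES_CODES.get(code)
    return entry[1] if entry else None


print(parse_stable_id("R-HSA-69620"))
print(taxon_for("R-MMU-12345"))  # hypothetical mouse ID
```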
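## Module 6: Cross-References and Species

Map identifiers across databases and query species-specific data.

```python
# Uses reactome_get, CONTENT, time, and requests from Quick Start

# List all species in Reactome
species = reactome_get(CONTENT, "/data/species/all")
print(f"Species in Reactome: {len(species)}")
for s in species[:5]:
    print(f"  {s['displayName']} (taxId: {s['taxId']})")
time.sleep(0.5)

# Map a Reactome entity to external references.
# If this path is not available for the object, the API returns an HTTP error,
# which is reported here instead of stopping the script.
try:
    xrefs = reactome_get(CONTENT, "/data/query/R-HSA-69620/xrefs")
except requests.HTTPError as err:
    xrefs = None
    print(f"\nNo cross-references returned: {err}")
if isinstance(xrefs, list):
    print(f"\nCross-references for R-HSA-69620: {len(xrefs)}")
    for x in xrefs[:5]:
        print(f"  {x}")
time.sleep(0.5)

# Get the orthologous pathway in another species (human -> mouse).
# Assumption: the orthology endpoint expects the Reactome species dbId rather than
# the NCBI taxId, so look it up in the species list retrieved above.
mouse = next((s for s in species if str(s.get("taxId")) == "10090"), None)
if mouse is not None:
    mouse_ortho = reactome_get(CONTENT, f"/data/orthology/R-HSA-69620/species/{mouse['dbId']}")
    orthos = mouse_ortho if isinstance(mouse_ortho, list) else [mouse_ortho]
    for o in orthos[:3]:
        print(f"Mouse ortholog: {o['stId']}: {o['displayName']}")
```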
## Key Concepts

Reactome organizes knowledge in a hierarchical structure:
| Level | Schema Class | Example |
|-------|-------------|---------|
| Top-Level Pathway | TopLevelPathway | Cell Cycle, Immune System, Metabolism |
| Pathway | Pathway | Cell Cycle Checkpoints, Mitotic G1-G1/S phases |
| Reaction | Reaction | TP53 binds RB1 |
| Physical Entity | EntityWithAccessionedSequence | TP53 [cytosol] |
Pathways contain sub-pathways and reactions. Reactions connect input/output physical entities. Each entity maps to reference databases (UniProt, ChEBI, Ensembl).
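Because pathways nest sub-pathways and reactions, a recursive walk is a natural way to summarize a branch. The sketch below counts events per schema class and lists root-to-leaf paths in a nested structure. The `children` key is a hypothetical stand-in: a real script would fill it from calls such as `/data/pathway/{id}/containedEvents`. The sample tree is illustrative and does not reproduce Reactome's actual hierarchy.

```python
from collections import Counter


def count_by_class(node, counts=None):
    """Recursively count events per schemaClass in a nested pathway tree."""
    if counts is None:
        counts = Counter()
    counts[node["schemaClass"]] += 1
    for child in node.get("children", []):  # hypothetical key for sub-events
        count_by_class(child, counts)
    return counts


def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path of display names."""
    path = prefix + (node["displayName"],)
    children = node.get("children", [])
    if not children:
        yield path
    for child in children:
        yield from leaf_paths(child, path)


# Illustrative tree, not real Reactome data
tree = {
    "displayName": "Cell Cycle", "schemaClass": "TopLevelPathway",
    "children": [
        {"displayName": "Cell Cycle Checkpoints", "schemaClass": "Pathway",
         "children": [{"displayName": "TP53 binds RB1", "schemaClass": "Reaction"}]},
        {"displayName": "Mitotic G1-G1/S phases", "schemaClass": "Pathway"},
    ],
}
print(count_by_class(tree))
for p in leaf_paths(tree):
    print(" > ".join(p))
```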
The Analysis Service accepts a wide range of identifiers:

| Database | Example ID | Type |
|----------|-----------|------|
| UniProt | P04637 | Protein |
| HGNC Symbol | TP53 | Gene symbol |
| Ensembl Gene | ENSG00000141510 | Gene |
| NCBI Gene | 7157 | Gene |
| ChEBI | CHEBI:15377 | Small molecule |
| miRBase | hsa-miR-21-5p | microRNA |
| KEGG Gene | hsa:7157 | Gene (KEGG format) |
| Ensembl Protein | ENSP00000269305 | Protein |
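When `identifiersNotFound` is high, it helps to check which identifier types a list actually contains. This rough heuristic sketch is based on the example formats in the table above. The UniProt pattern follows UniProt's published accession format. The other patterns are simplifying assumptions: for example, any all-digit token is treated as an NCBI Gene ID, and anything unmatched as a gene symbol. Treat the output as a hint, not a validation.

```python
import re
from collections import Counter

# Checked in order; the first matching pattern wins.
PATTERNS = [
    ("ChEBI", re.compile(r"^CHEBI:\d+$")),
    ("Ensembl Gene", re.compile(r"^ENSG\d{11}(\.\d+)?$")),
    ("Ensembl Protein", re.compile(r"^ENSP\d{11}(\.\d+)?$")),
    ("KEGG Gene", re.compile(r"^[a-z]{3,4}:\S+$")),
    ("miRBase", re.compile(r"^[a-z]{3}-(miR|let)-\S+$")),
    ("UniProt", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})(-\d+)?$")),
    ("NCBI Gene", re.compile(r"^\d+$")),
]


def guess_id_type(identifier):
    """Return a best-guess identifier type label for one identifier."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "Gene symbol (assumed)"


def profile_ids(identifiers):
    """Count guessed identifier types in a list, ignoring blank entries."""
    return Counter(guess_id_type(i.strip()) for i in identifiers if i.strip())


print(profile_ids(["TP53", "P04637", "ENSG00000141510", "7157"]))
```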
When you submit an analysis, Reactome returns a token: a URL-safe string that identifies your result set. Tokens enable:

- Re-fetching and re-filtering results without re-submitting (`GET /token/{token}`)
- Viewing results in the Pathway Browser: `https://reactome.org/PathwayBrowser/#/DTAB=AN&ANALYSIS={token}`

## Common Workflows

Goal: Submit a gene list, get enriched pathways, and explore top hits.
```python
import time

import requests

CONTENT = "https://reactome.org/ContentService"
ANALYSIS = "https://reactome.org/AnalysisService"

# Step 1: Submit gene list
genes = "TP53\nBRCA1\nBRCA2\nATM\nCHEK2\nCDK2\nRB1\nMDM2\nCDKN1A\nBAX"
resp = requests.post(
    f"{ANALYSIS}/identifiers/",
    headers={"Content-Type": "text/plain"},
    data=genes,
    params={"pageSize": 5, "sortBy": "ENTITIES_FDR", "order": "ASC"},
    timeout=60,
)
resp.raise_for_status()
result = resp.json()
token = result["summary"]["token"]
print(f"Token: {token} | Pathways found: {result['pathwaysFound']}")

# Step 2: Show top pathways with FDR
for p in result["pathways"][:5]:
    fdr = p["entities"]["fdr"]
    ratio = f"{p['entities']['found']}/{p['entities']['total']}"
    print(f"  {p['stId']}: {p['name']} (FDR={fdr:.2e}, {ratio})")
time.sleep(0.5)

# Step 3: Get details on top pathway
top_id = result["pathways"][0]["stId"]
detail_resp = requests.get(f"{CONTENT}/data/query/{top_id}", timeout=30)
detail_resp.raise_for_status()
detail = detail_resp.json()
print(f"\nTop pathway: {detail['displayName']}")
print(f"Compartments: {[c['displayName'] for c in detail.get('compartment', [])]}")
```
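To turn the ranked list into a shortlist, you can keep pathways that fall below an FDR cutoff and contain a minimum number of submitted identifiers. This sketch operates on the `result` dictionary returned by the analysis POST. The 0.05 cutoff and `min_found=2` are conventional choices, not Reactome defaults. The sample dictionary is hypothetical and contains only the keys the workflow above already reads.

```python
def significant_pathways(result, fdr_cutoff=0.05, min_found=2):
    """Return (stId, name, fdr, found, total) for pathways passing both filters, sorted by FDR."""
    hits = []
    for p in result.get("pathways", []):
        ent = p["entities"]
        if ent["fdr"] <= fdr_cutoff and ent["found"] >= min_found:
            hits.append((p["stId"], p["name"], ent["fdr"], ent["found"], ent["total"]))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical analysis result
sample_result = {
    "pathways": [
        {"stId": "EXAMPLE-1", "name": "Pathway A", "entities": {"fdr": 0.20, "found": 5, "total": 80}},
        {"stId": "EXAMPLE-2", "name": "Pathway B", "entities": {"fdr": 0.001, "found": 6, "total": 40}},
        {"stId": "EXAMPLE-3", "name": "Pathway C", "entities": {"fdr": 0.01, "found": 1, "total": 3}},
        {"stId": "EXAMPLE-4", "name": "Pathway D", "entities": {"fdr": 0.03, "found": 3, "total": 25}},
    ]
}
for hit in significant_pathways(sample_result):
    print(hit)
```

Only pathways on the returned page are considered. Increase `pageSize` in the POST if you need the filter to see more of the ranking.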
Goal: Navigate from a top-level pathway down to specific reactions and entities.

```python
# Uses reactome_get helper and CONTENT base URL from Quick Start

# Step 1: Find pathway by search
results = reactome_get(CONTENT, "/search/query",
                       params={"query": "DNA repair", "types": "Pathway", "species": "Homo sapiens"})
top_hit = results["results"][0]["entries"][0]
pid = top_hit["stId"]
print(f"Found: {pid} ({top_hit['name']})")
time.sleep(0.5)

# Step 2: Get sub-events. Reaction-like events include more classes than "Reaction".
REACTION_CLASSES = {"Reaction", "BlackBoxEvent", "Polymerisation", "Depolymerisation", "FailedReaction"}
events = reactome_get(CONTENT, f"/data/pathway/{pid}/containedEvents")
reactions = [e for e in events if e["schemaClass"] in REACTION_CLASSES]
subpaths = [e for e in events if "Pathway" in e["schemaClass"]]
print(f"Sub-pathways: {len(subpaths)}, Reactions: {len(reactions)}")
time.sleep(0.5)

# Step 3: Get participating molecules for a reaction
if reactions:
    rxn = reactions[0]
    refs = reactome_get(CONTENT, f"/data/participants/{rxn['stId']}/referenceEntities")
    print(f"\n{rxn['displayName']} participants:")
    for r in refs[:5]:
        print(f"  {r.get('databaseName', '?')}:{r.get('identifier', '?')} ({r['displayName']})")
```
Goal: Submit expression values alongside identifiers for pathway-level expression overlay.

```python
import requests

ANALYSIS = "https://reactome.org/AnalysisService"

# Tab-separated: identifier \t expression_value1 \t expression_value2 ...
# A first line starting with "#" is read as the column header (auto-detected)
expression_data = """#id\tcontrol\ttreated
TP53\t1.2\t3.5
BRCA1\t2.1\t1.8
CDK2\t0.9\t4.2
RB1\t1.5\t0.6
MDM2\t1.0\t2.8
CDKN1A\t0.8\t5.1
BAX\t1.1\t3.9"""

resp = requests.post(
    f"{ANALYSIS}/identifiers/",
    headers={"Content-Type": "text/plain"},
    data=expression_data,
    params={"pageSize": 10, "sortBy": "ENTITIES_FDR"},
    timeout=60,
)
resp.raise_for_status()
result = resp.json()

# Expression column names are reported in the "expression" block of the result
print(f"Analysis type: {result['summary'].get('type', 'N/A')}")
print(f"Expression columns: {result.get('expression', {}).get('columnNames', 'N/A')}")
print(f"Token: {result['summary']['token']}")
for p in result["pathways"][:3]:
    exp = p["entities"].get("exp", [])
    print(f"  {p['name']}: FDR={p['entities']['fdr']:.2e}, expr={exp}")
```
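You can precompute a per-gene summary such as a log2 fold change and submit one column instead of raw values. This sketch assumes strictly positive expression values and adds a hypothetical `pseudocount` parameter to guard against zeros. The `#` header-line convention follows the example above.

```python
import math


def log2fc_payload(values, pseudocount=0.0, header="#id\tlog2FC"):
    """Build a tab-separated overlay body with one log2(treated/control) column.

    `values` maps identifier -> (control, treated); `pseudocount` is added to both.
    """
    lines = [header]
    for ident, (control, treated) in values.items():
        fc = math.log2((treated + pseudocount) / (control + pseudocount))
        lines.append(f"{ident}\t{fc:.3f}")
    return "\n".join(lines)


payload = log2fc_payload({"TP53": (1.2, 3.5), "RB1": (1.5, 0.6), "MDM2": (1.0, 2.8)})
print(payload)
```

Send the result as `data=` in the same POST. Negative values then mark down-regulated genes in the overlay.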
## Key Parameters

| Parameter | Function/Endpoint | Default | Options | Effect |
|-----------|-------------------|---------|---------|--------|
| query | /search/query | (none) | Any string | Keyword search term |
| types | /search/query | All | Pathway, Reaction, Protein, etc. | Filter search by schema class |
| species | /search/query, analysis | All | Species name or taxon ID | Restrict to organism |
| pageSize | Analysis, search | 20 | 1-250 | Results per page |
| sortBy | Analysis | ENTITIES_PVALUE | ENTITIES_FDR, ENTITIES_PVALUE, FOUND_ENTITIES, NAME | Sort enrichment results |
| resource | Analysis filtering | TOTAL | TOTAL, UNIPROT, ENSEMBL, CHEBI, etc. | Filter by identifier source |
| cluster | /search/query | true | true, false | Group search results by type |
## Best Practices

- Use `time.sleep(0.5)` between sequential requests: Reactome has no documented hard rate limit, but rapid-fire requests may be throttled. Be courteous to the shared resource.
- Save and reuse analysis tokens: tokens remain valid for hours. Store the token to re-filter results by species or resource without re-submitting.
- Prefer stable IDs over database IDs: Reactome stable IDs (R-HSA-69620) are permanent, whereas internal database IDs can change between releases.
- Use `sortBy=ENTITIES_FDR` for enrichment results: the FDR (Benjamini-Hochberg-adjusted p-value) accounts for testing hundreds of pathways at once, so it is a safer significance criterion than the raw p-value.
- Check `identifiersNotFound` in analysis results: a high unmapped count may indicate the wrong identifier type or outdated IDs.
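The p-value and FDR columns come from a textbook over-representation test. For each pathway, compute a one-sided hypergeometric p-value, then apply Benjamini-Hochberg adjustment across all tested pathways. This sketch shows the general technique using only the standard library. It is not a reimplementation of Reactome's Analysis Service, whose exact statistics and background set may differ. The counts in the example are made up.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K pathway members, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: background of 10,000 genes, 10 submitted genes
N, n = 10_000, 10
pathways = {"Pathway A": (150, 4), "Pathway B": (2_000, 3), "Pathway C": (40, 0)}  # (size K, overlap k)
pvals = [hypergeom_sf(k, N, K, n) for K, k in pathways.values()]
fdrs = benjamini_hochberg(pvals)
for name, p, q in zip(pathways, pvals, fdrs):
    print(f"{name}: p={p:.2e}, FDR={q:.2e}")
```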
## Common Recipes

```python
import requests

CONTENT = "https://reactome.org/ContentService"
pathway_id = "R-HSA-69620"  # Cell Cycle Checkpoints

resp = requests.get(f"{CONTENT}/data/participants/{pathway_id}/referenceEntities", timeout=30)
resp.raise_for_status()
refs = resp.json()

# Collect UniProt reference entities (display name, falling back to the accession)
proteins = set()
for r in refs:
    if r.get("databaseName") == "UniProt":
        proteins.add(r.get("displayName", r.get("identifier")))
print(f"UniProt proteins in {pathway_id}: {len(proteins)}")
for g in sorted(proteins)[:10]:
    print(f"  {g}")
```
```python
# Generate a direct link to the Reactome pathway diagram
pathway_id = "R-HSA-69620"
diagram_url = f"https://reactome.org/PathwayBrowser/#/{pathway_id}"
print(f"View diagram: {diagram_url}")

# With analysis overlay (replace with a real token from an analysis)
token = "YOUR_TOKEN"
overlay_url = f"https://reactome.org/PathwayBrowser/#/{pathway_id}&DTAB=AN&ANALYSIS={token}"
print(f"View with analysis: {overlay_url}")
```
```python
import time

import requests

CONTENT = "https://reactome.org/ContentService"
pathway_ids = ["R-HSA-69620", "R-HSA-109581", "R-HSA-1640170"]

summaries = []
for pid in pathway_ids:
    resp = requests.get(f"{CONTENT}/data/query/{pid}", timeout=30)
    resp.raise_for_status()
    data = resp.json()
    summaries.append({
        "stId": data["stId"],
        "name": data["displayName"],
        "species": data["speciesName"],
        "hasDiagram": data.get("hasDiagram", False),
    })
    time.sleep(0.5)

for s in summaries:
    print(f"{s['stId']}: {s['name']} (diagram: {s['hasDiagram']})")
```
## Troubleshooting

| Problem | Cause | Solution |
|---------|-------|----------|
| 404 Not Found | Invalid stable ID or wrong species prefix | Verify ID format: R-HSA-{number} for human; use /search/query to find valid IDs |
| 400 Bad Request | Malformed POST body or wrong Content-Type | Use Content-Type: text/plain for analysis; newline-separated identifiers |
| Empty analysis results | Identifiers not recognized | Check identifiersNotFound; try different ID types (UniProt vs HGNC symbol) |
| 500 Internal Server Error | Server-side issue or very large input | Retry after delay; split large gene lists (>2000 IDs) into batches |
| Token expired | Analysis results no longer available | Re-submit the gene list; tokens last several hours |
| Wrong species results | No species filter applied | Add species=Homo sapiens parameter to search/analysis |
| Slow response | Large pathway with many entities | Use pageSize to paginate; cache results locally |
| Cross-reference returns empty | Entity has no external DB mapping | Not all Reactome entities have UniProt/Ensembl mappings; check entity schema class |
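The troubleshooting advice for large inputs (split lists over about 2,000 IDs, retry after a delay) can be combined into one helper. This is a sketch under stated assumptions. The 2,000 batch size comes from the table above, and the exponential backoff schedule is an arbitrary choice. `submit` is a hypothetical callable, for example a wrapper around the Module 3 POST, that raises on failure. Splitting also changes the statistics: each batch is tested on its own, so per-batch results are not equivalent to one analysis of the full list.

```python
import time


def chunked(items, size=2000):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def with_retries(submit, payload, attempts=3, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call submit(payload), retrying with exponential backoff (1 s, 2 s, ...) on `retry_on`."""
    for attempt in range(attempts):
        try:
            return submit(payload)
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Hypothetical usage: one analysis per batch of newline-joined identifiers
# results = [with_retries(my_submit, "\n".join(batch)) for batch in chunked(all_ids)]
```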
This skill consolidates content from:
- `bioservices.Reactome`