database-access/ensembl-rest/SKILL.md
Query the Ensembl REST API for gene/transcript/protein lookup, sequence retrieval, comparative genomics (Compara), variant effect prediction (VEP), regulatory features, and cross-species ortholog/paralog calls. Use when pulling Ensembl-native data (Ensembl Gene IDs, version-pinned releases, archive endpoints for reproducibility), gene/transcript/exon structure with stable IDs, or VEP for variant annotation. Encodes the 15 req/sec rate limit, archive (e110.rest.ensembl.org) for reproducibility, Ensembl divisions (vertebrates / plants / fungi / metazoa / bacteria), and the symbol-vs-ID stability problem.
Reference examples tested with: requests 2.31+, Ensembl REST API (release 110+); Ensembl release schedule is roughly quarterly
Before using code patterns, verify installed versions match. If versions differ:
- pip show requests
- Each Ensembl release has an archive REST endpoint (e.g. https://e110.rest.ensembl.org) for reproducibility.
"Pull Ensembl-native gene / transcript / variant data programmatically" -> Ensembl REST is distinct from NCBI Entrez and BioMart. It is the right answer for: stable Ensembl IDs, transcript / exon structure, VEP (Variant Effect Predictor) annotation, Compara orthologs at vertebrate scale, regulatory feature annotation, and any workflow rooted in Ensembl's coordinate system.
Two facts dominate Ensembl REST work: (1) the 15 req/sec / 55,000 req/hour rate limit — high enough for hundreds of queries, low enough that bulk work (>5,000) belongs in BioMart instead; (2) versioned archive endpoints — https://e110.rest.ensembl.org pins to release 110 for reproducibility, while https://rest.ensembl.org follows the current release.
- Python: requests.get('https://rest.ensembl.org/...')
- R: biomaRt for bulk (see biomart-queries); REST via httr

```python
import requests
import time

BASE = 'https://rest.ensembl.org'
HEADERS = {'Accept': 'application/json'}
SLEEP = 0.07  # seconds between requests; stays just under the 15 req/sec ceiling
```
No API key required. Respect Retry-After header on 429.
Ensembl is divided by clade. Different REST hosts:
| Division | Host | Scope |
|---|---|---|
| Vertebrates | https://rest.ensembl.org | Human, mouse, fish, etc. (the "main" Ensembl) |
| Plants | https://rest.ensembl.org (plants division also accessible) | Arabidopsis, rice, etc. via Ensembl Genomes |
| Fungi | https://rest.ensemblgenomes.org | Yeasts, Aspergillus, etc. |
| Metazoa | https://rest.ensemblgenomes.org | Insects, nematodes, etc. |
| Bacteria | https://rest.ensemblgenomes.org | Limited (most bacteria in NCBI) |
For non-vertebrate work, check ensemblgenomes.org mirrors. As of 2024, Ensembl Genomes was being consolidated; check current host.
| URL | Behavior |
|---|---|
| https://rest.ensembl.org | Current release (rolling) |
| https://e110.rest.ensembl.org | Pinned to release 110 |
| https://e111.rest.ensembl.org | Pinned to release 111 |
| https://grch37.rest.ensembl.org | Pinned to GRCh37 (legacy assembly) |
For any published analysis, pin the release. Ensembl releases change gene model versions, exon coordinates, and transcript annotations — re-running a pipeline a year later against the live endpoint may produce different results.
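A minimal sketch of how a pipeline might select its host from a single release setting, so the pin lives in one place. The function name `ensembl_base_url` and its arguments are illustrative, not part of any Ensembl client; the URL patterns are the ones in the table above.

```python
def ensembl_base_url(release=None, grch37=False):
    """Return the REST host for a pinned release, the GRCh37 server, or the live server.

    Hypothetical helper: only builds the URL, it does not check that the
    archive for that release is still online.
    """
    if grch37:
        if release is not None:
            raise ValueError('choose either a numbered release or the GRCh37 host')
        return 'https://grch37.rest.ensembl.org'
    if release is None:
        return 'https://rest.ensembl.org'  # rolling: follows the current release
    release = int(release)
    if release <= 0:
        raise ValueError('release must be a positive integer')
    return f'https://e{release}.rest.ensembl.org'


BASE = ensembl_base_url(release=110)  # record this value alongside the results
```

Storing the release number with the analysis outputs makes it possible to tell later which annotation the results were built on.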
| Group | Example | Purpose |
|---|---|---|
| Lookup | /lookup/symbol/human/BRCA1 | Resolve symbol or ID to stable record |
| Sequence | /sequence/id/{id} | DNA/protein sequence for ID |
| Cross References | /xrefs/symbol/human/BRCA1 | Cross-refs to other DBs |
| Homology / Compara | /homology/symbol/human/BRCA1 | Orthologs and paralogs |
| Gene Tree | /genetree/id/{tree_id} | Compara gene tree |
| VEP | /vep/human/region/{region}/{allele} | Variant effect prediction |
| Overlap | /overlap/id/{id} or /overlap/region/{species}/{region} | Genes/regulatory in interval |
| Regulatory | /regulatory/species/{species}/id/{id} | Regulatory features |
| Variant | /variation/{species}/{id} | dbSNP / 1000G / ClinVar via Ensembl |
| LD | /ld/{species}/pairwise/{var1}/{var2} | LD between variants |
| GA4GH | /ga4gh/... | GA4GH-compliant subset |
Full reference: https://rest.ensembl.org (interactive).
Gene symbols are unstable (MARCH1 -> MARCHF1 in 2020 due to Excel autocorrect; SEPT* family also renamed). Ensembl Gene IDs (ENSG...) are stable across releases when the gene model is preserved.
Best practice: resolve each symbol to an Ensembl Gene ID once with /lookup/symbol/{species}/{symbol}, then use the ID for all downstream queries.
Symbol-based endpoints are convenient for interactive use; ID-based endpoints are for reproducible pipelines.
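One way to enforce "resolve once, then use IDs" is a small cache in front of the symbol lookup. This is a sketch only: `resolve_symbols` is a hypothetical helper, and `lookup` stands for any callable that returns the lookup record (a dict with an `id` key) for a symbol.

```python
def resolve_symbols(symbols, lookup, cache=None):
    """Map each symbol to an Ensembl Gene ID, calling `lookup` at most once per symbol.

    `lookup` is an assumed callable: symbol -> record dict containing 'id'.
    `cache` can be a dict loaded from a previous run so IDs stay fixed.
    """
    cache = {} if cache is None else cache
    for sym in symbols:
        if sym not in cache:
            cache[sym] = lookup(sym)['id']
    return cache
```

Saving the returned dict (for example as JSON next to the analysis) lets later runs skip the symbol endpoints entirely.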
| Limit | Value |
|---|---|
| Burst | 15 req/sec |
| Hourly | 55,000 req/hour |
| Concurrent | Not enforced; courtesy 1-2 |
Respect Retry-After header on HTTP 429. For >5,000 queries, switch to BioMart bulk export (see biomart-queries) — BioMart has separate, more permissive limits.
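A fixed sleep after every call works, but it also waits when the previous request was already slow. The sketch below spaces calls by a minimum interval instead. The `Throttle` class is illustrative and not part of `requests` or any Ensembl library; the clock and sleep functions are injectable only to make it easy to test.

```python
import time


class Throttle:
    """Block just long enough to keep calls under `max_per_sec` (hypothetical helper)."""

    def __init__(self, max_per_sec=15, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request keeps the burst rate bounded; it does not track the hourly quota, so a 429 still has to be handled via Retry-After.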
VEP via REST is the right call for ad hoc variant annotation. For batch variant annotation (>1000 variants), download VEP and run locally (variant-calling/variant-annotation skill).
REST modes:
- /vep/{species}/region/{region}/{allele}: single variant by coordinate
- /vep/{species}/id/{variant_id}: by dbSNP / Ensembl variant ID
- /vep/{species}/hgvs/{hgvs_notation}: by HGVS notation

VEP returns rich annotation: consequence (missense, synonymous, intron), SIFT/PolyPhen scores, gnomAD frequencies (if available), ClinVar significance.
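The three modes differ only in the path, so a single URL builder can cover them. A sketch under the assumption that path segments should be percent-encoded (HGVS strings contain `>`); `vep_url` is a hypothetical name, not an Ensembl function.

```python
from urllib.parse import quote

# Number of path segments each mode takes after /vep/{species}/{mode}/
_VEP_PARTS = {'region': 2, 'id': 1, 'hgvs': 1}


def vep_url(base, species, mode, *parts):
    """Build a VEP GET URL for mode 'region', 'id', or 'hgvs' (illustrative helper)."""
    if mode not in _VEP_PARTS:
        raise ValueError(f'unknown VEP mode: {mode}')
    if len(parts) != _VEP_PARTS[mode]:
        raise ValueError(f'{mode} expects {_VEP_PARTS[mode]} path segment(s)')
    # Keep ':' readable (regions, HGVS); encode everything else that is unsafe.
    path = '/'.join(quote(str(p), safe=':') for p in parts)
    return f'{base}/vep/{species}/{mode}/{path}'
```

For example, `vep_url(BASE, 'human', 'region', '17:43044295-43044295:1', 'A')` gives the same URL as the coordinate form listed above.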
Compara orthology calls via Ensembl REST are covered in detail in ortholog-inference (database-access view). The relevant endpoint:
- /homology/symbol/{species}/{symbol}: all orthologs across Ensembl species
- /homology/id/{ensembl_id}: same, by ID
- ?target_species= to restrict to one target
- ?type=orthologues or paralogues to filter

Goal: Resolve a gene symbol to its current Ensembl Gene ID once, then use the ID for all downstream queries.
Approach: /lookup/symbol/{species}/{symbol} returns the canonical record.
Reference (requests 2.31+):
```python
import requests
import time

BASE = 'https://rest.ensembl.org'
HEADERS = {'Accept': 'application/json'}

def get_with_retry(url, params=None, max_retries=3):
    '''GET a URL, sleeping for Retry-After seconds whenever the server answers 429.'''
    for attempt in range(max_retries):
        r = requests.get(url, params=params, headers=HEADERS)
        if r.status_code == 429:
            # Retry-After can be fractional (e.g. "40.0"), so parse it as float, not int
            time.sleep(float(r.headers.get('Retry-After', '5')))
            continue
        r.raise_for_status()
        return r
    raise RuntimeError(f'Failed after {max_retries} retries')

def symbol_to_ensembl(species, symbol):
    '''Resolve a gene symbol to its Ensembl lookup record.'''
    r = get_with_retry(f'{BASE}/lookup/symbol/{species}/{symbol}')
    return r.json()

info = symbol_to_ensembl('human', 'BRCA1')
print(f' Ensembl Gene ID: {info["id"]}')
print(f' Biotype: {info["biotype"]}')
print(f' Chromosome: {info["seq_region_name"]}:{info["start"]}-{info["end"]}')
print(f' Strand: {info["strand"]}')
```
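```python
def get_sequence(ensembl_id, seq_type='cdna'):
    '''seq_type: cdna, cds, protein, genomic'''
    r = get_with_retry(f'{BASE}/sequence/id/{ensembl_id}',
                       params={'type': seq_type, 'content-type': 'application/json'})
    return r.json()

# For type=protein pass a transcript or protein ID: a gene ID maps to several
# proteins, and the request is rejected unless multiple_sequences=1 is set.
# ENST00000380152 is a BRCA2 transcript (gene ENSG00000139618).
prot = get_sequence('ENST00000380152', seq_type='protein')
print(f' Length: {len(prot["seq"])} aa')
```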
```python
def genes_in_region(species, region):
    '''region as "chr:start-end" e.g. "17:43000000-44000000".'''
    r = get_with_retry(f'{BASE}/overlap/region/{species}/{region}',
                       params={'feature': 'gene'})
    return r.json()

for g in genes_in_region('human', '17:43000000-43200000'):
    # external_name is absent for unnamed genes, so fall back to an empty string
    print(f' {g.get("external_name", ""):<12} {g["id"]} {g["biotype"]:<20} {g["start"]}-{g["end"]}')
```
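The overlap endpoint restricts how large a region one request may cover. The sketch below assumes a 5 Mb cap (check the endpoint documentation for the current value) and splits a longer interval into windows; `split_region` and `dedupe_by_id` are hypothetical helpers.

```python
def split_region(chrom, start, end, max_len=5_000_000):
    """Split chrom:start-end into consecutive windows of at most max_len bases.

    max_len=5_000_000 is an assumed server limit, not a value taken from this document.
    """
    if start < 1 or end < start:
        raise ValueError('need 1 <= start <= end')
    windows = []
    pos = start
    while pos <= end:
        stop = min(pos + max_len - 1, end)
        windows.append(f'{chrom}:{pos}-{stop}')
        pos = stop + 1
    return windows


def dedupe_by_id(features):
    """Drop repeats: a gene spanning a window boundary is returned by both windows."""
    seen = {}
    for f in features:
        seen.setdefault(f['id'], f)
    return list(seen.values())
```

Each window string can be passed to a region query in turn, with the combined results run through `dedupe_by_id`.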
Goal: Get full annotation for a variant by coordinate.
Approach: /vep/{species}/region/{region}/{allele} returns transcript consequences, SIFT/PolyPhen, frequencies.
Reference (Ensembl REST release 110+):
```python
def vep_region(species, region, allele):
    r = get_with_retry(f'{BASE}/vep/{species}/region/{region}/{allele}')
    return r.json()

# BRCA1 missense variant in GRCh38 coordinates (rest.ensembl.org defaults to GRCh38);
# for GRCh37 coords use https://grch37.rest.ensembl.org instead.
results = vep_region('human', '17:43044295-43044295:1', 'A')
if results:
    # Show the first five transcript-level consequences
    for tc in results[0].get('transcript_consequences', [])[:5]:
        print(f' {tc["gene_symbol"]:<8} {tc["consequence_terms"]}')
        if 'sift_prediction' in tc:
            print(f' SIFT: {tc["sift_prediction"]} ({tc.get("sift_score", "?")})')
```
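Printing every transcript consequence gets noisy for genes with many isoforms. A sketch of one way to condense a VEP record, assuming the JSON carries a top-level `most_severe_consequence` field next to `transcript_consequences`; `summarize_vep` is a hypothetical helper.

```python
def summarize_vep(record):
    """Reduce one VEP result dict to its most severe consequence and the genes carrying it."""
    worst = record.get('most_severe_consequence')
    hits = [tc for tc in record.get('transcript_consequences', [])
            if worst in tc.get('consequence_terms', [])]
    genes = sorted({tc['gene_symbol'] for tc in hits if tc.get('gene_symbol')})
    return {'most_severe': worst, 'genes': genes, 'n_transcripts': len(hits)}
```

Applied to `results[0]` from a region query, this gives one line per variant instead of one per transcript.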
```python
def orthologs(species, symbol, target_species=None):
    '''Return the list of homology records for a gene symbol.'''
    params = {'type': 'orthologues'}
    if target_species:
        params['target_species'] = target_species
    r = get_with_retry(f'{BASE}/homology/symbol/{species}/{symbol}', params=params)
    return r.json()['data'][0]['homologies']

for o in orthologs('human', 'BRCA1', target_species='mouse'):
    print(f' {o["target"]["species"]:<15} {o["target"]["id"]} type={o["type"]} confidence={o.get("confidence")}')
```
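Downstream analyses often want only unambiguous orthologs. The sketch below filters homology records on the `type` field, assuming Compara labels one-to-one calls as `ortholog_one2one`; `one_to_one_targets` is a hypothetical helper operating on records shaped like those printed above.

```python
def one_to_one_targets(homologies):
    """Return {target_species: target_gene_id} for one-to-one ortholog calls only."""
    out = {}
    for h in homologies:
        if h.get('type') == 'ortholog_one2one':
            target = h['target']
            out[target['species']] = target['id']
    return out
```

One-to-many and many-to-many calls are dropped here on purpose; keep them when gene duplications in the target lineage matter.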
```python
def batch_symbols(species, symbols):
    '''Look up symbols one by one, recording failures instead of raising.'''
    out = {}
    for sym in symbols:
        try:
            out[sym] = symbol_to_ensembl(species, sym)
        except requests.HTTPError as e:
            out[sym] = {'error': str(e)}
        time.sleep(0.07)  # 15 req/sec ceiling
    return out
```
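For longer symbol lists, the lookup endpoint also accepts POST requests carrying many symbols at once, which costs one request per chunk instead of one per symbol. This sketch assumes the body shape `{"symbols": [...]}`, a response keyed by symbol, and a 1000-symbol cap per POST; confirm all three against the endpoint documentation. `post_lookup_symbols` and its `post` parameter are hypothetical.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def post_lookup_symbols(base, species, symbols, post=None, chunk_size=1000):
    """Resolve many symbols via POST /lookup/symbol/{species} (assumed batch form)."""
    if post is None:
        import requests  # imported lazily so the helper can be exercised with a stub
        post = requests.post
    headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
    out = {}
    for chunk in chunked(list(symbols), chunk_size):
        r = post(f'{base}/lookup/symbol/{species}', headers=headers,
                 data=json.dumps({'symbols': chunk}), timeout=60)
        r.raise_for_status()
        out.update(r.json())  # symbols the server could not resolve are simply absent
    return out
```

Comparing the returned keys with the input list shows which symbols failed to resolve, for example after a rename.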
```python
# Pin to release 110
ARCHIVE = 'https://e110.rest.ensembl.org'
r = requests.get(f'{ARCHIVE}/lookup/symbol/human/BRCA1', headers={'Accept': 'application/json'})
print(r.json()['id'])
# Re-runs against e110 in 2030 will return the same Gene ID even if the live release has moved on.
```
```python
def ld_pairwise(species, var1, var2, population='1000GENOMES:phase_3:CEU'):
    '''Pairwise LD between two variants in one population.'''
    r = get_with_retry(f'{BASE}/ld/{species}/pairwise/{var1}/{var2}',
                       params={'population_name': population})
    return r.json()
```
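The LD response still needs filtering before use. A sketch that keeps pairs above an r² threshold, assuming each record has `variation1`, `variation2`, and `r2` fields and that `r2` may arrive as a string; `strong_pairs` is a hypothetical helper.

```python
def strong_pairs(ld_records, min_r2=0.8):
    """Return (variation1, variation2, r2) tuples with r2 >= min_r2, strongest first."""
    out = []
    for rec in ld_records:
        r2 = float(rec['r2'])  # values are assumed to be numeric strings
        if r2 >= min_r2:
            out.append((rec['variation1'], rec['variation2'], r2))
    return sorted(out, key=lambda t: t[2], reverse=True)
```

The 0.8 default is a common convention for "strong" LD, not something Ensembl prescribes.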
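- Symbol lookups fail for renamed genes, e.g. /lookup/symbol/human/MARCH1 after the 2020 rename.
- Results can differ when a pipeline is re-run against https://rest.ensembl.org a year later.
- Use an archive endpoint (e.g. https://e110.rest.ensembl.org) for reproducibility.
- For bulk variant annotation, run VEP locally (see the variant-calling/variant-annotation skill); reserve REST for ad hoc <1K variants.
- Species names are easy to get wrong: Homo_sapiens instead of human, or arbitrary capitalization.
- Query https://rest.ensembl.org/info/species to enumerate valid names.
- Non-vertebrate species may not be served from rest.ensembl.org.
- Some divisions were on a separate host (rest.ensemblgenomes.org) as of 2024 (consolidation ongoing).
- Query info/divisions to confirm.

| Error / symptom | Cause | Solution |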
|---|---|---|
| 404 on symbol lookup | HGNC rename or wrong species | Resolve to Ensembl ID; check species code |
| HTTP 429 | Rate limit | Sleep per Retry-After; cap to 15 req/sec |
| Different results 6 months later | No version pinning | Use archive endpoint (eXX.rest.ensembl.org) |
| 404 on non-vertebrate | Wrong host division | Switch to rest.ensemblgenomes.org |
| VEP infeasible bulk | REST is per-variant | Local VEP for bulk |
| Old archive 503 | Decommissioned | Use a current archive release |
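Several of the 404 cases above come down to a species name the server does not recognise. A sketch of validating names up front against the payload of /info/species, assuming it is a dict with a `species` list whose entries carry `name` and `aliases`; `canonical_species` is a hypothetical helper.

```python
def canonical_species(name, info_species_payload):
    """Return the canonical species name matching `name` or one of its aliases.

    Matching is case-insensitive; raises KeyError when nothing matches.
    """
    wanted = name.strip().lower()
    for entry in info_species_payload.get('species', []):
        candidates = [entry['name']] + list(entry.get('aliases', []))
        if wanted in (c.lower() for c in candidates):
            return entry['name']
    raise KeyError(f'unknown species: {name}')
```

Fetching the payload once per run and normalising every species argument through it turns a late 404 into an early, readable error.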