Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

aipoch/ena-database

Name: ena-database
Author: aipoch

scientific-skills/Evidence Insights/ena-database/SKILL.md

npx skillsauth add aipoch/medical-research-skills ena-database

When to Use

Use this skill when you need to:

- Download raw sequencing reads (FASTQ) for a run/experiment/study using ENA accessions (e.g., ERR..., SRR..., PRJ...).
- Find samples, runs, experiments, or assemblies by metadata filters (organism, platform, collection date, geography, etc.).
- Retrieve record metadata (XML/JSON/TSV) for reproducible reporting and pipeline inputs.
- Query taxonomic lineage/rank for organisms to drive filtering or grouping in analyses.
- Perform bulk discovery + bulk download workflows (search first, then fetch many files via FTP/Aspera/tools).

Key Features

- Multi-object ENA coverage: studies/projects, samples, experiments, runs, assemblies, sequences, analyses, taxonomy records.
- Two primary API styles:
  - Portal API for advanced search and metadata export (JSON/TSV/CSV).
  - Browser API for direct record retrieval by accession (XML).
- Multiple data formats: FASTQ, FASTA, BAM/CRAM, EMBL flat file, plus metadata in XML/JSON/TSV.
- Bulk transfer options: FTP/Aspera and command-line tooling patterns for large datasets.
- Cross-references and reference retrieval: ENA xref service and CRAM reference registry endpoints.
- Operational guidance: rate limiting awareness (HTTP 429) and best practices for robust pipelines.

For detailed endpoint and parameter documentation, see references/api_reference.md.

Dependencies

Python >=3.9
requests >=2.31.0

Optional (recommended for XML parsing when using the Browser API):

lxml >=4.9.0

Example Usage

The following script is a complete, runnable example that:

searches ENA for runs in a study via the Portal API (JSON), then
fetches one run’s record via the Browser API (XML), and
retrieves taxonomy lineage via the Taxonomy REST API.

#!/usr/bin/env python3
import sys
import time
import requests

PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"
BROWSER_XML = "https://www.ebi.ac.uk/ena/browser/api/xml"
TAXONOMY = "https://www.ebi.ac.uk/ena/taxonomy/rest"

SESSION = requests.Session()
SESSION.headers.update({"User-Agent": "ena-database-skill/1.0"})

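def get_with_backoff(url, params=None, max_retries=6, timeout=30):
    """GET a URL, retrying with exponential backoff while the server returns HTTP 429."""
    delay = 1.0
    for attempt in range(max_retries):
        r = SESSION.get(url, params=params, timeout=timeout)
        if r.status_code != 429:
            r.raise_for_status()  # surface any other HTTP error immediately
            return r
        if attempt < max_retries - 1:
            time.sleep(delay)  # wait before the next attempt; no sleep after the last one
            delay *= 2
    # Every attempt was rate-limited: raise the HTTPError for the final 429 response.
    r.raise_for_status()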

def search_runs_by_study(study_accession, limit=5):
    params = {
        "result": "read_run",
        "query": f"study_accession={study_accession}",
        "format": "json",
        "limit": limit,
        # Ask for a few useful fields; adjust as needed for your pipeline.
        "fields": "run_accession,study_accession,sample_accession,experiment_accession,tax_id,scientific_name,fastq_ftp"
    }
    r = get_with_backoff(PORTAL_SEARCH, params=params)
    return r.json()

def fetch_run_xml(run_accession):
    url = f"{BROWSER_XML}/{run_accession}"
    r = get_with_backoff(url)
    return r.text  # XML string

def fetch_taxonomy_lineage(tax_id):
    url = f"{TAXONOMY}/tax-id/{tax_id}"
    r = get_with_backoff(url)
    return r.json()

def main():
    if len(sys.argv) < 2:
        print("Usage: python ena_example.py <STUDY_ACCESSION>  (e.g., PRJEB1234)", file=sys.stderr)
        sys.exit(2)

    study = sys.argv[1]
    runs = search_runs_by_study(study_accession=study, limit=5)

    if not runs:
        print(f"No runs found for study {study}")
        return

    print(f"Found {len(runs)} runs for study {study}")
    first = runs[0]
    run_acc = first.get("run_accession")
    tax_id = first.get("tax_id")

    print("\nFirst run summary (Portal API JSON):")
    for k in ["run_accession", "sample_accession", "experiment_accession", "scientific_name", "tax_id", "fastq_ftp"]:
        print(f"  {k}: {first.get(k)}")

    if run_acc:
        xml = fetch_run_xml(run_acc)
        print("\nBrowser API XML (first 600 chars):")
        print(xml[:600])

    if tax_id:
        tax = fetch_taxonomy_lineage(tax_id)
        print("\nTaxonomy lineage (ENA Taxonomy REST API):")
        # The response may be a single record or a list of records; handle both.
        rec = tax[0] if isinstance(tax, list) and tax else tax
        if isinstance(rec, dict):
            print(f"  scientificName: {rec.get('scientificName')}")
            print(f"  rank: {rec.get('rank')}")
            print(f"  lineage: {rec.get('lineage')}")
        else:
            print("  (no taxonomy record returned)")

if __name__ == "__main__":
    main()

Run:

python ena_example.py PRJEB1234

Implementation Details

ENA data model (what you query and retrieve)

ENA organizes records into common object types used in pipelines:

- Study/Project: umbrella entity for a dataset; primary unit for citation.
- Sample: biological material metadata.
- Experiment: library prep + instrument metadata.
- Run: the actual sequencing output files (often FASTQ) for one run.
- Assembly: genome/transcriptome/metagenome assemblies.
- Sequence/Record: annotated sequences (e.g., EMBL records).
- Analysis: computational results derived from sequence data.
- Taxonomy: lineage and rank information.
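
When a pipeline receives mixed accessions, it can help to decide which object type each one refers to before choosing an endpoint. The sketch below is one possible way to do that with prefix patterns. The patterns are an assumption based on ENA's published accession formats and are simplified (legacy and secondary formats are not all covered); `classify_accession` and `ACCESSION_PATTERNS` are hypothetical names, not part of this skill.

```python
import re

# Assumption: simplified prefix patterns following ENA's published accession
# formats. Verify against current ENA documentation before relying on them.
ACCESSION_PATTERNS = [
    ("project", re.compile(r"PRJ[EDN][A-Z]\d+")),
    ("study", re.compile(r"[EDS]RP\d{6,}")),
    ("sample", re.compile(r"SAM[EDN][A-Z]?\d+|[EDS]RS\d{6,}")),
    ("experiment", re.compile(r"[EDS]RX\d{6,}")),
    ("run", re.compile(r"[EDS]RR\d{6,}")),
    ("analysis", re.compile(r"[EDS]RZ\d{6,}")),
    ("assembly", re.compile(r"GCA_\d{9}(\.\d+)?")),
]


def classify_accession(accession):
    """Return the ENA object type for an accession, or None if unrecognized."""
    acc = accession.strip().upper()
    for object_type, pattern in ACCESSION_PATTERNS:
        # fullmatch avoids accepting strings that merely start with a valid prefix
        if pattern.fullmatch(acc):
            return object_type
    return None


if __name__ == "__main__":
    for acc in ["PRJEB1234", "ERR164407", "GCA_000001405.29", "XYZ1"]:
        print(acc, "->", classify_accession(acc))
```

Returning None for unknown input lets the caller decide whether to skip the record or fail loudly, which is usually preferable to guessing an object type.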

API selection guidance

- Portal API (/ena/portal/api/search): use for searching and exporting metadata at scale.
  - Typical outputs: json, tsv, csv.
  - Supports complex query expressions (see references/api_reference.md).
- Browser API (/ena/browser/api/xml/{accession}): use for direct retrieval by accession.
  - Output: XML (parse with an XML parser, not regex).
- Taxonomy REST API (/ena/taxonomy/rest/...): use for lineage/rank lookups.
- Cross-reference service: https://www.ebi.ac.uk/ena/xref/rest/ for related records in external databases.
- CRAM reference registry: https://www.ebi.ac.uk/ena/cram/ for reference sequence retrieval by checksum.
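
To illustrate the advice to parse Browser API output with an XML parser, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The sample document is an invented, heavily simplified stand-in for a run record (element names follow the common `RUN_SET`/`RUN` layout, which is an assumption here), and `summarize_run_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: an invented, simplified stand-in for a Browser API run record.
# Real records carry many more elements and attributes.
SAMPLE_RUN_XML = """
<RUN_SET>
  <RUN accession="ERR0000001" alias="example-run">
    <TITLE>Example run title</TITLE>
    <EXPERIMENT_REF accession="ERX0000001"/>
  </RUN>
</RUN_SET>
"""


def summarize_run_xml(xml_text):
    """Extract a few fields from run XML; return None if no RUN element exists."""
    root = ET.fromstring(xml_text.strip())
    # iter() also yields the root itself, so a bare <RUN> document works too
    run = next(root.iter("RUN"), None)
    if run is None:
        return None
    exp_ref = run.find("EXPERIMENT_REF")
    return {
        "run_accession": run.get("accession"),
        "alias": run.get("alias"),
        "title": run.findtext("TITLE"),
        "experiment_accession": exp_ref.get("accession") if exp_ref is not None else None,
    }


if __name__ == "__main__":
    print(summarize_run_xml(SAMPLE_RUN_XML))
```

The same function could be pointed at the string returned by `fetch_run_xml` in the example script; missing elements come back as None rather than raising, which keeps batch jobs from stopping on a sparse record.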

Query parameters and outputs (practical notes)

Portal API core parameters (commonly used):
- result: record type (e.g., sample, read_run, assembly)
- query: filter expression (e.g., study_accession=PRJEB1234, or tax_tree(562); tax_tree takes an NCBI taxon ID, here 562 for Escherichia coli, and includes descendant taxa)
- fields: comma-separated fields to return (improves performance vs returning everything)
- format: json/tsv/csv
- limit: maximum number of records to return (combine with offset for pagination where applicable)
File retrieval:
- For raw reads, prefer extracting file locations (e.g., fastq_ftp) from Portal results, then download via FTP/Aspera for scale.
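
The parameters and the file-location step above can be sketched as two small helpers. Both names (`build_search_params`, `fastq_urls`) are hypothetical. The sketch assumes that paging uses an `offset` parameter and that `fastq_ftp` values are semicolon-separated paths without a scheme prefix; check both against references/api_reference.md before depending on them. The file paths in the demo are invented.

```python
def build_search_params(result, query, fields, fmt="json", limit=1000, offset=0):
    """Assemble Portal API search parameters as a plain dict."""
    return {
        "result": result,
        "query": query,
        "fields": ",".join(fields),  # the API expects one comma-separated string
        "format": fmt,
        "limit": limit,
        "offset": offset,  # assumption: offset-based paging
    }


def fastq_urls(fastq_ftp, scheme="ftp"):
    """Split a fastq_ftp value into full URLs.

    Assumption: entries are ';'-separated and lack a scheme prefix.
    """
    if not fastq_ftp:
        return []
    urls = []
    for part in fastq_ftp.split(";"):
        part = part.strip()
        if not part:
            continue
        if "://" not in part:
            part = f"{scheme}://{part}"
        urls.append(part)
    return urls


if __name__ == "__main__":
    params = build_search_params(
        "read_run", "study_accession=PRJEB1234", ["run_accession", "fastq_ftp"]
    )
    print(params)
    # Invented paths, for illustration only
    print(fastq_urls("ftp.example.org/a_1.fastq.gz;ftp.example.org/a_2.fastq.gz"))
```

Paired-end runs typically list two files in one field, so treating the value as a list from the start avoids silently downloading only the first mate.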

Rate limiting and robustness

ENA APIs are rate-limited (commonly documented as 50 requests/second). Exceeding limits returns HTTP 429.
Implement:
- exponential backoff on 429,
- request consolidation (fetch multiple fields in one query),
- bulk download mechanisms for large datasets instead of per-accession loops.
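
Backoff reacts after a 429 has already happened; a client-side throttle can reduce how often that occurs. The following is a minimal sketch of a minimum-interval throttle, not something this skill ships. `MinIntervalThrottle` is a hypothetical name, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Space calls so they do not exceed max_per_second (client-side only)."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self):
        """Block until the next request slot is free, then reserve it."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


if __name__ == "__main__":
    throttle = MinIntervalThrottle(max_per_second=20)
    start = time.monotonic()
    for _ in range(5):
        throttle.wait()  # call this immediately before each HTTP request
    print(f"5 calls took {time.monotonic() - start:.2f}s")
```

Choosing a rate comfortably below the documented limit leaves headroom for other processes sharing the same network address. The throttle complements, and does not replace, backoff on 429.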

Recommended pipeline pattern (search → resolve → download)

1. Search with Portal API to obtain accessions and file URLs.
2. Resolve any needed details (optional) via Browser API XML for specific accessions.
3. Download large files via FTP/Aspera or tooling (rather than API streaming).

Cache taxonomy lookups when processing many records to reduce repeated calls.
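
The taxonomy caching advice above can be sketched as a small in-memory cache wrapped around whatever lookup function the pipeline uses, for example `fetch_taxonomy_lineage` from the example script. `TaxonomyCache` and `annotate_runs` are hypothetical names, and the stand-in fetcher in the demo returns invented data so the sketch runs without network access.

```python
class TaxonomyCache:
    """Memoize taxonomy lookups so each distinct tax_id is fetched once."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a tax_id string
        self._cache = {}
        self.misses = 0  # number of real lookups performed

    def get(self, tax_id):
        key = str(tax_id)  # normalize so 562 and "562" share one entry
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def annotate_runs(runs, cache):
    """Attach a taxonomy record to each run dict; None when tax_id is absent."""
    annotated = []
    for run in runs:
        tax_id = run.get("tax_id")
        taxonomy = cache.get(tax_id) if tax_id else None
        annotated.append({**run, "taxonomy": taxonomy})
    return annotated


if __name__ == "__main__":
    # Stand-in fetcher with invented output; swap in a real lookup function.
    cache = TaxonomyCache(lambda tax_id: {"tax_id": tax_id})
    runs = [{"run_accession": "R1", "tax_id": "562"},
            {"run_accession": "R2", "tax_id": "562"}]
    print(annotate_runs(runs, cache), "lookups:", cache.misses)
```

Studies often contain many runs from one organism, so a per-process dictionary is usually enough; a persistent cache would only matter across repeated pipeline invocations.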

Access the European Nucleotide Archive (ENA) via REST APIs and FTP/Aspera to search and retrieve sequences, raw reads (FASTQ), assemblies, and metadata when you have accession IDs or need metadata-driven discovery for genomics pipelines.

37 stars

development

Updated Mar 26, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add aipoch/medical-research-skills ena-database

Security Scan Results

4 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each entry below.

- Trivy: container and dependency vulnerability scanner. Clean (95%)
- Semgrep: static code analysis for vulnerabilities. Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Clean (95%)
- VirusTotal: multi-engine malware detection. Clean (95%)
- Snyk (dep): open source security scanning. Skipped (50%)
- Socket.dev: supply chain security analysis. Skipped (50%)
- CrowdStrike: advanced threat intelligence. Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Skipped (50%)
- OWASP Dep-Check. Skipped (50%)

Last scanned: Apr 5, 2026, 12:31 AM (63.7s, 1 file scanned)

SKILL.md

name: ena-database
description: Access the European Nucleotide Archive (ENA) via REST APIs and FTP/Aspera to search and retrieve sequences, raw reads (FASTQ), assemblies, and metadata when you have accession IDs or need metadata-driven discovery for genomics pipelines.
license: MIT
skill-author: AIPOCH

Related Skills

aipoch/conventional-oncology-hub-gene

tools

Verified, Trusted, Community

Generates complete conventional oncology bulk-transcriptome biomarker and hub-gene research designs from a user-provided cancer type and study direction. Always use this skill whenever a user wants to design, plan, or build a tumor bioinformatics study centered on differential expression, prognostic filtering or risk modeling, PPI-based hub-gene prioritization, diagnostic/prognostic evaluation, clinical association, immune infiltration context, methylation context, and optional tissue or cell validation. Covers five study patterns (signature-first prognostic workflow, hub-gene-first biomarker workflow, hybrid signature-to-hub workflow, immune-context biomarker workflow, translational validation workflow) and always outputs four workload configs (Lite / Standard / Advanced / Publication+) with recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, publication upgrade path...

348 | SKILL.md | Updated Apr 28, 2026

aipoch/conventional-non-oncology-hub-gene

development

Verified, Trusted, Community

Generates complete conventional non-oncology bioinformatics research designs from a user-provided disease context, process-related gene family or biological theme, and validation direction. Use when a study centers on multi-dataset bulk transcriptome integration, DEG analysis, process-gene intersection, enrichment analysis, GSEA, PPI hub-gene prioritization, TF/miRNA regulatory networks, ROC-based biomarker evaluation, and immune infiltration analysis. Covers five study patterns (process-DEG discovery, enrichment/GSEA interpretation, hub-gene prioritization, regulatory-network and immune interpretation, multi-layer public validation) and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

348 | SKILL.md | Updated Apr 28, 2026

aipoch/confounder-and-bias-control-planner

tools

Verified, Trusted, Community

Plans confounder control, variable adjustment logic, and bias mitigation strategies at the protocol stage for clinical, epidemiologic, translational, observational, and biomarker studies. Always use this skill when a user needs to identify major confounders, decide which variables should or should not be adjusted for, compare matching/stratification/weighting approaches, anticipate selection or measurement bias, or pressure-test a study design before execution. Focus on bias sensing, causal structure awareness, variable-role classification, and critical design review rather than generic statistical advice.

348 | SKILL.md | Updated Apr 28, 2026

aipoch/comparative-network-toxicology-shared-mechanism-reference-grounded

testing

Verified, Trusted, Community

Generates complete comparative network-toxicology research designs from a user-provided exposure pair, shared toxic phenotype, and validation direction. Use when a study centers on two related exposures under one outcome and needs target collection, shared-vs-specific target decomposition, enrichment, PPI hub prioritization, docking, optional transcriptomic cross-checks, and conservative mechanistic synthesis. Covers five study patterns and always outputs Lite / Standard / Advanced / Publication+ with a recommended primary plan, stepwise workflow, figure plan, validation hierarchy, minimal executable version, publication upgrade path, and strictly verified literature retrieval.

348 | SKILL.md | Updated Apr 28, 2026

Install manually

# Clone the repo
git clone https://github.com/aipoch/medical-research-skills.git

# Copy into Claude Code skills folder (global)
# The source path contains a space ("Evidence Insights"), so it must be quoted.
mkdir -p ~/.claude/skills
cp -r "medical-research-skills/scientific-skills/Evidence Insights/ena-database" ~/.claude/skills/

Repository

aipoch/medical-research-skills

37 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT