Version Compatibility

Reference examples tested with: BioPython 1.83+, Entrez Direct 21.0+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show biopython then help(Bio.Entrez.elink) to check signatures
CLI: elink -version then elink -help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Entrez Link

"Find records linked to this record in another NCBI database" -> ELink walks the curated, weekly-maintained link tables between Entrez databases. A link is an asserted relationship (e.g. "this PubMed article describes this nucleotide sequence"), not a similarity hit.

ELink is the navigation layer of Entrez. The decision that matters most is which linkname to use — not which databases. A single (dbfrom, db) pair can have a dozen linkname variants distinguishing curation level, evidence type, and direction. Picking the wrong one is the difference between 5 high-confidence matches and 500 noisy automated assertions.

Python: Entrez.elink(dbfrom=..., db=..., id=..., linkname=...) (BioPython)
CLI: elink -db pubmed -target gene -name pubmed_gene_rif (Entrez Direct)
R: entrez_link(dbfrom=..., db=..., id=...) (rentrez)

Required Setup

from Bio import Entrez
Entrez.email = '[email protected]'
Entrez.api_key = 'optional_api_key'  # raises rate to 10 req/sec

The `linkname` decision (most important)

For most (dbfrom, db) pairs NCBI exposes multiple link tables. The qualifiers in the name encode the curation level and the evidence source. Choose deliberately.

gene -> protein (representative example)

| linkname | Returns | When to use | |---|---|---| | gene_protein | All linked proteins (curated + automated) | Exploration; expect 10-1000x more hits | | gene_protein_refseq | RefSeq proteins only | Reference-quality analyses; orthology | | gene_protein_swissprot | Reviewed UniProt entries with NCBI cross-ref | Functional annotation; literature support |

pubmed -> gene

| linkname | Returns | |---|---| | pubmed_gene | Genes mentioned in this paper (text-mined + curated) | | pubmed_gene_rif | Genes with a Reference Into Function (curated, high-quality) | | pubmed_gene_pubmed | Other PubMed records sharing gene linkage (rare use) |

nucleotide -> protein

| linkname | Returns | |---|---| | nuccore_protein | All proteins encoded by this nucleotide record (CDS-linked) | | nuccore_protein_refseq | RefSeq proteins only |

Discover what link names exist for a pair

h = Entrez.elink(dbfrom='gene', db='protein', id='672', cmd='acheck')
record = Entrez.read(h); h.close()
for ls in record[0]['IdCheckList']['IdLinkSet'][0]['LinkInfo']:
    print(f'{ls["Name"]}  -> {ls["DbTo"]} | {ls["MenuTag"]} ({ls["HtmlTag"]})')

cmd='acheck' is the only authoritative way to enumerate available linknames — they change with each NCBI release.

Decision table: which `cmd` for which goal

| Goal | cmd | Returns | |---|---|---| | Get linked records | neighbor (default) | Linked IDs in target db | | Get linked + relevance scores | neighbor_score | IDs with similarity scores (mostly pubmed_pubmed) | | Get >200 source IDs in one go | neighbor_history | WebEnv + QueryKey for downstream EFetch | | Enumerate available links | acheck | List of all linknames for source IDs | | Check if any link exists | ncheck | Boolean per source ID | | Check specific link exists | lcheck | Boolean per source ID + linkname | | Get NCBI HTML link URLs | llinks | URLs to Entrez record pages | | Get external provider links | prlinks | URLs to journal sites, etc. |

The neighbor_history cmd is essential when source id count exceeds ~200 — past that, the URL-length limit makes the comma-joined form fail. With neighbor_history ELink puts results on the history server and returns WebEnv/QueryKey for downstream pickup.

Asymmetric link warning

ELink relationships are not guaranteed symmetric. pubmed_gene and gene_pubmed may return different sets because:

Direction-dependent curation: gene-to-PubMed is curated by NCBI staff (GeneRIF); PubMed-to-gene includes text-mining.
Cutoffs: some link tables truncate at N best links in one direction but not the other.
Index lag asymmetry: when one db updates faster than the other.

If round-trip consistency matters (e.g. "every gene mentioned in this paper, then every paper mentioning each gene"), expect the round-trip set to be larger than the input — and never assume A -> B -> A returns the original ID alone.

Per-database link catalog (curated subset)

gene

| Target | Common linknames | Notes | |---|---|---| | protein | gene_protein, gene_protein_refseq, gene_protein_swissprot | RefSeq is the safe default | | nuccore | gene_nuccore, gene_nuccore_refseqrna, gene_nuccore_refseqgene | refseqrna for mRNA, refseqgene for the curated gene region | | pubmed | gene_pubmed, gene_pubmed_rif | RIF is curated and high-quality | | homologene | gene_homologene | Deprecated 2014 but data still queryable | | snp | gene_snp | dbSNP entries in gene region | | clinvar | gene_clinvar | Clinical variants | | omim | gene_omim | Disease associations |

nuccore / nucleotide

| Target | Common linknames | |---|---| | protein | nuccore_protein, nuccore_protein_refseq | | gene | nuccore_gene | | taxonomy | nuccore_taxonomy | | biosample | nuccore_biosample | | sra | nuccore_sra | | pubmed | nuccore_pubmed, nuccore_pubmed_refseq |

protein

| Target | Common linknames | |---|---| | nuccore | protein_nuccore, protein_nuccore_cds, protein_nuccore_mrna | | gene | protein_gene | | structure | protein_structure | | cdd | protein_cdd (conserved domains) | | pubmed | protein_pubmed |

pubmed

| Target | Common linknames | |---|---| | pubmed | pubmed_pubmed, pubmed_pubmed_citedin, pubmed_pubmed_refs | | gene | pubmed_gene, pubmed_gene_rif | | protein | pubmed_protein | | nuccore | pubmed_nuccore | | gds | pubmed_gds (GEO datasets cited in paper) | | sra | pubmed_sra |

bioproject

| Target | Common linknames | |---|---| | biosample | bioproject_biosample | | sra | bioproject_sra | | pubmed | bioproject_pubmed |

Code patterns

Single source -> single target

Goal: Get RefSeq proteins for a single gene.

Approach: ELink with explicit linkname to restrict to curated set.

Reference (BioPython 1.83+):

def gene_to_refseq_proteins(gene_id):
    h = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in r[0]['LinkSetDb'][0]['Link']]

print(gene_to_refseq_proteins('672'))  # BRCA1

Batch source -> target (small batch)

Goal: Get linked proteins for a list of <200 gene IDs in one call.

Approach: Comma-join IDs; one linkset per input in the response.

Reference (BioPython 1.83+):

def batch_gene_protein(gene_ids):
    h = Entrez.elink(dbfrom='gene', db='protein', id=','.join(gene_ids), linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    out = {}
    for linkset in r:
        src = linkset['IdList'][0]
        out[src] = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']] if linkset['LinkSetDb'] else []
    return out

Large batch via history server

Goal: Link 5,000 gene IDs to proteins without hitting URL-length limits.

Approach: EPost the IDs first (chunked at 200), then ELink with cmd='neighbor_history' referencing the WebEnv. Downstream EFetch picks up linked IDs from the history server.

Reference (BioPython 1.83+):

def post_then_link(gene_ids, target='protein', linkname='gene_protein_refseq'):
    # EPost in chunks of 200
    webenv = None
    for i in range(0, len(gene_ids), 200):
        chunk = gene_ids[i:i+200]
        kwargs = {'db': 'gene', 'id': ','.join(chunk)}
        if webenv:
            kwargs['WebEnv'] = webenv
        h = Entrez.epost(**kwargs)
        r = Entrez.read(h); h.close()
        webenv = r['WebEnv']
        query_key = r['QueryKey']
        time.sleep(0.1 if Entrez.api_key else 0.34)

    # Link with neighbor_history
    h = Entrez.elink(dbfrom='gene', db=target, linkname=linkname,
                     cmd='neighbor_history', WebEnv=webenv, query_key=query_key)
    r = Entrez.read(h); h.close()
    # WebEnv is at the top level of the response; QueryKey is per-LinkSetDbHistory entry.
    return r[0]['WebEnv'], r[0]['LinkSetDbHistory'][0]['QueryKey']

we, qk = post_then_link(['672', '675', '7157'] * 1000)
# Downstream: Entrez.efetch(db='protein', WebEnv=we, query_key=qk, retstart=..., retmax=500)

Discover all available links

Goal: Before writing a pipeline, enumerate what link tables NCBI exposes for a (dbfrom, source-id) pair.

Approach: cmd='acheck' returns the full LinkInfo list per source.

Reference (BioPython 1.83+):

def list_link_names(dbfrom, id):
    h = Entrez.elink(dbfrom=dbfrom, id=id, cmd='acheck')
    r = Entrez.read(h); h.close()
    info = r[0]['IdCheckList']['IdLinkSet'][0]['LinkInfo']
    return [(i['Name'], i['DbTo'], i['MenuTag']) for i in info]

for name, target, label in list_link_names('gene', '672'):
    print(f'{name:<40} -> {target:<15} ({label})')

Chain links (gene -> protein -> structure)

def gene_to_structures(gene_id):
    h = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    prot_ids = [l['Id'] for l in r[0]['LinkSetDb'][0]['Link'][:10]]
    time.sleep(0.1 if Entrez.api_key else 0.34)
    h = Entrez.elink(dbfrom='protein', db='structure', id=','.join(prot_ids))
    r = Entrez.read(h); h.close()
    out = []
    for ls in r:
        if ls['LinkSetDb']:
            out.extend(l['Id'] for l in ls['LinkSetDb'][0]['Link'])
    return out

Get neighbor_score for related PubMed articles

def related_pubmed(pmid, top=10):
    h = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid,
                     linkname='pubmed_pubmed', cmd='neighbor_score')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    return [(l['Id'], int(l['Score'])) for l in r[0]['LinkSetDb'][0]['Link'][:top]]

BioProject -> SRA runs

For SRA discovery, pysradb.SRAweb().sra_metadata(prjna, detailed=True) (see sra-data) is the higher-fidelity path — returns SRR accessions directly with run-level metadata in one call. Use ELink only when staying inside Bio.Entrez:

def bioproject_to_sra(prjna):
    # Convert PRJNA to UID first
    h = Entrez.esearch(db='bioproject', term=f'{prjna}[BioProject]')
    r = Entrez.read(h); h.close()
    if not r['IdList']:
        return []
    bp_uid = r['IdList'][0]
    time.sleep(0.1 if Entrez.api_key else 0.34)
    # Link to SRA
    h = Entrez.elink(dbfrom='bioproject', db='sra', id=bp_uid)
    r = Entrez.read(h); h.close()
    return [l['Id'] for l in r[0]['LinkSetDb'][0]['Link']] if r[0]['LinkSetDb'] else []

Failure modes

Wrong linkname gives wrong order of magnitude

Trigger: Using gene_protein when gene_protein_refseq was intended.
Mechanism: gene_protein includes all automated and predicted entries (XP_* RefSeq plus all GenBank submissions).
Symptom: 500 proteins returned per gene instead of the expected 1-5 canonical isoforms.
Fix: Pick the curated linkname; verify counts on a known gene.

Empty LinkSetDb on valid input

Trigger: Gene with no linked records in the requested target.
Mechanism: record[0]['LinkSetDb'] is an empty list, not raising an error.
Symptom: KeyError if code assumes record[0]['LinkSetDb'][0] always exists.
Fix: Always guard if not record[0]['LinkSetDb']: return [].

Asymmetric round-trip

Trigger: Pipeline does genes_for_paper(pmid) -> papers_for_each_gene -> set of PMIDs.
Mechanism: pubmed_gene (text-mined + curated) is larger than gene_pubmed (curated only); the round-trip set is not closed.
Symptom: Original PMID may not appear in the round-trip set; new PMIDs do.
Fix: Document the directional asymmetry; use the more-curated linkname (*_rif variants) when fidelity matters.

URL length limit on large batches

Trigger: Comma-joined id= with 200+ IDs.
Mechanism: HTTP GET URL exceeds NCBI's parsing limit (~2000 chars).
Symptom: HTTP 414 URI Too Long, or silent truncation.
Fix: EPost the IDs first, then ELink with cmd='neighbor_history'.

One linkset per input ID, indexing confusion

Trigger: Sending 5 IDs, then accessing record[0]['LinkSetDb'][0]['Link'] expecting the union.
Mechanism: ELink returns one LinkSet per input UID, indexed by position.
Symptom: Only the first input's links are processed; rest are dropped.
Fix: Iterate for linkset in record: and map by linkset['IdList'][0].

Mismatched dbfrom and id namespace

Trigger: Passing a PMID into dbfrom='nucleotide'.
Mechanism: ELink returns no error — it just looks up the PMID as a nucleotide UID, finds nothing.
Symptom: Empty LinkSetDb on a "valid" ID.
Fix: Validate that the ID matches the source db namespace (PMIDs are db=pubmed, GeneIDs are db=gene).

Common errors

| Error / symptom | Cause | Solution | |---|---|---| | KeyError: 'LinkSetDb' | Empty result not guarded | if not record[0]['LinkSetDb']: return [] | | HTTPError 414 | Comma-joined id too long | Use EPost + neighbor_history | | HTTPError 400 | Invalid linkname or wrong db namespace | Use cmd='acheck' to enumerate valid links | | 500 hits instead of 5 | Wrong linkname (e.g. gene_protein vs _refseq) | Pick curated variant | | Round-trip set differs from input | Asymmetric link tables | Document; use curated variants |

References

Sayers EW et al. (2024) Database resources of the National Center for Biotechnology Information in 2024. Nucleic Acids Res 52:D33-D43.
Kans J. (2024) Entrez Direct: E-utilities on the Unix Command Line. NCBI Bookshelf NBK179288.
NCBI. ELink help. NBK25499.

Related Skills

entrez-search - Resolve UIDs before linking
entrez-fetch - Retrieve linked records' content
batch-downloads - History-server retrieval after ELink with neighbor_history
geo-data - Specialized gds <-> pubmed/bioproject links (gds->sra ELink unreliable; use pysradb)
ncbi-datasets-cli - Modern alternative for gene/genome cross-reference queries

Version Compatibility

Reference examples tested with: BioPython 1.83+, Entrez Direct 21.0+

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show biopython then help(Bio.Entrez.elink) to check signatures
CLI: elink -version then elink -help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

Entrez Link

Python: Entrez.elink(dbfrom=..., db=..., id=..., linkname=...) (BioPython)
CLI: elink -db pubmed -target gene -name pubmed_gene_rif (Entrez Direct)
R: entrez_link(dbfrom=..., db=..., id=...) (rentrez)

Required Setup

from Bio import Entrez
Entrez.email = '[email protected]'
Entrez.api_key = 'optional_api_key'  # raises rate to 10 req/sec

The `linkname` decision (most important)

For most (dbfrom, db) pairs NCBI exposes multiple link tables. The qualifiers in the name encode the curation level and the evidence source. Choose deliberately.

gene -> protein (representative example)

pubmed -> gene

nucleotide -> protein

| linkname | Returns | |---|---| | nuccore_protein | All proteins encoded by this nucleotide record (CDS-linked) | | nuccore_protein_refseq | RefSeq proteins only |

Discover what link names exist for a pair

h = Entrez.elink(dbfrom='gene', db='protein', id='672', cmd='acheck')
record = Entrez.read(h); h.close()
for ls in record[0]['IdCheckList']['IdLinkSet'][0]['LinkInfo']:
    print(f'{ls["Name"]}  -> {ls["DbTo"]} | {ls["MenuTag"]} ({ls["HtmlTag"]})')

cmd='acheck' is the only authoritative way to enumerate available linknames — they change with each NCBI release.

Decision table: which `cmd` for which goal

Asymmetric link warning

ELink relationships are not guaranteed symmetric. pubmed_gene and gene_pubmed may return different sets because:

Direction-dependent curation: gene-to-PubMed is curated by NCBI staff (GeneRIF); PubMed-to-gene includes text-mining.
Cutoffs: some link tables truncate at N best links in one direction but not the other.
Index lag asymmetry: when one db updates faster than the other.

Per-database link catalog (curated subset)

gene

nuccore / nucleotide

protein

pubmed

bioproject

| Target | Common linknames | |---|---| | biosample | bioproject_biosample | | sra | bioproject_sra | | pubmed | bioproject_pubmed |

Code patterns

Single source -> single target

Goal: Get RefSeq proteins for a single gene.

Approach: ELink with explicit linkname to restrict to curated set.

Reference (BioPython 1.83+):

def gene_to_refseq_proteins(gene_id):
    h = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    return [link['Id'] for link in r[0]['LinkSetDb'][0]['Link']]

print(gene_to_refseq_proteins('672'))  # BRCA1

Batch source -> target (small batch)

Goal: Get linked proteins for a list of <200 gene IDs in one call.

Approach: Comma-join IDs; one linkset per input in the response.

Reference (BioPython 1.83+):

def batch_gene_protein(gene_ids):
    h = Entrez.elink(dbfrom='gene', db='protein', id=','.join(gene_ids), linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    out = {}
    for linkset in r:
        src = linkset['IdList'][0]
        out[src] = [link['Id'] for link in linkset['LinkSetDb'][0]['Link']] if linkset['LinkSetDb'] else []
    return out

Large batch via history server

Goal: Link 5,000 gene IDs to proteins without hitting URL-length limits.

Approach: EPost the IDs first (chunked at 200), then ELink with cmd='neighbor_history' referencing the WebEnv. Downstream EFetch picks up linked IDs from the history server.

Reference (BioPython 1.83+):

def post_then_link(gene_ids, target='protein', linkname='gene_protein_refseq'):
    # EPost in chunks of 200
    webenv = None
    for i in range(0, len(gene_ids), 200):
        chunk = gene_ids[i:i+200]
        kwargs = {'db': 'gene', 'id': ','.join(chunk)}
        if webenv:
            kwargs['WebEnv'] = webenv
        h = Entrez.epost(**kwargs)
        r = Entrez.read(h); h.close()
        webenv = r['WebEnv']
        query_key = r['QueryKey']
        time.sleep(0.1 if Entrez.api_key else 0.34)

    # Link with neighbor_history
    h = Entrez.elink(dbfrom='gene', db=target, linkname=linkname,
                     cmd='neighbor_history', WebEnv=webenv, query_key=query_key)
    r = Entrez.read(h); h.close()
    # WebEnv is at the top level of the response; QueryKey is per-LinkSetDbHistory entry.
    return r[0]['WebEnv'], r[0]['LinkSetDbHistory'][0]['QueryKey']

we, qk = post_then_link(['672', '675', '7157'] * 1000)
# Downstream: Entrez.efetch(db='protein', WebEnv=we, query_key=qk, retstart=..., retmax=500)

Discover all available links

Goal: Before writing a pipeline, enumerate what link tables NCBI exposes for a (dbfrom, source-id) pair.

Approach: cmd='acheck' returns the full LinkInfo list per source.

Reference (BioPython 1.83+):

def list_link_names(dbfrom, id):
    h = Entrez.elink(dbfrom=dbfrom, id=id, cmd='acheck')
    r = Entrez.read(h); h.close()
    info = r[0]['IdCheckList']['IdLinkSet'][0]['LinkInfo']
    return [(i['Name'], i['DbTo'], i['MenuTag']) for i in info]

for name, target, label in list_link_names('gene', '672'):
    print(f'{name:<40} -> {target:<15} ({label})')

Chain links (gene -> protein -> structure)

def gene_to_structures(gene_id):
    h = Entrez.elink(dbfrom='gene', db='protein', id=gene_id, linkname='gene_protein_refseq')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    prot_ids = [l['Id'] for l in r[0]['LinkSetDb'][0]['Link'][:10]]
    time.sleep(0.1 if Entrez.api_key else 0.34)
    h = Entrez.elink(dbfrom='protein', db='structure', id=','.join(prot_ids))
    r = Entrez.read(h); h.close()
    out = []
    for ls in r:
        if ls['LinkSetDb']:
            out.extend(l['Id'] for l in ls['LinkSetDb'][0]['Link'])
    return out

Get neighbor_score for related PubMed articles

def related_pubmed(pmid, top=10):
    h = Entrez.elink(dbfrom='pubmed', db='pubmed', id=pmid,
                     linkname='pubmed_pubmed', cmd='neighbor_score')
    r = Entrez.read(h); h.close()
    if not r[0]['LinkSetDb']:
        return []
    return [(l['Id'], int(l['Score'])) for l in r[0]['LinkSetDb'][0]['Link'][:top]]

BioProject -> SRA runs

def bioproject_to_sra(prjna):
    # Convert PRJNA to UID first
    h = Entrez.esearch(db='bioproject', term=f'{prjna}[BioProject]')
    r = Entrez.read(h); h.close()
    if not r['IdList']:
        return []
    bp_uid = r['IdList'][0]
    time.sleep(0.1 if Entrez.api_key else 0.34)
    # Link to SRA
    h = Entrez.elink(dbfrom='bioproject', db='sra', id=bp_uid)
    r = Entrez.read(h); h.close()
    return [l['Id'] for l in r[0]['LinkSetDb'][0]['Link']] if r[0]['LinkSetDb'] else []

Failure modes

Wrong linkname gives wrong order of magnitude

Trigger: Using gene_protein when gene_protein_refseq was intended.
Mechanism: gene_protein includes all automated and predicted entries (XP_* RefSeq plus all GenBank submissions).
Symptom: 500 proteins returned per gene instead of the expected 1-5 canonical isoforms.
Fix: Pick the curated linkname; verify counts on a known gene.

Empty LinkSetDb on valid input

Trigger: Gene with no linked records in the requested target.
Mechanism: record[0]['LinkSetDb'] is an empty list, not raising an error.
Symptom: KeyError if code assumes record[0]['LinkSetDb'][0] always exists.
Fix: Always guard if not record[0]['LinkSetDb']: return [].

Asymmetric round-trip

Trigger: Pipeline does genes_for_paper(pmid) -> papers_for_each_gene -> set of PMIDs.
Mechanism: pubmed_gene (text-mined + curated) is larger than gene_pubmed (curated only); the round-trip set is not closed.
Symptom: Original PMID may not appear in the round-trip set; new PMIDs do.
Fix: Document the directional asymmetry; use the more-curated linkname (*_rif variants) when fidelity matters.

URL length limit on large batches

Trigger: Comma-joined id= with 200+ IDs.
Mechanism: HTTP GET URL exceeds NCBI's parsing limit (~2000 chars).
Symptom: HTTP 414 URI Too Long, or silent truncation.
Fix: EPost the IDs first, then ELink with cmd='neighbor_history'.

One linkset per input ID, indexing confusion

Trigger: Sending 5 IDs, then accessing record[0]['LinkSetDb'][0]['Link'] expecting the union.
Mechanism: ELink returns one LinkSet per input UID, indexed by position.
Symptom: Only the first input's links are processed; rest are dropped.
Fix: Iterate for linkset in record: and map by linkset['IdList'][0].

Mismatched dbfrom and id namespace

Trigger: Passing a PMID into dbfrom='nucleotide'.
Mechanism: ELink returns no error — it just looks up the PMID as a nucleotide UID, finds nothing.
Symptom: Empty LinkSetDb on a "valid" ID.
Fix: Validate that the ID matches the source db namespace (PMIDs are db=pubmed, GeneIDs are db=gene).

Common errors

References

Sayers EW et al. (2024) Database resources of the National Center for Biotechnology Information in 2024. Nucleic Acids Res 52:D33-D43.
Kans J. (2024) Entrez Direct: E-utilities on the Unix Command Line. NCBI Bookshelf NBK179288.
NCBI. ELink help. NBK25499.

Related Skills

entrez-search - Resolve UIDs before linking
entrez-fetch - Retrieve linked records' content
batch-downloads - History-server retrieval after ELink with neighbor_history
geo-data - Specialized gds <-> pubmed/bioproject links (gds->sra ELink unreliable; use pysradb)
ncbi-datasets-cli - Modern alternative for gene/genome cross-reference queries

Adoption

GPTomics/bio-entrez-link

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Entrez Link

Required Setup

The linkname decision (most important)

gene -> protein (representative example)

pubmed -> gene

nucleotide -> protein

Discover what link names exist for a pair

Decision table: which cmd for which goal

Asymmetric link warning

Per-database link catalog (curated subset)

gene

nuccore / nucleotide

protein

pubmed

bioproject

Code patterns

Single source -> single target

Batch source -> target (small batch)

Large batch via history server

Discover all available links

Chain links (gene -> protein -> structure)

Get neighbor_score for related PubMed articles

BioProject -> SRA runs

Failure modes

Wrong linkname gives wrong order of magnitude

Empty LinkSetDb on valid input

Asymmetric round-trip

URL length limit on large batches

One linkset per input ID, indexing confusion

Mismatched dbfrom and id namespace

Common errors

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-entrez-link

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

Entrez Link

Required Setup

The linkname decision (most important)

gene -> protein (representative example)

pubmed -> gene

nucleotide -> protein

Discover what link names exist for a pair

Decision table: which cmd for which goal

Asymmetric link warning

Per-database link catalog (curated subset)

gene

nuccore / nucleotide

protein

pubmed

bioproject

Code patterns

Single source -> single target

Batch source -> target (small batch)

Large batch via history server

Discover all available links

Chain links (gene -> protein -> structure)

Get neighbor_score for related PubMed articles

BioProject -> SRA runs

Failure modes

Wrong linkname gives wrong order of magnitude

Empty LinkSetDb on valid input

Asymmetric round-trip

URL length limit on large batches

One linkset per input ID, indexing confusion

Mismatched dbfrom and id namespace

Common errors

The `linkname` decision (most important)

Decision table: which `cmd` for which goal

The `linkname` decision (most important)

Decision table: which `cmd` for which goal