Version Compatibility

Reference examples tested with: requests 2.31+, cyvcf2 0.30+, pandas 2.2+, bcftools 1.19+, Entrez Direct 21.0+, lxml 5.0+ (for v2 XML schema).

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying. ClinVar XML schema v2 (rolled out in 2024) replaces <ClinVarSet> with <VariationArchive> as the top-level anchor; XSLT or parsers targeting the legacy element silently emit zero records.

ClinVar Lookup and Clinical-Significance Triangulation

'Look up the clinical significance of this variant' -> Retrieve ClinVar VCV-level aggregate, SCV-level submissions, ClinGen Variant Curation Expert Panel (VCEP) overrides, and conflict-resolution status.

Python (REST): requests.get() against the E-utilities clinvar database
Python (local VCF): cyvcf2.VCF('clinvar.vcf.gz') for batch queries against the weekly snapshot
CLI: bcftools annotate -a clinvar.vcf.gz -c INFO/CLNSIG,INFO/CLNREVSTAT,INFO/CLNDN
Cross-database: ClinGen Allele Registry CA ID via https://reg.clinicalgenome.org/

The Identifier Hierarchy (VCV / SCV / RCV): Get This Wrong and Everything Downstream Breaks

| Level | Format | What it aggregates | When to use | Fails when | |-------|--------|--------------------|-------------|-----------| | SCV | SCVxxxxxxxxx.N | One submitter, one variant, one condition (atomic submission unit) | Auditing who said what; conflict triangulation | Aggregated reporting (use VCV); cross-condition analysis | | RCV | RCVxxxxxxxxx.N | All SCVs for a single (variant, condition) pair | Condition-stratified analysis; legacy aggregation | Variant-level reporting across all conditions (use VCV) | | VCV | VCVxxxxxxxxx.N | All RCVs for one variant across all conditions | Canonical anchor since 2017; default API entrypoint | Condition-specific clinical action (use RCV); CLNSIG collapses multi-condition |

Operational footgun: the clinvar.vcf.gz CLNSIG field is the variant-level (VCV) aggregate. A variant Pathogenic for disease A but VUS for disease B collapses to "Pathogenic/Conflicting". For condition-stratified analysis, parse RCV-level XML, never CLNSIG alone.

2024 XML schema overhaul: ClinVar v2 XML separates GermlineClassification, SomaticClinicalImpact, and OncogenicityClassification under one <VariationArchive> anchor. The legacy <ClinicalSignificance> element is gone. Pipelines built before September 2024 against <ClinVarSet> silently emit zero records on new XML. The dual-release period ended December 2024.

Star Ratings and the Override Hierarchy

| Stars | Review status | What it means operationally | |-------|--------------|---------------------------| | 4 | Practice guideline | ACMG/CAP CFTR-level (vanishingly rare) | | 3 | Expert panel reviewed (ClinGen VCEP) | FDA-recognized tier; overrides lower-star records for clinical action | | 2 | Multiple submitters, criteria provided, no conflicts | Reliable aggregate | | 1 | Single submitter OR conflicting interpretations (often mis-reported as 2-star) | Use with scrutiny | | 0 | No assertion criteria provided | Literature-only or legacy submissions |

ClinVar does NOT retract or hide lower-star records when a VCEP publishes; a variant can simultaneously display "Pathogenic (3-star VCEP)" and "Conflicting interpretations (1-star)". Tools handle this differently (VarSeq, Franklin, GenoOx each pick a winner via different rules); this is a major source of inter-tool disagreement.

ClinGen Variant Curation Expert Panels (VCEPs)

As of 2025, ~80-90 VCEPs are approved or in progress across RASopathies, hereditary cancer (ENIGMA BRCA1/2, InSiGHT MMR), cardiomyopathy (sarcomere genes), hearing loss, RPE65/IRD, inborn errors of metabolism, and FH. The current count is moving; the authoritative directory is the Criteria Specification Registry at https://cspec.genome.network/cspec/ui/svi/all.

Each VCEP publishes a gene-disease-specific CSpec that re-weights ACMG/AMP criteria. The Hearing Loss VCEP downgrades PM2 to supporting by default and upgrades PS3 thresholds for OTOF. Treating "ACMG/AMP" as a single rubric across all genes is the most common error in non-specialist tooling.

ACMG/AMP, ClinGen SVI Specifications, and the Bayesian Point System

The Richards 2015 28-criterion framework is the foundation, but every modern automated classifier (InterVar, GeneBe, Franklin, VarSome) implements the Tavtigian 2018/2020 Bayesian point system, not the original combining rules. Strengths map to points: Supporting=1, Moderate=2, Strong=4, Very Strong=8 (benign codes negative). Final categories: P >=10, LP 6-9, VUS 0-5, LB -1 to -6, B <=-7.

For variant interpretation framework details, calibrated in-silico thresholds, and PVS1 decision-tree logic, defer to clinical-databases/acmg-classification. This skill focuses on querying ClinVar; it intentionally does not re-implement classification.

Conflicting Interpretations and Conflict Resolution

Harrison 2017 Genet Med 19:1096 (PMID 28301460) showed 87% of inter-lab conflicts were resolvable by reassessment plus data sharing. As of 2024, only 3.8% of conflicting BRCA1 missense VUS reached consensus despite years of effort; conflict resolution is slow even in best-curated genes.

Submission staleness is non-trivial: ClinVar does not push reclassifications to submitters; a 2017 SCV can persist on an active label in 2026 if the lab has not re-submitted. Genome Alert! (Yauy 2022 Genet Med) was built specifically to detect classification drift between weekly releases. The median delta is ~1,247 classification changes per month with potential clinical impact.

Decision Tree by Query Scenario

| Scenario | Recommended path | Why | |----------|------------------|-----| | Single variant, known gene/condition | E-utilities esummary against clinvar DB | Lowest latency, returns VCV-level summary | | Batch (10-1000 variants) by HGVS or rsID | myvariant.info with fields=clinvar | Aggregated, includes ClinVar review status | | Batch (>1000) or coordinate-based | Local clinvar.vcf.gz with bcftools annotate or cyvcf2 | No rate limits; weekly snapshot | | Condition-stratified (variant in disease A vs B) | Bulk XML VariationArchive parsing | RCV is the only level that preserves per-condition classification | | Cross-database join with gnomAD / dbSNP / COSMIC | ClinGen Allele Registry CA ID | Build-agnostic, transcript-agnostic canonical identifier | | Reproducible analysis with citable date | First-Thursday-of-month archive on FTP | Only monthly snapshots are archived; weekly releases disappear |

REST API Query (E-utilities)

Goal: Retrieve VCV-level ClinVar summary for a single variant by ID, gene, or HGVS.

Approach: Hit esummary.fcgi or esearch.fcgi against db=clinvar, parse JSON, then optionally hydrate to full record with efetch.

import requests

EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

def clinvar_summary(variation_id):
    '''Retrieve VCV-level summary by ClinVar VariationID (do not confuse with CA ID).

    The germline / somatic / oncogenicity classification nesting shown below
    follows the ClinVar 2024 eSummary v2 schema described in the data-access
    documentation. Field names have changed between API versions -- inspect
    the actual JSON returned by eSummary for the live ClinVar version before
    pinning these key paths in production.
    '''
    r = requests.get(f'{EUTILS}/esummary.fcgi',
                     params={'db': 'clinvar', 'id': variation_id, 'retmode': 'json'},
                     timeout=30)
    r.raise_for_status()
    record = r.json()['result'][str(variation_id)]
    return {
        'vcv': record.get('accession'),
        'name': record.get('title'),
        'germline_class': record.get('germline_classification', {}).get('description'),
        'germline_review_status': record.get('germline_classification', {}).get('review_status'),
        'somatic_clinical': record.get('clinical_impact_classification', {}).get('description'),
        'oncogenicity': record.get('oncogenicity_classification', {}).get('description'),
        'last_evaluated': record.get('germline_classification', {}).get('last_evaluated')
    }

def clinvar_search_gene(gene, pathogenic_only=False, retmax=500):
    term = f'{gene}[gene]'
    if pathogenic_only:
        term += ' AND (clinsig_pathogenic[Properties] OR clinsig_likely_pathogenic[Properties])'
    r = requests.get(f'{EUTILS}/esearch.fcgi',
                     params={'db': 'clinvar', 'term': term, 'retmax': retmax, 'retmode': 'json'},
                     timeout=30)
    return r.json()['esearchresult']['idlist']

Local VCF Query (Weekly Snapshot)

Goal: Annotate or look up thousands of variants without rate limits.

Approach: Download the weekly clinvar.vcf.gz (note: only first-Thursday-of-month is archived; for longitudinal stability pin to monthly archives), query by genomic coordinates with cyvcf2 or annotate VCFs with bcftools.

mkdir -p clinvar/$(date +%Y%m); cd clinvar/$(date +%Y%m)
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

from cyvcf2 import VCF

clinvar = VCF('clinvar.vcf.gz')

def lookup(chrom, pos, ref, alt):
    '''Look up by GRCh38 coords. Returns variant-level (VCV) aggregate; not RCV.'''
    for v in clinvar(f'{chrom}:{pos}-{pos}'):
        if v.REF == ref and alt in v.ALT:
            info = v.INFO
            return {
                'vcv_id': info.get('ALLELEID'),
                'clnsig': info.get('CLNSIG'),
                'clnsig_conf': info.get('CLNSIGCONF'),
                'clnrevstat': info.get('CLNREVSTAT'),
                'clndn': info.get('CLNDN'),
                'clnvc': info.get('CLNVC'),
                'clnhgvs': info.get('CLNHGVS'),
                'clndisdb': info.get('CLNDISDB'),
                'oncdn': info.get('ONCDN'),
                'scidn': info.get('SCIDN')
            }
    return None

bcftools annotate \
    -a clinvar.vcf.gz \
    -c INFO/CLNSIG,INFO/CLNREVSTAT,INFO/CLNDN,INFO/CLNVC,INFO/CLNHGVS,INFO/CLNSIGCONF \
    input.vcf.gz -O z -o annotated.vcf.gz
bcftools index -t annotated.vcf.gz

ClinGen Allele Registry (CA IDs): The Real Cross-Database Anchor

ClinGen Allele Registry (https://reg.clinicalgenome.org/) computes a build-agnostic, transcript-agnostic CA ID (format CA######) for any allele projectable onto NCBI references (GRCh37, GRCh38, T2T-CHM13, any RefSeq transcript). The Registry covers ~700M+ alleles, vastly more than ClinVar. CA ID and ClinVar VariationID are one-to-one when a variant exists in ClinVar.

def car_id(hgvs_g):
    '''Resolve HGVS-g to canonical ClinGen Allele Registry CA ID.'''
    r = requests.put(f'https://reg.clinicalgenome.org/allele',
                     headers={'Content-Type': 'text/plain'},
                     data=hgvs_g, timeout=30)
    return r.json().get('@id', '').split('/')[-1] if r.ok else None

Use CA ID for any join touching non-ClinVar resources (gnomAD, dbSNP, COSMIC, MAVEdb). VariationID was renumbered during the 2017 ClinVar schema redesign; treating it as a stable cross-build identifier is unsafe.

Per-Operation Failure Modes

1. Treating CLNSIG as gospel for condition-specific work

Trigger: Pull CLNSIG=Pathogenic from clinvar.vcf.gz and report variant as pathogenic for the patient's specific phenotype.
Mechanism: CLNSIG is VCV-level aggregate; a variant can be P for disease A and B for disease B.
Symptom: Patient phenotype does not match the disease where the variant is actually pathogenic; clinical action is wrong.
Fix: Parse RCV-level XML (<RCVAccession> per condition); cross-check CLNDN and report per-condition classifications.

2. Parsing legacy XML against 2024 schema

Trigger: XSLT or parser anchored on <ClinVarSet> or <ClinicalSignificance>.
Mechanism: 2024 schema replaces both anchors with <VariationArchive> + germline/somatic/oncogenicity tripartite classifications.
Symptom: Silent zero-record output, no error.
Fix: Re-target to <VariationArchive> and read GermlineClassification, SomaticClinicalImpact, OncogenicityClassification separately.

3. Counting variant_summary.txt rows naively

Trigger: wc -l variant_summary.txt to estimate variant count.
Mechanism: One row per assembly per variant (GRCh37 and GRCh38); double-counts.
Symptom: Counts inflated ~2x.
Fix: awk -F'\t' '$17=="GRCh38"' variant_summary.txt | wc -l.

4. Trusting VariationID as a stable cross-build identifier

Trigger: Join gnomAD-v4 records by ClinVar VariationID.
Mechanism: VariationIDs were renumbered for a subset of variants during the 2017 schema overhaul.
Symptom: Spurious mismatches at low rate (~1-2%).
Fix: Use ClinGen Allele Registry CA ID, which is computed deterministically from sequence.

5. Ignoring star-rating override hierarchy

Trigger: Pipeline picks the most-recent SCV regardless of review status.
Mechanism: A 1-star SCV submitted yesterday outranks a 3-star VCEP curation from 2022 by date.
Symptom: Clinical reports cite outdated or non-expert assertions over expert-panel decisions.
Fix: Sort by review_status rank (4>3>2>1>0); use the highest-star record. For ties, sort by date.

6. Aggregating "Conflicting" without inspecting the conflict

Trigger: Treat CLNSIG=Conflicting interpretations as VUS.
Mechanism: "Conflicting" can mean (P vs LP), (P vs VUS), or (P vs LB); the clinical meaning is completely different across these.
Symptom: Patients with high-evidence P variants reported as ambiguous; or true VUS reported as actionable.
Fix: Parse CLNSIGCONF to see exact conflict; weight by submitter star.

7. Missing somatic interpretations

Trigger: Pre-2024 pipeline reads only CLNSIG.
Mechanism: Somatic classifications live in new INFO fields (ONCDN, SCIDN, CLNSIGSOMATIC) since 2024.
Symptom: Cancer variants appear unclassified.
Fix: Read ONCDN (oncogenicity disease name), SCIDN (somatic clinical impact disease name), and the somatic-specific significance fields.

Reconciliation: When Sources Disagree

| Pattern | Likely cause | Action | |---------|-------------|--------| | ClinVar P vs gnomAD AF > 1% | Variant is true founder allele in unstratified gnomAD subset, OR ClinVar P is a stale low-star assertion | Check grpmax_faf95 excluding bottleneck groups; check ClinVar star rating | | ClinVar P vs AlphaMissense < 0.1 | Variant in NMD-escape region, alternative isoform, or ClinVar P is mis-curated | Check Pejaver 2022 calibration in acmg-classification skill; cross-check VCEP | | VCEP 3-star P vs commercial-lab 1-star B | VCEP supersedes for clinical action | Use VCEP; flag submitter for resubmission | | ClinVar VCV-level P vs RCV-level VUS for actual condition | VCV averages across conditions | Always report at RCV level for clinical action | | ClinVar P vs LOVD/HGMD discordant | LOVD/HGMD use different classification systems; HGMD "DM" != ACMG P | Triangulate against published evidence; do not auto-translate labels | | ClinVar P missing for a known disease variant | Submission lag (~6-12 months typical for new findings) | Check published literature; flag for ClinVar submission |

Quantitative Thresholds and Operational Conventions

| Threshold | Convention | Source | |-----------|-----------|--------| | Star >= 2 | Acceptable confidence for clinical action without further review | ClinGen SVI operational guidance | | Star = 3 | VCEP-curated; supersedes lower-star records | ClinGen FDA Recognition 2018 | | CLNSIG includes 'Pathogenic' OR 'Likely_pathogenic' | Treat as actionable for ACMG | ClinVar field schema | | CLNSIGCONF present | Multiple SCVs disagree; do NOT auto-action | ClinVar field schema | | Monthly archive | Use first-Thursday-of-month FTP snapshot for reproducible analyses | NCBI FTP retention policy | | Submission staleness | Re-check classification annually for active diagnostic variants | Yauy 2022 Genet Med (Genome Alert!) | | AF > 5% in gnomAD | BA1 standalone benign per ClinGen SVI default (VCEP overrides exist) | Richards 2015; SVI specs | | 1247 changes/month | Median variants with classification change per release | Yauy 2022 |

ClinVar Somatic vs Germline: 2024 Tripartite

The 2024 schema separates three orthogonal classifications, each with its own ReviewStatus and DateLastEvaluated:

GermlineClassification: Pathogenic / Likely Pathogenic / VUS / LB / B per ACMG/AMP 2015 + SVI.
SomaticClinicalImpact: Tier I / II / III / IV per AMP/ASCO/CAP 2017 (Li 2017 J Mol Diagn).
OncogenicityClassification: Oncogenic / Likely Oncogenic / VUS / Likely Benign / Benign per ClinGen/CGC/VICC 2022 oncogenicity framework.

A single VCV can carry all three with distinct evaluations; the legacy "Pathogenic" label is now ambiguous if not qualified by classification type.

Common Errors

| Symptom | Cause | Solution | |---------|-------|----------| | Empty result from efetch db=clinvar | rsID passed where VariationID expected | Use esearch first to resolve rsID to VariationID | | CLNSIG is _None or comma-separated mess | Variant has multi-condition RCVs; VCF collapses them | Parse RCV XML for per-condition values | | Variant present in ClinVar XML but absent from VCF | Variant lacks GRCh38 coordinates (legacy GRCh37-only submission) | Check <VariationArchive><SequenceLocation> per assembly | | 2024-format XML parser silently emits zero records | XML schema v2 incompatibility | Re-target to <VariationArchive> | | Conflicting interpretations with same star rating across two submitters | True scientific disagreement; sometimes resolved by VCEP later | Apply Tavtigian point system to manually reconcile; flag for VCEP review | | Variant has CA ID but no VariationID | Variant in Allele Registry but never submitted to ClinVar | Use AlleleRegistry as canonical; submit to ClinVar if novel pathogenic | | CLNSIG says Pathogenic but no associated condition CLNDN | Orphan classification (older submissions) | Treat as low confidence; cross-check publication | | Variants pulled by gene return only some isoforms | RefSeq transcript priority differences | Use MANE Select transcript explicitly; cross-check with VEP --mane_select |

Anticipated Reviewer Pushback

| Pushback | Standard response | |----------|-------------------| | "Why is this pathogenic variant 1-star?" | We report star rating per submission; clinical action requires star >=2 OR VCEP curation per ClinGen SVI 2018. | | "ClinVar says P but gnomAD AF = 2%" | Reconciled via Whiffin FAF95 max-credible-AF framework; bottleneck-group rule applied. | | "This VCV count differs from ClinVar.gov" | We pulled from the monthly archive (first-Thursday-of-month) for reproducibility; the live web is post-most-recent-weekly. | | "Why wasn't the somatic variant flagged?" | Pre-2024 XML schema had no separate somatic field; we now read ONCDN/SCIDN/SomaticClinicalImpact per v2 schema. | | "VarSome says LP but this says VUS" | Tool-specific aggregation rule differences; VarSome auto-applies PP3+PM2 by default per Tavtigian point system; we apply VCEP-specific PP3 calibration per CSpec. | | "rsID match returned wrong variant" | rsID is a cluster identifier; multi-allelic rsIDs require allele-level resolution; we use SPDI or CA ID. | | "Why retest a 2022-curated variant?" | Classifications drift as evidence accrues; ClinGen recommends annual re-review for active diagnostic variants. |

References

Landrum MJ et al. 2025. ClinVar: updates to support classifications of both germline and somatic variants. Nucleic Acids Res 53(D1):D1313.
Harrison SM et al. 2017. Clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar. Genet Med 19:1096.
Yauy K et al. 2022. Genome Alert! a standardized procedure for genomic variant reinterpretation and automated gene-phenotype reassessment. Genet Med 24:1316.
Tavtigian SV et al. 2018. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med 20:1054.
Tavtigian SV et al. 2020. Fitting a naturally scaled point system to the ACMG/AMP variant classification guidelines. Hum Mutat 41:1734.
Richards S et al. 2015. Standards and guidelines for the interpretation of sequence variants. Genet Med 17:405. (Original ACMG/AMP 2015)
Pejaver V et al. 2022. Calibration of computational tools for missense variant pathogenicity classification. Am J Hum Genet 109:2163.
Abou Tayoun AN et al. 2018. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat 39:1517.
Brnich SE et al. 2020. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med 12:3.
Li MM et al. 2017. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J Mol Diagn 19:4. (AMP/ASCO/CAP somatic framework)
ClinGen Allele Registry: https://reg.clinicalgenome.org/docs/cg-car/
CSpec Registry: https://cspec.genome.network/cspec/ui/svi/all

Related Skills

clinical-databases/acmg-classification - ACMG/AMP framework, PVS1 decision tree, Pejaver calibrated PP3/BP4 thresholds, Bayesian point system
clinical-databases/myvariant-queries - Aggregated queries including ClinVar overlay
clinical-databases/variant-prioritization - Rare-disease filtering pipeline using ClinVar
clinical-databases/gnomad-frequencies - Population frequency for BS1/BA1 cross-check
variant-calling/clinical-interpretation - Clinical reporting workflow

Version Compatibility

Reference examples tested with: requests 2.31+, cyvcf2 0.30+, pandas 2.2+, bcftools 1.19+, Entrez Direct 21.0+, lxml 5.0+ (for v2 XML schema).

Before using code patterns, verify installed versions match. If versions differ:

Python: pip show <package> then help(module.function) to check signatures
CLI: <tool> --version then <tool> --help to confirm flags

ClinVar Lookup and Clinical-Significance Triangulation

Python (REST): requests.get() against the E-utilities clinvar database
Python (local VCF): cyvcf2.VCF('clinvar.vcf.gz') for batch queries against the weekly snapshot
CLI: bcftools annotate -a clinvar.vcf.gz -c INFO/CLNSIG,INFO/CLNREVSTAT,INFO/CLNDN
Cross-database: ClinGen Allele Registry CA ID via https://reg.clinicalgenome.org/

The Identifier Hierarchy (VCV / SCV / RCV): Get This Wrong and Everything Downstream Breaks

Star Ratings and the Override Hierarchy

ClinGen Variant Curation Expert Panels (VCEPs)

ACMG/AMP, ClinGen SVI Specifications, and the Bayesian Point System

Conflicting Interpretations and Conflict Resolution

Decision Tree by Query Scenario

REST API Query (E-utilities)

Goal: Retrieve VCV-level ClinVar summary for a single variant by ID, gene, or HGVS.

Approach: Hit esummary.fcgi or esearch.fcgi against db=clinvar, parse JSON, then optionally hydrate to full record with efetch.

import requests

EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils'

def clinvar_summary(variation_id):
    '''Retrieve VCV-level summary by ClinVar VariationID (do not confuse with CA ID).

    The germline / somatic / oncogenicity classification nesting shown below
    follows the ClinVar 2024 eSummary v2 schema described in the data-access
    documentation. Field names have changed between API versions -- inspect
    the actual JSON returned by eSummary for the live ClinVar version before
    pinning these key paths in production.
    '''
    r = requests.get(f'{EUTILS}/esummary.fcgi',
                     params={'db': 'clinvar', 'id': variation_id, 'retmode': 'json'},
                     timeout=30)
    r.raise_for_status()
    record = r.json()['result'][str(variation_id)]
    return {
        'vcv': record.get('accession'),
        'name': record.get('title'),
        'germline_class': record.get('germline_classification', {}).get('description'),
        'germline_review_status': record.get('germline_classification', {}).get('review_status'),
        'somatic_clinical': record.get('clinical_impact_classification', {}).get('description'),
        'oncogenicity': record.get('oncogenicity_classification', {}).get('description'),
        'last_evaluated': record.get('germline_classification', {}).get('last_evaluated')
    }

def clinvar_search_gene(gene, pathogenic_only=False, retmax=500):
    term = f'{gene}[gene]'
    if pathogenic_only:
        term += ' AND (clinsig_pathogenic[Properties] OR clinsig_likely_pathogenic[Properties])'
    r = requests.get(f'{EUTILS}/esearch.fcgi',
                     params={'db': 'clinvar', 'term': term, 'retmax': retmax, 'retmode': 'json'},
                     timeout=30)
    return r.json()['esearchresult']['idlist']

Local VCF Query (Weekly Snapshot)

Goal: Annotate or look up thousands of variants without rate limits.

mkdir -p clinvar/$(date +%Y%m); cd clinvar/$(date +%Y%m)
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi

from cyvcf2 import VCF

clinvar = VCF('clinvar.vcf.gz')

def lookup(chrom, pos, ref, alt):
    '''Look up by GRCh38 coords. Returns variant-level (VCV) aggregate; not RCV.'''
    for v in clinvar(f'{chrom}:{pos}-{pos}'):
        if v.REF == ref and alt in v.ALT:
            info = v.INFO
            return {
                'vcv_id': info.get('ALLELEID'),
                'clnsig': info.get('CLNSIG'),
                'clnsig_conf': info.get('CLNSIGCONF'),
                'clnrevstat': info.get('CLNREVSTAT'),
                'clndn': info.get('CLNDN'),
                'clnvc': info.get('CLNVC'),
                'clnhgvs': info.get('CLNHGVS'),
                'clndisdb': info.get('CLNDISDB'),
                'oncdn': info.get('ONCDN'),
                'scidn': info.get('SCIDN')
            }
    return None

bcftools annotate \
    -a clinvar.vcf.gz \
    -c INFO/CLNSIG,INFO/CLNREVSTAT,INFO/CLNDN,INFO/CLNVC,INFO/CLNHGVS,INFO/CLNSIGCONF \
    input.vcf.gz -O z -o annotated.vcf.gz
bcftools index -t annotated.vcf.gz

ClinGen Allele Registry (CA IDs): The Real Cross-Database Anchor

def car_id(hgvs_g):
    '''Resolve HGVS-g to canonical ClinGen Allele Registry CA ID.'''
    r = requests.put(f'https://reg.clinicalgenome.org/allele',
                     headers={'Content-Type': 'text/plain'},
                     data=hgvs_g, timeout=30)
    return r.json().get('@id', '').split('/')[-1] if r.ok else None

Per-Operation Failure Modes

1. Treating CLNSIG as gospel for condition-specific work

Trigger: Pull CLNSIG=Pathogenic from clinvar.vcf.gz and report variant as pathogenic for the patient's specific phenotype.
Mechanism: CLNSIG is VCV-level aggregate; a variant can be P for disease A and B for disease B.
Symptom: Patient phenotype does not match the disease where the variant is actually pathogenic; clinical action is wrong.
Fix: Parse RCV-level XML (<RCVAccession> per condition); cross-check CLNDN and report per-condition classifications.

2. Parsing legacy XML against 2024 schema

Trigger: XSLT or parser anchored on <ClinVarSet> or <ClinicalSignificance>.
Mechanism: 2024 schema replaces both anchors with <VariationArchive> + germline/somatic/oncogenicity tripartite classifications.
Symptom: Silent zero-record output, no error.
Fix: Re-target to <VariationArchive> and read GermlineClassification, SomaticClinicalImpact, OncogenicityClassification separately.

3. Counting variant_summary.txt rows naively

Trigger: wc -l variant_summary.txt to estimate variant count.
Mechanism: One row per assembly per variant (GRCh37 and GRCh38); double-counts.
Symptom: Counts inflated ~2x.
Fix: awk -F'\t' '$17=="GRCh38"' variant_summary.txt | wc -l.

4. Trusting VariationID as a stable cross-build identifier

Trigger: Join gnomAD-v4 records by ClinVar VariationID.
Mechanism: VariationIDs were renumbered for a subset of variants during the 2017 schema overhaul.
Symptom: Spurious mismatches at low rate (~1-2%).
Fix: Use ClinGen Allele Registry CA ID, which is computed deterministically from sequence.

5. Ignoring star-rating override hierarchy

Trigger: Pipeline picks the most-recent SCV regardless of review status.
Mechanism: A 1-star SCV submitted yesterday outranks a 3-star VCEP curation from 2022 by date.
Symptom: Clinical reports cite outdated or non-expert assertions over expert-panel decisions.
Fix: Sort by review_status rank (4>3>2>1>0); use the highest-star record. For ties, sort by date.

6. Aggregating "Conflicting" without inspecting the conflict

Trigger: Treat CLNSIG=Conflicting interpretations as VUS.
Mechanism: "Conflicting" can mean (P vs LP), (P vs VUS), or (P vs LB); the clinical meaning is completely different across these.
Symptom: Patients with high-evidence P variants reported as ambiguous; or true VUS reported as actionable.
Fix: Parse CLNSIGCONF to see exact conflict; weight by submitter star.

7. Missing somatic interpretations

Trigger: Pre-2024 pipeline reads only CLNSIG.
Mechanism: Somatic classifications live in new INFO fields (ONCDN, SCIDN, CLNSIGSOMATIC) since 2024.
Symptom: Cancer variants appear unclassified.
Fix: Read ONCDN (oncogenicity disease name), SCIDN (somatic clinical impact disease name), and the somatic-specific significance fields.

Reconciliation: When Sources Disagree

Quantitative Thresholds and Operational Conventions

ClinVar Somatic vs Germline: 2024 Tripartite

The 2024 schema separates three orthogonal classifications, each with its own ReviewStatus and DateLastEvaluated:

GermlineClassification: Pathogenic / Likely Pathogenic / VUS / LB / B per ACMG/AMP 2015 + SVI.
SomaticClinicalImpact: Tier I / II / III / IV per AMP/ASCO/CAP 2017 (Li 2017 J Mol Diagn).
OncogenicityClassification: Oncogenic / Likely Oncogenic / VUS / Likely Benign / Benign per ClinGen/CGC/VICC 2022 oncogenicity framework.

A single VCV can carry all three with distinct evaluations; the legacy "Pathogenic" label is now ambiguous if not qualified by classification type.

Common Errors

Anticipated Reviewer Pushback

References

Landrum MJ et al. 2025. ClinVar: updates to support classifications of both germline and somatic variants. Nucleic Acids Res 53(D1):D1313.
Harrison SM et al. 2017. Clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar. Genet Med 19:1096.
Yauy K et al. 2022. Genome Alert! a standardized procedure for genomic variant reinterpretation and automated gene-phenotype reassessment. Genet Med 24:1316.
Tavtigian SV et al. 2018. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med 20:1054.
Tavtigian SV et al. 2020. Fitting a naturally scaled point system to the ACMG/AMP variant classification guidelines. Hum Mutat 41:1734.
Richards S et al. 2015. Standards and guidelines for the interpretation of sequence variants. Genet Med 17:405. (Original ACMG/AMP 2015)
Pejaver V et al. 2022. Calibration of computational tools for missense variant pathogenicity classification. Am J Hum Genet 109:2163.
Abou Tayoun AN et al. 2018. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat 39:1517.
Brnich SE et al. 2020. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med 12:3.
Li MM et al. 2017. Standards and guidelines for the interpretation and reporting of sequence variants in cancer. J Mol Diagn 19:4. (AMP/ASCO/CAP somatic framework)
ClinGen Allele Registry: https://reg.clinicalgenome.org/docs/cg-car/
CSpec Registry: https://cspec.genome.network/cspec/ui/svi/all

Related Skills

clinical-databases/acmg-classification - ACMG/AMP framework, PVS1 decision tree, Pejaver calibrated PP3/BP4 thresholds, Bayesian point system
clinical-databases/myvariant-queries - Aggregated queries including ClinVar overlay
clinical-databases/variant-prioritization - Rare-disease filtering pipeline using ClinVar
clinical-databases/gnomad-frequencies - Population frequency for BS1/BA1 cross-check
variant-calling/clinical-interpretation - Clinical reporting workflow

Adoption

GPTomics/bio-clinical-databases-clinvar-lookup

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

ClinVar Lookup and Clinical-Significance Triangulation

The Identifier Hierarchy (VCV / SCV / RCV): Get This Wrong and Everything Downstream Breaks

Star Ratings and the Override Hierarchy

ClinGen Variant Curation Expert Panels (VCEPs)

ACMG/AMP, ClinGen SVI Specifications, and the Bayesian Point System

Conflicting Interpretations and Conflict Resolution

Decision Tree by Query Scenario

REST API Query (E-utilities)

Local VCF Query (Weekly Snapshot)

ClinGen Allele Registry (CA IDs): The Real Cross-Database Anchor

Per-Operation Failure Modes

Reconciliation: When Sources Disagree

Quantitative Thresholds and Operational Conventions

ClinVar Somatic vs Germline: 2024 Tripartite

Common Errors

Anticipated Reviewer Pushback

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis

GPTomics/bio-clinical-databases-clinvar-lookup

$ install --global

Security Scan Results

SKILL.md

Version Compatibility

ClinVar Lookup and Clinical-Significance Triangulation

The Identifier Hierarchy (VCV / SCV / RCV): Get This Wrong and Everything Downstream Breaks

Star Ratings and the Override Hierarchy

ClinGen Variant Curation Expert Panels (VCEPs)

ACMG/AMP, ClinGen SVI Specifications, and the Bayesian Point System

Conflicting Interpretations and Conflict Resolution

Decision Tree by Query Scenario

REST API Query (E-utilities)

Local VCF Query (Weekly Snapshot)

ClinGen Allele Registry (CA IDs): The Real Cross-Database Anchor

Per-Operation Failure Modes

Reconciliation: When Sources Disagree

Quantitative Thresholds and Operational Conventions

ClinVar Somatic vs Germline: 2024 Tripartite

Common Errors

Anticipated Reviewer Pushback

References

Related Skills

Related Skills

GPTomics/bio-workflows-clip-pipeline

GPTomics/bio-comparative-genomics-whole-genome-duplication

GPTomics/bio-comparative-genomics-whole-genome-alignment

GPTomics/bio-comparative-genomics-synteny-analysis