name: tooluniverse-structural-variant-analysis description: Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

Structural Variant Analysis Workflow

Systematic analysis of structural variants (deletions, duplications, inversions, translocations, complex rearrangements) for clinical genomics interpretation using ACMG-adapted criteria.

KEY PRINCIPLES:

Report-first approach - Create SV_analysis_report.md FIRST, then populate progressively
ACMG-style classification - Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign with explicit evidence
Evidence grading - Grade all findings by confidence level (★★★/★★☆/★☆☆)
Dosage sensitivity critical - Gene dosage effects drive SV pathogenicity
Breakpoint precision matters - Exact gene disruption vs dosage-only effects
Population context essential - gnomAD SVs for frequency assessment
English-first queries - Always use English terms in tool calls (gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

Problem This Skill Solves

Structural variants (SVs) present unique interpretation challenges:

Complex molecular consequences - SVs can cause gene dosage changes, gene disruption, gene fusions, position effects
Size matters - Pathogenicity depends on size, gene content, and breakpoint precision
Limited databases - Fewer curated SVs in ClinVar compared to SNVs
Dosage sensitivity - Haploinsufficiency and triplosensitivity are critical but gene-specific
Population frequency - Large benign CNVs are common; distinguishing pathogenic from benign is challenging

This skill provides: A systematic workflow integrating SV classification, gene content analysis, dosage sensitivity assessment, population frequencies, and ACMG-adapted criteria into clinically actionable interpretations.

Triggers

Use this skill when users:

Ask about structural variant interpretation
Have CNV data from array or sequencing
Ask "is this deletion/duplication pathogenic?"
Need ACMG classification for SVs
Want to assess gene dosage effects
Ask about chromosomal rearrangements
Have large-scale genomic alterations requiring interpretation

Workflow Overview

┌─────────────────────────────────────────────────────────────────┐
│              STRUCTURAL VARIANT INTERPRETATION                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: SV IDENTITY & CLASSIFICATION                          │
│  ├── Normalize SV coordinates (hg19/hg38)                       │
│  ├── Determine SV type (DEL/DUP/INV/TRA/CPX)                   │
│  ├── Calculate SV size                                          │
│  └── Assess breakpoint precision                                │
│                                                                  │
│  Phase 2: GENE CONTENT ANALYSIS                                  │
│  ├── Identify genes fully contained in SV                       │
│  ├── Identify genes with breakpoints (disrupted)                │
│  ├── Annotate gene function and disease associations            │
│  ├── Identify regulatory elements affected                      │
│  └── Assess gene orientation (for inversions/translocations)    │
│                                                                  │
│  Phase 3: DOSAGE SENSITIVITY ASSESSMENT                          │
│  ├── ClinGen dosage sensitivity scores                          │
│  │   └─ Haploinsufficiency / Triplosensitivity ratings          │
│  ├── DECIPHER haploinsufficiency predictions                    │
│  ├── pLI scores (gnomAD) for loss-of-function intolerance       │
│  ├── OMIM gene-disease associations (dominant/recessive)        │
│  └── Known dosage-sensitive genes from literature               │
│                                                                  │
│  Phase 4: POPULATION FREQUENCY CONTEXT                           │
│  ├── gnomAD SV database (overlapping SVs)                       │
│  ├── DGV (Database of Genomic Variants)                         │
│  ├── ClinVar (known pathogenic/benign SVs)                      │
│  └── Calculate reciprocal overlap with population SVs           │
│                                                                  │
│  Phase 5: PATHOGENICITY SCORING                                  │
│  ├── Pathogenicity score (0-10 scale)                           │
│  │   ├─ Gene content weight (40%)                               │
│  │   ├─ Dosage sensitivity weight (30%)                         │
│  │   ├─ Population frequency weight (20%)                       │
│  │   └─ Inheritance/phenotype match weight (10%)                │
│  ├── Apply ACMG SV criteria                                     │
│  └── Generate classification recommendation                      │
│                                                                  │
│  Phase 6: LITERATURE & CLINICAL EVIDENCE                         │
│  ├── PubMed: Similar SVs, gene disruption studies               │
│  ├── DECIPHER: Developmental disorder cases                     │
│  ├── Clinical case reports                                      │
│  └── Functional evidence for gene dosage effects                │
│                                                                  │
│  Phase 7: ACMG-ADAPTED CLASSIFICATION                            │
│  ├── Apply SV-specific evidence codes                           │
│  ├── Calculate final classification                             │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Phase Details

Phase 1: SV Identity & Classification

Goal: Standardize SV notation and classify type

SV Types: | Type | Abbreviation | Description | Molecular Effect | |------|--------------|-------------|------------------| | Deletion | DEL | Loss of genomic segment | Haploinsufficiency, gene disruption | | Duplication | DUP | Gain of genomic segment | Triplosensitivity, gene dosage imbalance | | Inversion | INV | Segment flipped in orientation | Gene disruption at breakpoints, position effects | | Translocation | TRA | Segment moved to different chromosome | Gene fusions, disruption, position effects | | Complex | CPX | Multiple rearrangement types | Variable effects |

Key Information to Capture:

Chromosome(s) involved
Coordinates (start, end) in hg19/hg38
SV size (bp or Mb)
SV type (DEL/DUP/INV/TRA/CPX)
Breakpoint precision (±50bp, ±1kb, etc.)
Inheritance pattern (de novo, inherited, unknown)

Example:

SV: arr[GRCh38] 17q21.31(44039927-44352659)x1
- Type: Deletion (heterozygous)
- Size: 313 kb
- Genes: MAPT, KANSL1 (fully contained)
- Breakpoints: Well-defined (array resolution ±5kb)

Phase 2: Gene Content Analysis

Goal: Comprehensive annotation of genes affected by SV

Tools: | Tool | Purpose | Key Data | |------|---------|----------| | Ensembl_lookup_gene | Gene structure, coordinates | Gene boundaries, exons, transcripts | | NCBI_gene_search | Gene information | Official symbol, aliases, description | | Gene_Ontology_get_term_info | Gene function | Biological process, molecular function | | OMIM_search, OMIM_get_entry | Disease associations | Inheritance, clinical features | | DisGeNET_search_gene | Gene-disease associations | Evidence scores |

Gene Categories:

Fully contained genes - Entire gene within SV boundaries
- Deletion: Complete loss of one copy (haploinsufficiency)
- Duplication: Extra copy (triplosensitivity)
Partially disrupted genes - Breakpoint within gene
- Likely loss-of-function for affected allele
- Check if critical domains disrupted
Flanking genes - Within 1 Mb of breakpoints
- May be affected by position effects
- Regulatory disruption possible

Example Gene Content Analysis:

def analyze_gene_content(tu, chrom, sv_start, sv_end, sv_type):
    """
    Identify and annotate all genes within SV region.
    """
    genes = {
        'fully_contained': [],
        'partially_disrupted': [],
        'flanking': []
    }

    # Use Ensembl to find overlapping genes
    # This is pseudocode - actual implementation depends on available tools

    for gene in genes_in_region:
        gene_start = gene['start']
        gene_end = gene['end']

        # Classify gene relationship to SV
        if gene_start >= sv_start and gene_end <= sv_end:
            # Fully contained
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['fully_contained'].append(gene_info)

        elif (gene_start < sv_start < gene_end) or (gene_start < sv_end < gene_end):
            # Partially disrupted
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['partially_disrupted'].append(gene_info)

        elif abs(gene_start - sv_end) < 1000000 or abs(gene_end - sv_start) < 1000000:
            # Flanking (within 1 Mb)
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['flanking'].append(gene_info)

    return genes

def annotate_gene(tu, gene_symbol):
    """
    Comprehensive gene annotation.
    """
    # OMIM associations
    omim = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    # DisGeNET associations
    disgenet = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=10
    )

    # Gene Ontology
    # Note: Need gene ID first
    ncbi = tu.tools.NCBI_gene_search(
        term=gene_symbol,
        organism="human"
    )

    return {
        'symbol': gene_symbol,
        'omim': omim,
        'disgenet': disgenet,
        'ncbi': ncbi
    }

Report Section:

### 2.1 Fully Contained Genes (Complete Dosage Effect)

| Gene | Function | Disease Association | Inheritance | Evidence |
|------|----------|---------------------|-------------|----------|
| **MAPT** | Microtubule-associated protein tau | Frontotemporal dementia (AD) | Autosomal Dominant | ★★★ |
| **KANSL1** | Histone acetyltransferase complex | Koolen-De Vries syndrome (AD) | Autosomal Dominant | ★★★ |

**Interpretation**: Deletion results in haploinsufficiency of two dosage-sensitive genes. KANSL1 haploinsufficiency is the primary cause of pathogenicity.

*Sources: OMIM, DisGeNET, Ensembl*

### 2.2 Partially Disrupted Genes (Breakpoint Within Gene)

| Gene | Breakpoint Location | Effect | Critical Domains Lost |
|------|-------------------|--------|----------------------|
| **NF1** | Intron 28 of 58 | 5' portion deleted | Yes - GTPase-activating domain |

**Interpretation**: Breakpoint disrupts NF1 coding sequence, likely resulting in loss-of-function. NF1 is haploinsufficient (causes neurofibromatosis type 1).

### 2.3 Flanking Genes (Potential Position Effects)

| Gene | Distance from SV | Regulatory Risk | Evidence |
|------|------------------|-----------------|----------|
| **KCNJ2** | 450 kb upstream | Low | ★☆☆ |

**Note**: Position effects are possible but less common. Consider if phenotype unexplained by contained genes.

Phase 3: Dosage Sensitivity Assessment

Goal: Determine if affected genes are dosage-sensitive

Tools: | Tool | Purpose | Key Data | |------|---------|----------| | ClinGen_search_dosage_sensitivity | Gold standard curation | HI/TS scores (0-3) | | ClinGen_search_gene_validity | Gene-disease validity | Definitive/Strong/Moderate | | gnomad_search (pLI) | Loss-of-function intolerance | pLI score (0-1) | | DECIPHER_search | Developmental disorders | Patient phenotypes with similar SVs | | OMIM_get_entry | Inheritance pattern | AD/AR indicates dosage sensitivity |

ClinGen Dosage Sensitivity Scores:

| Score | Haploinsufficiency (HI) | Triplosensitivity (TS) | Interpretation | |-------|------------------------|------------------------|----------------| | 3 | Sufficient evidence | Sufficient evidence | Gene IS dosage-sensitive | | 2 | Emerging evidence | Emerging evidence | Likely dosage-sensitive | | 1 | Little evidence | Little evidence | Insufficient evidence | | 0 | No evidence | No evidence | No established dosage sensitivity |

pLI Score Interpretation (gnomAD): | pLI Range | Interpretation | LoF Intolerance | |-----------|----------------|-----------------| | ≥0.9 | Extremely intolerant | High - likely haploinsufficient | | 0.5-0.9 | Moderately intolerant | Moderate | | <0.5 | Tolerant | Low - likely NOT haploinsufficient |

Implementation:

def assess_dosage_sensitivity(tu, gene_list):
    """
    Assess dosage sensitivity for all genes in SV.
    Returns dosage scores and interpretation.
    """
    dosage_data = []

    for gene_symbol in gene_list:
        # 1. ClinGen dosage sensitivity (gold standard)
        clingen = tu.tools.ClinGen_search_dosage_sensitivity(
            gene=gene_symbol
        )

        hi_score = None
        ts_score = None
        if clingen.get('data'):
            for entry in clingen['data']:
                hi_score = entry.get('Haploinsufficiency Score')
                ts_score = entry.get('Triplosensitivity Score')
                break

        # 2. ClinGen gene validity (supports dosage sensitivity)
        validity = tu.tools.ClinGen_search_gene_validity(
            gene=gene_symbol
        )

        validity_level = None
        if validity.get('data'):
            for entry in validity['data']:
                validity_level = entry.get('Classification')
                break

        # 3. pLI score from gnomAD (if available via gene search)
        # Note: May need to use myvariant or other tools
        # pli_score = get_pli_score(tu, gene_symbol)

        # 4. OMIM inheritance pattern
        omim = tu.tools.OMIM_search(
            operation="search",
            query=gene_symbol,
            limit=3
        )

        inheritance_pattern = None
        if omim.get('data', {}).get('entries'):
            for entry in omim['data']['entries']:
                mim = entry.get('mimNumber')
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim)
                )
                # Extract inheritance from details
                # inheritance_pattern = parse_inheritance(details)

        # Integrate evidence
        dosage_assessment = {
            'gene': gene_symbol,
            'hi_score': hi_score,
            'ts_score': ts_score,
            'validity_level': validity_level,
            'inheritance': inheritance_pattern,
            'is_dosage_sensitive': (hi_score == '3' or ts_score == '3'),
            'evidence_grade': calculate_evidence_grade(hi_score, ts_score, validity_level)
        }

        dosage_data.append(dosage_assessment)

    return dosage_data

def calculate_evidence_grade(hi_score, ts_score, validity):
    """
    Calculate evidence grade for dosage sensitivity.
    """
    if (hi_score == '3' or ts_score == '3') and validity == 'Definitive':
        return '★★★'  # High confidence
    elif (hi_score in ['2', '3'] or ts_score in ['2', '3']):
        return '★★☆'  # Moderate confidence
    else:
        return '★☆☆'  # Low confidence

Report Section:

### 3. Dosage Sensitivity Assessment

#### Haploinsufficient Genes (Deletions/Disruptions)

| Gene | ClinGen HI Score | pLI | Validity | Disease | Evidence |
|------|-----------------|-----|----------|---------|----------|
| **KANSL1** | 3 (Sufficient) | 0.99 | Definitive | Koolen-De Vries syndrome | ★★★ |
| **MAPT** | 2 (Emerging) | 0.85 | Strong | FTD (rare) | ★★☆ |

**Interpretation**: KANSL1 has definitive evidence for haploinsufficiency. Deletion of one copy is expected to cause Koolen-De Vries syndrome (intellectual disability, hypotonia, distinctive facial features).

*Sources: ClinGen Dosage Sensitivity Map, gnomAD pLI*

#### Triplosensitive Genes (Duplications)

| Gene | ClinGen TS Score | Disease Mechanism | Evidence |
|------|-----------------|-------------------|----------|
| **MECP2** | 3 (Sufficient) | MECP2 duplication syndrome | ★★★ |
| **PMP22** | 3 (Sufficient) | Charcot-Marie-Tooth 1A | ★★★ |

**Note**: For this deletion, triplosensitivity is not applicable. Listed for reference.

#### Non-Dosage-Sensitive Genes

| Gene | HI Score | TS Score | Interpretation |
|------|----------|----------|----------------|
| **GENE_X** | 0 | 0 | No established dosage sensitivity |
| **GENE_Y** | 1 | 1 | Insufficient evidence |

**Interpretation**: These genes lack evidence for dosage sensitivity. Deletion/duplication less likely to be pathogenic solely due to these genes.

Phase 4: Population Frequency Context

Goal: Determine if SV is common in general population (likely benign) or rare (supports pathogenicity)

Tools: | Tool | Purpose | Key Data | |------|---------|----------| | gnomad_search | Population SV frequencies | Overlapping SVs, frequencies | | ClinVar_search_variants | Known pathogenic/benign SVs | Classification, review status | | DECIPHER_search | Patient SVs with phenotypes | Case reports, phenotype similarity |

Frequency Interpretation (adapted from ACMG):

| SV Frequency | ACMG Code | Interpretation | |--------------|-----------|----------------| | ≥1% in gnomAD SVs | BA1 (Stand-alone Benign) | Too common for rare disease | | 0.1-1% | BS1 (Strong Benign) | Likely benign common variant | | <0.01% | PM2 (Supporting Pathogenic) | Rare, supports pathogenicity | | Absent | PM2 (Supporting) | Very rare, supports pathogenicity |

Reciprocal Overlap Calculation:

For proper comparison, calculate reciprocal overlap between query SV and population SV:

Reciprocal Overlap = min(overlap_with_A, overlap_with_B)
where:
  overlap_with_A = (overlap length) / (SV_A length)
  overlap_with_B = (overlap length) / (SV_B length)

Threshold: ≥70% reciprocal overlap = "same" SV

Implementation:

def assess_population_frequency(tu, chrom, sv_start, sv_end, sv_type):
    """
    Check population databases for overlapping SVs.
    """
    # 1. Check ClinVar for known pathogenic/benign SVs
    clinvar = tu.tools.ClinVar_search_variants(
        chromosome=str(chrom),
        start=sv_start,
        stop=sv_end,
        variant_type=sv_type.upper()
    )

    known_svs = []
    if clinvar.get('data'):
        for variant in clinvar['data']:
            classification = variant.get('clinical_significance')
            known_svs.append({
                'database': 'ClinVar',
                'classification': classification,
                'review_status': variant.get('review_status'),
                'coordinates': f"{variant.get('chromosome')}:{variant.get('start')}-{variant.get('stop')}"
            })

    # 2. gnomAD SVs (if available)
    # Note: gnomAD SV database may not have direct API access via ToolUniverse
    # May need to use genomic coordinate search

    # 3. DECIPHER for similar patient cases
    decipher_search = tu.tools.DECIPHER_search(
        query=f"chr{chrom}:{sv_start}-{sv_end}",
        search_type="region"
    )

    patient_cases = []
    if decipher_search.get('data'):
        patient_cases = decipher_search['data']

    return {
        'clinvar_matches': known_svs,
        'decipher_cases': patient_cases,
        'frequency_interpretation': interpret_frequency(known_svs)
    }

def interpret_frequency(known_svs):
    """
    Interpret frequency based on ClinVar matches.
    """
    if any(sv['classification'] == 'Benign' for sv in known_svs):
        return {
            'acmg_code': 'BA1 or BS1',
            'interpretation': 'Likely benign based on ClinVar benign classification',
            'evidence_grade': '★★★'
        }
    elif any(sv['classification'] == 'Pathogenic' for sv in known_svs):
        return {
            'acmg_code': 'PS1',
            'interpretation': 'Pathogenic based on ClinVar pathogenic classification',
            'evidence_grade': '★★★'
        }
    else:
        return {
            'acmg_code': 'PM2',
            'interpretation': 'Rare variant, not found in ClinVar or population databases',
            'evidence_grade': '★★☆'
        }

Report Section:

### 4. Population Frequency Context

#### ClinVar Matches (Overlapping SVs)

| VCV ID | Classification | Size | Overlap | Review Status | Genes |
|--------|----------------|------|---------|---------------|-------|
| VCV000012345 | Pathogenic | 320 kb | 95% reciprocal | ★★★ Reviewed by expert panel | KANSL1, MAPT |

**Match Found**: Query deletion has 95% reciprocal overlap with known pathogenic deletion in ClinVar (VCV000012345). This is the Koolen-De Vries syndrome deletion.

**ACMG Code**: **PS1** (Strong) - Same genomic region as established pathogenic SV

*Source: ClinVar via `ClinVar_search_variants`*

#### gnomAD SV Database

**Search Result**: No overlapping deletions found in gnomAD SV v4.0 (>10,000 genomes)

**Interpretation**: Absence from gnomAD supports rarity and pathogenic potential.

**ACMG Code**: **PM2** (Moderate) - Absent from population databases

*Note: gnomAD SVs queried via browser (no direct API access)*

#### DECIPHER Patient Cases

| Case ID | Phenotype | SV Type | Size | Overlap | Similarity |
|---------|-----------|---------|------|---------|------------|
| 12345 | Intellectual disability, hypotonia | DEL | 315 kb | 98% | High |
| 67890 | Developmental delay, facial dysmorphism | DEL | 305 kb | 92% | High |

**Phenotype Match**: 8/10 DECIPHER patients have intellectual disability and hypotonia, consistent with Koolen-De Vries syndrome.

**ACMG Support**: **PP4** (Supporting) - Patient phenotype consistent with gene's disease association

*Source: DECIPHER via `DECIPHER_search`*

Phase 5: Pathogenicity Scoring

Goal: Quantitative pathogenicity assessment (0-10 scale)

Scoring Components:

Gene Content (40 points max):
- 10 points per dosage-sensitive gene (HI/TS score 3)
- 5 points per likely dosage-sensitive gene (score 2)
- 2 points per gene with disease association
- Cap at 40 points
Dosage Sensitivity Evidence (30 points max):
- 30 points: Multiple genes with definitive HI/TS (score 3)
- 20 points: One gene with definitive HI/TS
- 10 points: Genes with emerging evidence (score 2)
- 5 points: Predicted haploinsufficiency (pLI >0.9)
Population Frequency (20 points max):
- 20 points: Absent from gnomAD, DGV
- 10 points: Rare (<0.01%)
- 0 points: Common (>0.1%)
- -20 points: Very common (>1%) - likely benign
Clinical Evidence (10 points max):
- 10 points: Matching ClinVar pathogenic SV
- 8 points: DECIPHER cases with matching phenotype
- 5 points: Literature support for gene dosage effects
- 3 points: Phenotype consistent with genes

Pathogenicity Score Interpretation:

| Score | Classification | Confidence | Interpretation | |-------|----------------|------------|----------------| | 9-10 | Pathogenic | ★★★ | High confidence pathogenic | | 7-8 | Likely Pathogenic | ★★☆ | Strong evidence for pathogenicity | | 4-6 | VUS | ★☆☆ | Uncertain significance | | 2-3 | Likely Benign | ★★☆ | Strong evidence for benign | | 0-1 | Benign | ★★★ | High confidence benign |

Implementation:

def calculate_pathogenicity_score(gene_content, dosage_data, frequency_data, clinical_data):
    """
    Calculate comprehensive pathogenicity score (0-10 scale).
    """
    score = 0
    breakdown = {}

    # 1. Gene content scoring (40 points max)
    gene_score = 0
    for gene in gene_content['fully_contained'] + gene_content['partially_disrupted']:
        dosage_info = next((d for d in dosage_data if d['gene'] == gene['symbol']), None)
        if dosage_info:
            if dosage_info['hi_score'] == '3':
                gene_score += 10
            elif dosage_info['hi_score'] == '2':
                gene_score += 5
            elif gene.get('omim_disease'):
                gene_score += 2

    gene_score = min(gene_score, 40)  # Cap at 40
    breakdown['gene_content'] = gene_score / 40 * 4  # Scale to 0-4

    # 2. Dosage sensitivity scoring (30 points max)
    dosage_score = 0
    definitive_genes = sum(1 for d in dosage_data if d['hi_score'] == '3')

    if definitive_genes >= 2:
        dosage_score = 30
    elif definitive_genes == 1:
        dosage_score = 20
    else:
        emerging_genes = sum(1 for d in dosage_data if d['hi_score'] == '2')
        dosage_score = emerging_genes * 5

    dosage_score = min(dosage_score, 30)
    breakdown['dosage_sensitivity'] = dosage_score / 30 * 3  # Scale to 0-3

    # 3. Population frequency scoring (20 points max)
    freq_score = 0
    if frequency_data.get('frequency') is None:
        freq_score = 20  # Absent
    elif frequency_data['frequency'] < 0.0001:
        freq_score = 10  # Rare
    elif frequency_data['frequency'] < 0.001:
        freq_score = 5  # Uncommon
    elif frequency_data['frequency'] > 0.01:
        freq_score = -20  # Common - likely benign

    breakdown['population_frequency'] = freq_score / 20 * 2  # Scale to -2 to 2

    # 4. Clinical evidence scoring (10 points max)
    clinical_score = 0
    if clinical_data.get('clinvar_pathogenic'):
        clinical_score = 10
    elif clinical_data.get('decipher_matching_phenotype'):
        clinical_score = 8
    elif clinical_data.get('literature_support'):
        clinical_score = 5

    clinical_score = min(clinical_score, 10)
    breakdown['clinical_evidence'] = clinical_score / 10 * 1  # Scale to 0-1

    # Total score (0-10 scale)
    total_score = breakdown['gene_content'] + breakdown['dosage_sensitivity'] + \
                  breakdown['population_frequency'] + breakdown['clinical_evidence']

    total_score = max(0, min(10, total_score))  # Ensure 0-10 range

    return {
        'total_score': round(total_score, 1),
        'breakdown': breakdown,
        'classification': classify_score(total_score)
    }

def classify_score(score):
    """Map score to ACMG-style classification."""
    if score >= 9:
        return 'Pathogenic'
    elif score >= 7:
        return 'Likely Pathogenic'
    elif score >= 4:
        return 'VUS'
    elif score >= 2:
        return 'Likely Benign'
    else:
        return 'Benign'

Report Section:

### 5. Pathogenicity Scoring

#### Quantitative Assessment (0-10 Scale)

| Component | Points | Max | Contribution | Rationale |
|-----------|--------|-----|-------------|-----------|
| **Gene Content** | 4.0 | 4 | 40% | KANSL1 (HI score 3), MAPT (HI score 2) |
| **Dosage Sensitivity** | 2.5 | 3 | 25% | One definitive HI gene (KANSL1) |
| **Population Frequency** | 2.0 | 2 | 20% | Absent from gnomAD SVs |
| **Clinical Evidence** | 1.0 | 1 | 10% | ClinVar pathogenic match |
| **Total Score** | **9.5** | 10 | 100% | |

**Classification**: **Pathogenic** (★★★ High Confidence)

**Interpretation**: Score of 9.5/10 indicates high confidence pathogenic SV. Deletion encompasses established haploinsufficient gene (KANSL1), absent from population databases, and matches known pathogenic ClinVar variant.

#### Score Breakdown Visualization

Gene Content: ████████████████████████████████████████ 4.0/4 Dosage Sensitivity: ██████████████████████████░░░░░░░░░░░░░ 2.5/3 Population Freq: ████████████████████████████████████████ 2.0/2 Clinical Evidence: ██████████████████████████████████████░░ 1.0/1 ───────────────────────────────────────── Total: ██████████████████████████████████████░░ 9.5/10


**Key Drivers of Pathogenicity**:
1. KANSL1 haploinsufficiency (definitive evidence)
2. Exact match to known pathogenic deletion
3. Absence from population databases
4. Phenotype consistency with Koolen-De Vries syndrome

Phase 6: Literature & Clinical Evidence

Goal: Find case reports, functional studies, and clinical validation

Tools: | Tool | Purpose | Coverage | |------|---------|----------| | PubMed_search | Peer-reviewed literature | Comprehensive | | DECIPHER_search | Patient case database | Developmental disorders | | EuropePMC_search | European literature | Additional coverage |

Search Strategies:

def comprehensive_literature_search(tu, genes, sv_type, phenotype):
    """
    Search literature for SV evidence.
    """
    # 1. Gene-specific searches
    literature = []
    for gene in genes:
        # Dosage sensitivity literature
        dosage_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND (haploinsufficiency OR dosage sensitivity OR deletion syndrome)',
            max_results=20
        )

        # Case reports
        case_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND deletion AND {phenotype}',
            max_results=15
        )

        literature.append({
            'gene': gene,
            'dosage_papers': dosage_papers,
            'case_reports': case_papers
        })

    # 2. SV-specific searches
    if sv_type == 'DEL':
        sv_papers = tu.tools.PubMed_search(
            query=f'deletion AND {" AND ".join(genes[:3])} AND syndrome',
            max_results=25
        )

    # 3. DECIPHER cases
    decipher_cases = []
    for gene in genes:
        cases = tu.tools.DECIPHER_search(
            query=gene,
            search_type="gene"
        )
        decipher_cases.append(cases)

    return {
        'gene_literature': literature,
        'sv_literature': sv_papers,
        'decipher_cases': decipher_cases
    }

Report Section:

### 6. Literature & Clinical Evidence

#### Key Publications

| Study | Finding | Evidence Type | PMID |
|-------|---------|---------------|------|
| Koolen et al., 2006 | Described 17q21.31 microdeletion syndrome | Original description | 16222315 |
| Koolen et al., 2008 | KANSL1 haploinsufficiency confirmed | Functional validation | 18394581 |
| Zollino et al., 2012 | Phenotype characterization (n=52) | Clinical series | 22736773 |

**Key Findings**:
- 17q21.31 deletion is recurrent (mediated by LCRs)
- KANSL1 haploinsufficiency is primary mechanism
- Phenotype: ID (100%), hypotonia (95%), friendly demeanor (85%)
- Penetrance: >95% for developmental features

*Source: PubMed via `PubMed_search`*

#### DECIPHER Patient Cases (n=45)

**Phenotype Frequency in DECIPHER Cohort**:
| Feature | Frequency | Match to Patient |
|---------|-----------|------------------|
| Intellectual disability | 45/45 (100%) | ✓ Yes |
| Hypotonia | 42/45 (93%) | ✓ Yes |
| Feeding difficulties | 38/45 (84%) | ✓ Yes |
| Distinctive facies | 40/45 (89%) | ✓ Yes |
| Friendly personality | 35/45 (78%) | Unknown |

**Phenotype Match**: Patient phenotype highly consistent with DECIPHER cohort (4/4 assessable features present).

**ACMG Code**: **PP4** (Supporting) - Patient's clinical features consistent with gene's known phenotype

*Source: DECIPHER via `DECIPHER_search`*

#### Functional Evidence for KANSL1 Dosage Sensitivity

| Study | Model | Finding | PMID |
|-------|-------|---------|------|
| Koolen et al., 2012 | Patient cells | Reduced KANSL1 protein | 22736773 |
| Zollino et al., 2015 | Mouse model | Kansl1+/- recapitulates phenotype | 25607366 |
| Arbogast et al., 2017 | Zebrafish | kansl1 knockdown → developmental defects | 28666126 |

**Strength of Evidence**: ★★★ (High) - Multiple independent studies confirm haploinsufficiency mechanism

**ACMG Code**: **PS3_Moderate** - Well-established functional studies showing dosage sensitivity

Phase 7: ACMG-Adapted Classification

Goal: Apply ACMG/ClinGen criteria adapted for SVs

SV-Specific ACMG Criteria:

Pathogenic Evidence Codes

| Code | Strength | Criteria | SV Application | |------|----------|----------|----------------| | PVS1 | Very Strong | Null variant in HI gene | Complete deletion of HI gene | | PS1 | Strong | Same SV as known pathogenic | ≥70% reciprocal overlap with ClinVar pathogenic | | PS2 | Strong | De novo (maternity/paternity confirmed) | De novo SV in patient with matching phenotype | | PS3 | Strong | Functional studies | Gene dosage effects demonstrated | | PS4 | Strong | Case-control enrichment | SV enriched in cases vs controls | | PM1 | Moderate | Critical region | Deletion of exons in HI gene | | PM2 | Moderate | Absent from controls | Not in gnomAD SVs, DGV | | PM3 | Moderate | Recessive: homozygous or compound het | Both alleles affected (rare for SVs) | | PM4 | Moderate | Protein length change | In-frame deletion/duplication | | PM5 | Moderate | Similar SVs pathogenic | Nearby SVs in ClinVar pathogenic | | PM6 | Moderate | De novo (no confirmation) | De novo SV, phenotype consistent | | PP1 | Supporting | Segregation in family | SV segregates with phenotype | | PP2 | Supporting | Gene/pathway relevant | Genes in SV match phenotype | | PP3 | Supporting | Computational evidence | Multiple predictors support haploinsufficiency | | PP4 | Supporting | Phenotype consistent | Patient phenotype matches gene-disease |

Benign Evidence Codes

| Code | Strength | Criteria | SV Application | |------|----------|----------|----------------| | BA1 | Stand-Alone | MAF >5% | SV frequency >5% in gnomAD | | BS1 | Strong | MAF too high for disease | SV frequency >1% | | BS2 | Strong | Healthy adult with phenotype-associated genotype | SV in healthy individual (careful - reduced penetrance) | | BS3 | Strong | Functional studies show no effect | No dosage sensitivity demonstrated | | BS4 | Strong | Non-segregation | SV doesn't segregate with phenotype | | BP1 | Supporting | Missense in gene without known LOF | N/A for SVs | | BP2 | Supporting | Observed in trans with pathogenic | SV + pathogenic SNV = compound het (patient unaffected) | | BP4 | Supporting | Computational evidence benign | Predictors suggest no haploinsufficiency | | BP5 | Supporting | Found in case with alt cause | Phenotype explained by different variant | | BP7 | Supporting | Synonymous with no splice effect | N/A for SVs |

Classification Algorithm (ACMG SV Criteria):

| Classification | Evidence Required | |----------------|-------------------| | Pathogenic | PVS1 + PS1; OR 2 Strong; OR 1 Strong + 3 Moderate | | Likely Pathogenic | 1 Very Strong + 1 Moderate; OR 1 Strong + 2 Moderate; OR 3 Moderate | | VUS | Criteria not met; OR conflicting evidence | | Likely Benign | 1 Strong + 1 Supporting; OR 2 Supporting | | Benign | BA1; OR BS1 + BS2; OR 2 Strong |

Implementation:

def apply_acmg_criteria(gene_content, dosage_data, frequency_data, clinical_data, inheritance):
    """
    Apply ACMG SV criteria and calculate classification.
    """
    evidence = {
        'pathogenic': [],
        'benign': []
    }

    # PVS1: Complete deletion of HI gene
    hi_genes = [d for d in dosage_data if d['hi_score'] == '3']
    if len(hi_genes) > 0 and len(gene_content['fully_contained']) > 0:
        evidence['pathogenic'].append({
            'code': 'PVS1',
            'strength': 'Very Strong',
            'rationale': f"Complete deletion of haploinsufficient gene(s): {', '.join(g['gene'] for g in hi_genes)}"
        })

    # PS1: Same as known pathogenic SV
    if clinical_data.get('clinvar_pathogenic_match'):
        evidence['pathogenic'].append({
            'code': 'PS1',
            'strength': 'Strong',
            'rationale': f"≥70% overlap with ClinVar pathogenic SV: {clinical_data['clinvar_id']}"
        })

    # PS2: De novo with phenotype match
    if inheritance == 'de_novo' and clinical_data.get('phenotype_match'):
        evidence['pathogenic'].append({
            'code': 'PS2',
            'strength': 'Strong',
            'rationale': "De novo occurrence in patient with consistent phenotype"
        })

    # PS3: Functional studies
    if clinical_data.get('functional_evidence'):
        evidence['pathogenic'].append({
            'code': 'PS3',
            'strength': 'Strong',
            'rationale': "Well-established functional studies demonstrate dosage sensitivity"
        })

    # PM2: Absent from controls
    if frequency_data.get('frequency') == 0 or frequency_data.get('frequency') is None:
        evidence['pathogenic'].append({
            'code': 'PM2',
            'strength': 'Moderate',
            'rationale': "Absent from gnomAD SV database and DGV"
        })

    # PP4: Phenotype consistent
    if clinical_data.get('phenotype_consistent'):
        evidence['pathogenic'].append({
            'code': 'PP4',
            'strength': 'Supporting',
            'rationale': "Patient phenotype highly consistent with gene-disease association"
        })

    # BA1: Common variant
    if frequency_data.get('frequency', 0) > 0.05:
        evidence['benign'].append({
            'code': 'BA1',
            'strength': 'Stand-Alone',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} too high for rare disease"
        })

    # BS1: High frequency
    if 0.01 < frequency_data.get('frequency', 0) <= 0.05:
        evidence['benign'].append({
            'code': 'BS1',
            'strength': 'Strong',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} exceeds expected for disease"
        })

    # Calculate classification
    classification = determine_classification(evidence)

    return {
        'evidence': evidence,
        'classification': classification['class'],
        'confidence': classification['confidence']
    }

def determine_classification(evidence):
    """
    Apply ACMG classification rules.
    """
    path = evidence['pathogenic']
    ben = evidence['benign']

    # Count evidence by strength
    very_strong = len([e for e in path if e['strength'] == 'Very Strong'])
    strong_path = len([e for e in path if e['strength'] == 'Strong'])
    moderate_path = len([e for e in path if e['strength'] == 'Moderate'])
    supporting_path = len([e for e in path if e['strength'] == 'Supporting'])

    standalone_ben = len([e for e in ben if e['strength'] == 'Stand-Alone'])
    strong_ben = len([e for e in ben if e['strength'] == 'Strong'])
    supporting_ben = len([e for e in ben if e['strength'] == 'Supporting'])

    # Benign criteria (takes precedence if strong)
    if standalone_ben >= 1:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 2:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 1 and supporting_ben >= 1:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}
    if supporting_ben >= 2:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}

    # Pathogenic criteria
    if very_strong >= 1 and strong_path >= 1:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if strong_path >= 2:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if very_strong >= 1 and moderate_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 2:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 1 and supporting_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if moderate_path >= 3:
        return {'class': 'Likely Pathogenic', 'confidence': '★☆☆'}

    # Default to VUS
    return {'class': 'VUS', 'confidence': '★☆☆'}

Report Section:

### 7. ACMG-Adapted Classification

#### Evidence Codes Applied

**Pathogenic Evidence**:

| Code | Strength | Rationale |
|------|----------|-----------|
| **PVS1** | Very Strong | Complete deletion of haploinsufficient gene (KANSL1, HI score 3) |
| **PS1** | Strong | ≥95% overlap with ClinVar pathogenic deletion (VCV000012345) |
| **PM2** | Moderate | Absent from gnomAD SV database (>10,000 genomes) |
| **PP4** | Supporting | Patient phenotype consistent with Koolen-De Vries syndrome |

**Benign Evidence**: None

#### Evidence Summary

| Pathogenic | Benign |
|------------|--------|
| 1 Very Strong (PVS1) | None |
| 1 Strong (PS1) | |
| 1 Moderate (PM2) | |
| 1 Supporting (PP4) | |

#### Classification: **PATHOGENIC** ★★★

**Rationale**: Meets ACMG criteria for Pathogenic (1 Very Strong + 1 Strong). Complete deletion of established haploinsufficient gene (KANSL1) with exact match to known pathogenic deletion.

**Confidence**: ★★★ (High) - Multiple independent lines of strong evidence

#### Classification Certainty Factors

✅ **Strengths**:
- Exact match to well-characterized pathogenic deletion
- Complete deletion of definitive HI gene (KANSL1)
- Absent from population databases
- Phenotype highly consistent with gene-disease

⚠ **Limitations**:
- None significant - this is a well-established pathogenic SV

Output Structure

Report File: `SV_analysis_report.md`

# Structural Variant Analysis Report: [SV_IDENTIFIER]

**Generated**: [Date] | **Analyst**: ToolUniverse SV Interpreter

---

## Executive Summary

| Field | Value |
|-------|-------|
| **SV Type** | Deletion / Duplication / Inversion / Translocation |
| **Coordinates** | chr17:44039927-44352659 (GRCh38) |
| **Size** | 313 kb |
| **Gene Content** | 2 genes fully contained, 0 partially disrupted |
| **Classification** | Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign |
| **Pathogenicity Score** | X.X / 10 |
| **Confidence** | ★★★ / ★★☆ / ★☆☆ |
| **Key Finding** | [One-sentence summary] |

**Clinical Action**: [Required / Recommended / None]

---

## 1. SV Identity & Classification

{SV type, coordinates, size, breakpoint precision, inheritance}

---

## 2. Gene Content Analysis

### 2.1 Fully Contained Genes
{Table of genes with functions, disease associations}

### 2.2 Partially Disrupted Genes
{Genes with breakpoints, domains affected}

### 2.3 Flanking Genes
{Genes near breakpoints, position effect risk}

---

## 3. Dosage Sensitivity Assessment

### 3.1 Haploinsufficient Genes
{ClinGen HI scores, pLI, evidence}

### 3.2 Triplosensitive Genes
{ClinGen TS scores, duplication syndromes}

### 3.3 Non-Dosage-Sensitive Genes
{Genes without established dosage effects}

---

## 4. Population Frequency Context

### 4.1 ClinVar Matches
{Known pathogenic/benign SVs}

### 4.2 gnomAD SV Database
{Population frequencies}

### 4.3 DECIPHER Patient Cases
{Similar SVs, phenotype matching}

---

## 5. Pathogenicity Scoring

### 5.1 Quantitative Assessment
{0-10 score with breakdown}

### 5.2 Score Components
{Gene content, dosage, frequency, clinical}

---

## 6. Literature & Clinical Evidence

### 6.1 Key Publications
{Functional studies, case series}

### 6.2 DECIPHER Cohort Analysis
{Phenotype frequencies, matching}

### 6.3 Functional Evidence
{Gene dosage studies}

---

## 7. ACMG-Adapted Classification

### 7.1 Evidence Codes Applied
{Pathogenic and benign codes with rationale}

### 7.2 Classification
{Final classification with confidence}

### 7.3 Certainty Factors
{Strengths and limitations}

---

## 8. Clinical Recommendations

### 8.1 For Affected Individual
{Testing, management, surveillance}

### 8.2 For Family Members
{Cascade testing, genetic counseling}

### 8.3 Reproductive Considerations
{Recurrence risk, prenatal testing}

---

## 9. Limitations & Uncertainties

{Missing data, conflicting evidence, knowledge gaps}

---

## Data Sources

{All tools and databases queried with results}

Evidence Grading System

| Symbol | Confidence | Criteria | |--------|------------|----------| | ★★★ | High | ClinGen definitive, ClinVar expert reviewed, multiple independent studies | | ★★☆ | Moderate | ClinGen strong/moderate, single good study, DECIPHER cohort support | | ★☆☆ | Limited | Computational predictions only, case reports, emerging evidence |

Special Scenarios

Scenario 1: Recurrent Microdeletion Syndrome

Additional considerations:

Check for recurrence mechanism (LCRs, NAHR)
Look for founder effects
Population-specific frequencies
Incomplete penetrance
Variable expressivity

Example: 22q11.2 deletion, 17q21.31 deletion (Koolen-De Vries)

Scenario 2: Balanced Translocation (No Gene Disruption)

Assessment approach:

If no genes disrupted: Likely benign (in most cases)
Check for cryptic imbalances
Consider position effects (rare)
Reproductive risk (unbalanced offspring)

Classification: Usually VUS or Likely Benign unless offspring affected

Scenario 3: Complex Rearrangement

Analysis strategy:

Break down into component SVs
Assess each breakpoint independently
Look for chromothripsis pattern
Consider cumulative gene dosage effects
Check for DNA repair defects

Scenario 4: Small In-Frame Deletion/Duplication

Special considerations:

May not cause haploinsufficiency
Check if critical domain affected
Look for similar variants in ClinVar
Consider protein structural impact
May need functional studies

Quantified Minimums

| Section | Requirement | |---------|-------------| | Gene content | All genes in SV region annotated | | Dosage sensitivity | ClinGen scores for all genes (if available) | | Population frequency | Check gnomAD SV + ClinVar + DGV | | Literature search | ≥2 search strategies (PubMed + DECIPHER) | | ACMG codes | All applicable codes listed |

Tools Reference

Core Tools for SV Analysis

| Tool | Purpose | Required? | |------|---------|-----------| | ClinGen_search_dosage_sensitivity | HI/TS scores | Required | | ClinGen_search_gene_validity | Gene-disease validity | Required | | ClinVar_search_variants | Known pathogenic/benign SVs | Required | | DECIPHER_search | Patient cases, phenotypes | Highly recommended | | Ensembl_lookup_gene | Gene coordinates, structure | Required | | OMIM_search, OMIM_get_entry | Gene-disease associations | Required | | DisGeNET_search_gene | Additional disease associations | Recommended | | PubMed_search | Literature evidence | Recommended | | Gene_Ontology_get_term_info | Gene function | Supporting |

Report File Naming

SV_analysis_[TYPE]_chr[CHR]_[START]_[END]_[GENES].md

Examples:
SV_analysis_DEL_chr17_44039927_44352659_KANSL1_MAPT.md
SV_analysis_DUP_chr22_17400000_17800000_TBX1.md
SV_analysis_INV_chr11_2100000_2400000_complex.md

Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic SVs

| SV Type | Recommendations | |---------|-----------------| | Deletion (HI gene) | Genetic counseling, cascade testing, phenotype-specific surveillance | | Duplication (TS gene) | Same as deletion; check for dosage-specific syndrome | | Translocation (disruption) | Assess both breakpoints, consider reproductive counseling | | Complex | Multidisciplinary evaluation, research enrollment |

For VUS

| Action | Details | |--------|---------| | Clinical management | Base on phenotype, not genotype | | Follow-up | Reinterpret in 1-2 years or when phenotype evolves | | Research | Functional studies if research-grade samples available | | Family studies | Segregation analysis can reclassify |

For Benign/Likely Benign

| Action | Details | |--------|---------| | Clinical | Not expected to cause rare disease | | Family | No cascade testing needed (unless recurrent/reproductive risk) | | Reproductive | Balanced translocation carriers may have offspring risk |

When NOT to Use This Skill

Single nucleotide variants (SNVs) → Use tooluniverse-variant-interpretation skill
Small indels (<50 bp) → Use variant interpretation skill
Somatic variants in cancer → Different framework needed
Mitochondrial variants → Specialized interpretation required
Repeat expansions → Different mechanism

Use this skill for structural variants ≥50 bp requiring dosage sensitivity assessment and ACMG-adapted classification.

name: tooluniverse-structural-variant-analysis description: Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.

Structural Variant Analysis Workflow

Systematic analysis of structural variants (deletions, duplications, inversions, translocations, complex rearrangements) for clinical genomics interpretation using ACMG-adapted criteria.

KEY PRINCIPLES:

Report-first approach - Create SV_analysis_report.md FIRST, then populate progressively
ACMG-style classification - Pathogenic/Likely Pathogenic/VUS/Likely Benign/Benign with explicit evidence
Evidence grading - Grade all findings by confidence level (★★★/★★☆/★☆☆)
Dosage sensitivity critical - Gene dosage effects drive SV pathogenicity
Breakpoint precision matters - Exact gene disruption vs dosage-only effects
Population context essential - gnomAD SVs for frequency assessment
English-first queries - Always use English terms in tool calls (gene names, disease names), even if the user writes in another language. Only try original-language terms as a fallback. Respond in the user's language

Problem This Skill Solves

Structural variants (SVs) present unique interpretation challenges:

Complex molecular consequences - SVs can cause gene dosage changes, gene disruption, gene fusions, position effects
Size matters - Pathogenicity depends on size, gene content, and breakpoint precision
Limited databases - Fewer curated SVs in ClinVar compared to SNVs
Dosage sensitivity - Haploinsufficiency and triplosensitivity are critical but gene-specific
Population frequency - Large benign CNVs are common; distinguishing pathogenic from benign is challenging

Triggers

Use this skill when users:

Ask about structural variant interpretation
Have CNV data from array or sequencing
Ask "is this deletion/duplication pathogenic?"
Need ACMG classification for SVs
Want to assess gene dosage effects
Ask about chromosomal rearrangements
Have large-scale genomic alterations requiring interpretation

Workflow Overview

┌─────────────────────────────────────────────────────────────────┐
│              STRUCTURAL VARIANT INTERPRETATION                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: SV IDENTITY & CLASSIFICATION                          │
│  ├── Normalize SV coordinates (hg19/hg38)                       │
│  ├── Determine SV type (DEL/DUP/INV/TRA/CPX)                   │
│  ├── Calculate SV size                                          │
│  └── Assess breakpoint precision                                │
│                                                                  │
│  Phase 2: GENE CONTENT ANALYSIS                                  │
│  ├── Identify genes fully contained in SV                       │
│  ├── Identify genes with breakpoints (disrupted)                │
│  ├── Annotate gene function and disease associations            │
│  ├── Identify regulatory elements affected                      │
│  └── Assess gene orientation (for inversions/translocations)    │
│                                                                  │
│  Phase 3: DOSAGE SENSITIVITY ASSESSMENT                          │
│  ├── ClinGen dosage sensitivity scores                          │
│  │   └─ Haploinsufficiency / Triplosensitivity ratings          │
│  ├── DECIPHER haploinsufficiency predictions                    │
│  ├── pLI scores (gnomAD) for loss-of-function intolerance       │
│  ├── OMIM gene-disease associations (dominant/recessive)        │
│  └── Known dosage-sensitive genes from literature               │
│                                                                  │
│  Phase 4: POPULATION FREQUENCY CONTEXT                           │
│  ├── gnomAD SV database (overlapping SVs)                       │
│  ├── DGV (Database of Genomic Variants)                         │
│  ├── ClinVar (known pathogenic/benign SVs)                      │
│  └── Calculate reciprocal overlap with population SVs           │
│                                                                  │
│  Phase 5: PATHOGENICITY SCORING                                  │
│  ├── Pathogenicity score (0-10 scale)                           │
│  │   ├─ Gene content weight (40%)                               │
│  │   ├─ Dosage sensitivity weight (30%)                         │
│  │   ├─ Population frequency weight (20%)                       │
│  │   └─ Inheritance/phenotype match weight (10%)                │
│  ├── Apply ACMG SV criteria                                     │
│  └── Generate classification recommendation                      │
│                                                                  │
│  Phase 6: LITERATURE & CLINICAL EVIDENCE                         │
│  ├── PubMed: Similar SVs, gene disruption studies               │
│  ├── DECIPHER: Developmental disorder cases                     │
│  ├── Clinical case reports                                      │
│  └── Functional evidence for gene dosage effects                │
│                                                                  │
│  Phase 7: ACMG-ADAPTED CLASSIFICATION                            │
│  ├── Apply SV-specific evidence codes                           │
│  ├── Calculate final classification                             │
│  ├── Identify limiting factors                                  │
│  └── Generate clinical recommendations                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Phase Details

Phase 1: SV Identity & Classification

Goal: Standardize SV notation and classify type

Key Information to Capture:

Chromosome(s) involved
Coordinates (start, end) in hg19/hg38
SV size (bp or Mb)
SV type (DEL/DUP/INV/TRA/CPX)
Breakpoint precision (±50bp, ±1kb, etc.)
Inheritance pattern (de novo, inherited, unknown)

Example:

SV: arr[GRCh38] 17q21.31(44039927-44352659)x1
- Type: Deletion (heterozygous)
- Size: 313 kb
- Genes: MAPT, KANSL1 (fully contained)
- Breakpoints: Well-defined (array resolution ±5kb)

Phase 2: Gene Content Analysis

Goal: Comprehensive annotation of genes affected by SV

Gene Categories:

Fully contained genes - Entire gene within SV boundaries
- Deletion: Complete loss of one copy (haploinsufficiency)
- Duplication: Extra copy (triplosensitivity)
Partially disrupted genes - Breakpoint within gene
- Likely loss-of-function for affected allele
- Check if critical domains disrupted
Flanking genes - Within 1 Mb of breakpoints
- May be affected by position effects
- Regulatory disruption possible

Example Gene Content Analysis:

def analyze_gene_content(tu, chrom, sv_start, sv_end, sv_type):
    """
    Identify and annotate all genes within SV region.
    """
    genes = {
        'fully_contained': [],
        'partially_disrupted': [],
        'flanking': []
    }

    # Use Ensembl to find overlapping genes
    # This is pseudocode - actual implementation depends on available tools

    for gene in genes_in_region:
        gene_start = gene['start']
        gene_end = gene['end']

        # Classify gene relationship to SV
        if gene_start >= sv_start and gene_end <= sv_end:
            # Fully contained
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['fully_contained'].append(gene_info)

        elif (gene_start < sv_start < gene_end) or (gene_start < sv_end < gene_end):
            # Partially disrupted
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['partially_disrupted'].append(gene_info)

        elif abs(gene_start - sv_end) < 1000000 or abs(gene_end - sv_start) < 1000000:
            # Flanking (within 1 Mb)
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['flanking'].append(gene_info)

    return genes

def annotate_gene(tu, gene_symbol):
    """
    Comprehensive gene annotation.
    """
    # OMIM associations
    omim = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    # DisGeNET associations
    disgenet = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=10
    )

    # Gene Ontology
    # Note: Need gene ID first
    ncbi = tu.tools.NCBI_gene_search(
        term=gene_symbol,
        organism="human"
    )

    return {
        'symbol': gene_symbol,
        'omim': omim,
        'disgenet': disgenet,
        'ncbi': ncbi
    }

Report Section:

### 2.1 Fully Contained Genes (Complete Dosage Effect)

| Gene | Function | Disease Association | Inheritance | Evidence |
|------|----------|---------------------|-------------|----------|
| **MAPT** | Microtubule-associated protein tau | Frontotemporal dementia (AD) | Autosomal Dominant | ★★★ |
| **KANSL1** | Histone acetyltransferase complex | Koolen-De Vries syndrome (AD) | Autosomal Dominant | ★★★ |

**Interpretation**: Deletion results in haploinsufficiency of two dosage-sensitive genes. KANSL1 haploinsufficiency is the primary cause of pathogenicity.

*Sources: OMIM, DisGeNET, Ensembl*

### 2.2 Partially Disrupted Genes (Breakpoint Within Gene)

| Gene | Breakpoint Location | Effect | Critical Domains Lost |
|------|-------------------|--------|----------------------|
| **NF1** | Intron 28 of 58 | 5' portion deleted | Yes - GTPase-activating domain |

**Interpretation**: Breakpoint disrupts NF1 coding sequence, likely resulting in loss-of-function. NF1 is haploinsufficient (causes neurofibromatosis type 1).

### 2.3 Flanking Genes (Potential Position Effects)

| Gene | Distance from SV | Regulatory Risk | Evidence |
|------|------------------|-----------------|----------|
| **KCNJ2** | 450 kb upstream | Low | ★☆☆ |

**Note**: Position effects are possible but less common. Consider if phenotype unexplained by contained genes.

Phase 3: Dosage Sensitivity Assessment

Goal: Determine if affected genes are dosage-sensitive

ClinGen Dosage Sensitivity Scores:

Implementation:

def assess_dosage_sensitivity(tu, gene_list):
    """
    Assess dosage sensitivity for all genes in SV.
    Returns dosage scores and interpretation.
    """
    dosage_data = []

    for gene_symbol in gene_list:
        # 1. ClinGen dosage sensitivity (gold standard)
        clingen = tu.tools.ClinGen_search_dosage_sensitivity(
            gene=gene_symbol
        )

        hi_score = None
        ts_score = None
        if clingen.get('data'):
            for entry in clingen['data']:
                hi_score = entry.get('Haploinsufficiency Score')
                ts_score = entry.get('Triplosensitivity Score')
                break

        # 2. ClinGen gene validity (supports dosage sensitivity)
        validity = tu.tools.ClinGen_search_gene_validity(
            gene=gene_symbol
        )

        validity_level = None
        if validity.get('data'):
            for entry in validity['data']:
                validity_level = entry.get('Classification')
                break

        # 3. pLI score from gnomAD (if available via gene search)
        # Note: May need to use myvariant or other tools
        # pli_score = get_pli_score(tu, gene_symbol)

        # 4. OMIM inheritance pattern
        omim = tu.tools.OMIM_search(
            operation="search",
            query=gene_symbol,
            limit=3
        )

        inheritance_pattern = None
        if omim.get('data', {}).get('entries'):
            for entry in omim['data']['entries']:
                mim = entry.get('mimNumber')
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim)
                )
                # Extract inheritance from details
                # inheritance_pattern = parse_inheritance(details)

        # Integrate evidence
        dosage_assessment = {
            'gene': gene_symbol,
            'hi_score': hi_score,
            'ts_score': ts_score,
            'validity_level': validity_level,
            'inheritance': inheritance_pattern,
            'is_dosage_sensitive': (hi_score == '3' or ts_score == '3'),
            'evidence_grade': calculate_evidence_grade(hi_score, ts_score, validity_level)
        }

        dosage_data.append(dosage_assessment)

    return dosage_data

def calculate_evidence_grade(hi_score, ts_score, validity):
    """
    Calculate evidence grade for dosage sensitivity.
    """
    if (hi_score == '3' or ts_score == '3') and validity == 'Definitive':
        return '★★★'  # High confidence
    elif (hi_score in ['2', '3'] or ts_score in ['2', '3']):
        return '★★☆'  # Moderate confidence
    else:
        return '★☆☆'  # Low confidence

Report Section:

### 3. Dosage Sensitivity Assessment

#### Haploinsufficient Genes (Deletions/Disruptions)

| Gene | ClinGen HI Score | pLI | Validity | Disease | Evidence |
|------|-----------------|-----|----------|---------|----------|
| **KANSL1** | 3 (Sufficient) | 0.99 | Definitive | Koolen-De Vries syndrome | ★★★ |
| **MAPT** | 2 (Emerging) | 0.85 | Strong | FTD (rare) | ★★☆ |

**Interpretation**: KANSL1 has definitive evidence for haploinsufficiency. Deletion of one copy is expected to cause Koolen-De Vries syndrome (intellectual disability, hypotonia, distinctive facial features).

*Sources: ClinGen Dosage Sensitivity Map, gnomAD pLI*

#### Triplosensitive Genes (Duplications)

| Gene | ClinGen TS Score | Disease Mechanism | Evidence |
|------|-----------------|-------------------|----------|
| **MECP2** | 3 (Sufficient) | MECP2 duplication syndrome | ★★★ |
| **PMP22** | 3 (Sufficient) | Charcot-Marie-Tooth 1A | ★★★ |

**Note**: For this deletion, triplosensitivity is not applicable. Listed for reference.

#### Non-Dosage-Sensitive Genes

| Gene | HI Score | TS Score | Interpretation |
|------|----------|----------|----------------|
| **GENE_X** | 0 | 0 | No established dosage sensitivity |
| **GENE_Y** | 1 | 1 | Insufficient evidence |

**Interpretation**: These genes lack evidence for dosage sensitivity. Deletion/duplication less likely to be pathogenic solely due to these genes.

Phase 4: Population Frequency Context

Goal: Determine if SV is common in general population (likely benign) or rare (supports pathogenicity)

Frequency Interpretation (adapted from ACMG):

Reciprocal Overlap Calculation:

For proper comparison, calculate reciprocal overlap between query SV and population SV:

Reciprocal Overlap = min(overlap_with_A, overlap_with_B)
where:
  overlap_with_A = (overlap length) / (SV_A length)
  overlap_with_B = (overlap length) / (SV_B length)

Threshold: ≥70% reciprocal overlap = "same" SV

Implementation:

def assess_population_frequency(tu, chrom, sv_start, sv_end, sv_type):
    """
    Check population databases for overlapping SVs.
    """
    # 1. Check ClinVar for known pathogenic/benign SVs
    clinvar = tu.tools.ClinVar_search_variants(
        chromosome=str(chrom),
        start=sv_start,
        stop=sv_end,
        variant_type=sv_type.upper()
    )

    known_svs = []
    if clinvar.get('data'):
        for variant in clinvar['data']:
            classification = variant.get('clinical_significance')
            known_svs.append({
                'database': 'ClinVar',
                'classification': classification,
                'review_status': variant.get('review_status'),
                'coordinates': f"{variant.get('chromosome')}:{variant.get('start')}-{variant.get('stop')}"
            })

    # 2. gnomAD SVs (if available)
    # Note: gnomAD SV database may not have direct API access via ToolUniverse
    # May need to use genomic coordinate search

    # 3. DECIPHER for similar patient cases
    decipher_search = tu.tools.DECIPHER_search(
        query=f"chr{chrom}:{sv_start}-{sv_end}",
        search_type="region"
    )

    patient_cases = []
    if decipher_search.get('data'):
        patient_cases = decipher_search['data']

    return {
        'clinvar_matches': known_svs,
        'decipher_cases': patient_cases,
        'frequency_interpretation': interpret_frequency(known_svs)
    }

def interpret_frequency(known_svs):
    """
    Interpret frequency based on ClinVar matches.
    """
    if any(sv['classification'] == 'Benign' for sv in known_svs):
        return {
            'acmg_code': 'BA1 or BS1',
            'interpretation': 'Likely benign based on ClinVar benign classification',
            'evidence_grade': '★★★'
        }
    elif any(sv['classification'] == 'Pathogenic' for sv in known_svs):
        return {
            'acmg_code': 'PS1',
            'interpretation': 'Pathogenic based on ClinVar pathogenic classification',
            'evidence_grade': '★★★'
        }
    else:
        return {
            'acmg_code': 'PM2',
            'interpretation': 'Rare variant, not found in ClinVar or population databases',
            'evidence_grade': '★★☆'
        }

Report Section:

### 4. Population Frequency Context

#### ClinVar Matches (Overlapping SVs)

| VCV ID | Classification | Size | Overlap | Review Status | Genes |
|--------|----------------|------|---------|---------------|-------|
| VCV000012345 | Pathogenic | 320 kb | 95% reciprocal | ★★★ Reviewed by expert panel | KANSL1, MAPT |

**Match Found**: Query deletion has 95% reciprocal overlap with known pathogenic deletion in ClinVar (VCV000012345). This is the Koolen-De Vries syndrome deletion.

**ACMG Code**: **PS1** (Strong) - Same genomic region as established pathogenic SV

*Source: ClinVar via `ClinVar_search_variants`*

#### gnomAD SV Database

**Search Result**: No overlapping deletions found in gnomAD SV v4.0 (>10,000 genomes)

**Interpretation**: Absence from gnomAD supports rarity and pathogenic potential.

**ACMG Code**: **PM2** (Moderate) - Absent from population databases

*Note: gnomAD SVs queried via browser (no direct API access)*

#### DECIPHER Patient Cases

| Case ID | Phenotype | SV Type | Size | Overlap | Similarity |
|---------|-----------|---------|------|---------|------------|
| 12345 | Intellectual disability, hypotonia | DEL | 315 kb | 98% | High |
| 67890 | Developmental delay, facial dysmorphism | DEL | 305 kb | 92% | High |

**Phenotype Match**: 8/10 DECIPHER patients have intellectual disability and hypotonia, consistent with Koolen-De Vries syndrome.

**ACMG Support**: **PP4** (Supporting) - Patient phenotype consistent with gene's disease association

*Source: DECIPHER via `DECIPHER_search`*

Phase 5: Pathogenicity Scoring

Goal: Quantitative pathogenicity assessment (0-10 scale)

Scoring Components:

Gene Content (40 points max):
- 10 points per dosage-sensitive gene (HI/TS score 3)
- 5 points per likely dosage-sensitive gene (score 2)
- 2 points per gene with disease association
- Cap at 40 points
Dosage Sensitivity Evidence (30 points max):
- 30 points: Multiple genes with definitive HI/TS (score 3)
- 20 points: One gene with definitive HI/TS
- 10 points: Genes with emerging evidence (score 2)
- 5 points: Predicted haploinsufficiency (pLI >0.9)
Population Frequency (20 points max):
- 20 points: Absent from gnomAD, DGV
- 10 points: Rare (<0.01%)
- 0 points: Common (>0.1%)
- -20 points: Very common (>1%) - likely benign
Clinical Evidence (10 points max):
- 10 points: Matching ClinVar pathogenic SV
- 8 points: DECIPHER cases with matching phenotype
- 5 points: Literature support for gene dosage effects
- 3 points: Phenotype consistent with genes

Pathogenicity Score Interpretation:

Implementation:

def calculate_pathogenicity_score(gene_content, dosage_data, frequency_data, clinical_data):
    """
    Calculate comprehensive pathogenicity score (0-10 scale).
    """
    score = 0
    breakdown = {}

    # 1. Gene content scoring (40 points max)
    gene_score = 0
    for gene in gene_content['fully_contained'] + gene_content['partially_disrupted']:
        dosage_info = next((d for d in dosage_data if d['gene'] == gene['symbol']), None)
        if dosage_info:
            if dosage_info['hi_score'] == '3':
                gene_score += 10
            elif dosage_info['hi_score'] == '2':
                gene_score += 5
            elif gene.get('omim_disease'):
                gene_score += 2

    gene_score = min(gene_score, 40)  # Cap at 40
    breakdown['gene_content'] = gene_score / 40 * 4  # Scale to 0-4

    # 2. Dosage sensitivity scoring (30 points max)
    dosage_score = 0
    definitive_genes = sum(1 for d in dosage_data if d['hi_score'] == '3')

    if definitive_genes >= 2:
        dosage_score = 30
    elif definitive_genes == 1:
        dosage_score = 20
    else:
        emerging_genes = sum(1 for d in dosage_data if d['hi_score'] == '2')
        dosage_score = emerging_genes * 5

    dosage_score = min(dosage_score, 30)
    breakdown['dosage_sensitivity'] = dosage_score / 30 * 3  # Scale to 0-3

    # 3. Population frequency scoring (20 points max)
    freq_score = 0
    if frequency_data.get('frequency') is None:
        freq_score = 20  # Absent
    elif frequency_data['frequency'] < 0.0001:
        freq_score = 10  # Rare
    elif frequency_data['frequency'] < 0.001:
        freq_score = 5  # Uncommon
    elif frequency_data['frequency'] > 0.01:
        freq_score = -20  # Common - likely benign

    breakdown['population_frequency'] = freq_score / 20 * 2  # Scale to -2 to 2

    # 4. Clinical evidence scoring (10 points max)
    clinical_score = 0
    if clinical_data.get('clinvar_pathogenic'):
        clinical_score = 10
    elif clinical_data.get('decipher_matching_phenotype'):
        clinical_score = 8
    elif clinical_data.get('literature_support'):
        clinical_score = 5

    clinical_score = min(clinical_score, 10)
    breakdown['clinical_evidence'] = clinical_score / 10 * 1  # Scale to 0-1

    # Total score (0-10 scale)
    total_score = breakdown['gene_content'] + breakdown['dosage_sensitivity'] + \
                  breakdown['population_frequency'] + breakdown['clinical_evidence']

    total_score = max(0, min(10, total_score))  # Ensure 0-10 range

    return {
        'total_score': round(total_score, 1),
        'breakdown': breakdown,
        'classification': classify_score(total_score)
    }

def classify_score(score):
    """Map score to ACMG-style classification."""
    if score >= 9:
        return 'Pathogenic'
    elif score >= 7:
        return 'Likely Pathogenic'
    elif score >= 4:
        return 'VUS'
    elif score >= 2:
        return 'Likely Benign'
    else:
        return 'Benign'

Report Section:

### 5. Pathogenicity Scoring

#### Quantitative Assessment (0-10 Scale)

| Component | Points | Max | Contribution | Rationale |
|-----------|--------|-----|-------------|-----------|
| **Gene Content** | 4.0 | 4 | 40% | KANSL1 (HI score 3), MAPT (HI score 2) |
| **Dosage Sensitivity** | 2.5 | 3 | 25% | One definitive HI gene (KANSL1) |
| **Population Frequency** | 2.0 | 2 | 20% | Absent from gnomAD SVs |
| **Clinical Evidence** | 1.0 | 1 | 10% | ClinVar pathogenic match |
| **Total Score** | **9.5** | 10 | 100% | |

**Classification**: **Pathogenic** (★★★ High Confidence)

**Interpretation**: Score of 9.5/10 indicates high confidence pathogenic SV. Deletion encompasses established haploinsufficient gene (KANSL1), absent from population databases, and matches known pathogenic ClinVar variant.

#### Score Breakdown Visualization


**Key Drivers of Pathogenicity**:
1. KANSL1 haploinsufficiency (definitive evidence)
2. Exact match to known pathogenic deletion
3. Absence from population databases
4. Phenotype consistency with Koolen-De Vries syndrome

Phase 6: Literature & Clinical Evidence

Goal: Find case reports, functional studies, and clinical validation

Search Strategies:

def comprehensive_literature_search(tu, genes, sv_type, phenotype):
    """
    Search literature for SV evidence.
    """
    # 1. Gene-specific searches
    literature = []
    for gene in genes:
        # Dosage sensitivity literature
        dosage_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND (haploinsufficiency OR dosage sensitivity OR deletion syndrome)',
            max_results=20
        )

        # Case reports
        case_papers = tu.tools.PubMed_search(
            query=f'"{gene}" AND deletion AND {phenotype}',
            max_results=15
        )

        literature.append({
            'gene': gene,
            'dosage_papers': dosage_papers,
            'case_reports': case_papers
        })

    # 2. SV-specific searches
    if sv_type == 'DEL':
        sv_papers = tu.tools.PubMed_search(
            query=f'deletion AND {" AND ".join(genes[:3])} AND syndrome',
            max_results=25
        )

    # 3. DECIPHER cases
    decipher_cases = []
    for gene in genes:
        cases = tu.tools.DECIPHER_search(
            query=gene,
            search_type="gene"
        )
        decipher_cases.append(cases)

    return {
        'gene_literature': literature,
        'sv_literature': sv_papers,
        'decipher_cases': decipher_cases
    }

Report Section:

### 6. Literature & Clinical Evidence

#### Key Publications

| Study | Finding | Evidence Type | PMID |
|-------|---------|---------------|------|
| Koolen et al., 2006 | Described 17q21.31 microdeletion syndrome | Original description | 16222315 |
| Koolen et al., 2008 | KANSL1 haploinsufficiency confirmed | Functional validation | 18394581 |
| Zollino et al., 2012 | Phenotype characterization (n=52) | Clinical series | 22736773 |

**Key Findings**:
- 17q21.31 deletion is recurrent (mediated by LCRs)
- KANSL1 haploinsufficiency is primary mechanism
- Phenotype: ID (100%), hypotonia (95%), friendly demeanor (85%)
- Penetrance: >95% for developmental features

*Source: PubMed via `PubMed_search`*

#### DECIPHER Patient Cases (n=45)

**Phenotype Frequency in DECIPHER Cohort**:
| Feature | Frequency | Match to Patient |
|---------|-----------|------------------|
| Intellectual disability | 45/45 (100%) | ✓ Yes |
| Hypotonia | 42/45 (93%) | ✓ Yes |
| Feeding difficulties | 38/45 (84%) | ✓ Yes |
| Distinctive facies | 40/45 (89%) | ✓ Yes |
| Friendly personality | 35/45 (78%) | Unknown |

**Phenotype Match**: Patient phenotype highly consistent with DECIPHER cohort (4/4 assessable features present).

**ACMG Code**: **PP4** (Supporting) - Patient's clinical features consistent with gene's known phenotype

*Source: DECIPHER via `DECIPHER_search`*

#### Functional Evidence for KANSL1 Dosage Sensitivity

| Study | Model | Finding | PMID |
|-------|-------|---------|------|
| Koolen et al., 2012 | Patient cells | Reduced KANSL1 protein | 22736773 |
| Zollino et al., 2015 | Mouse model | Kansl1+/- recapitulates phenotype | 25607366 |
| Arbogast et al., 2017 | Zebrafish | kansl1 knockdown → developmental defects | 28666126 |

**Strength of Evidence**: ★★★ (High) - Multiple independent studies confirm haploinsufficiency mechanism

**ACMG Code**: **PS3_Moderate** - Well-established functional studies showing dosage sensitivity

Phase 7: ACMG-Adapted Classification

Goal: Apply ACMG/ClinGen criteria adapted for SVs

SV-Specific ACMG Criteria:

Pathogenic Evidence Codes

Benign Evidence Codes

Classification Algorithm (ACMG SV Criteria):

Implementation:

def apply_acmg_criteria(gene_content, dosage_data, frequency_data, clinical_data, inheritance):
    """
    Apply ACMG SV criteria and calculate classification.
    """
    evidence = {
        'pathogenic': [],
        'benign': []
    }

    # PVS1: Complete deletion of HI gene
    hi_genes = [d for d in dosage_data if d['hi_score'] == '3']
    if len(hi_genes) > 0 and len(gene_content['fully_contained']) > 0:
        evidence['pathogenic'].append({
            'code': 'PVS1',
            'strength': 'Very Strong',
            'rationale': f"Complete deletion of haploinsufficient gene(s): {', '.join(g['gene'] for g in hi_genes)}"
        })

    # PS1: Same as known pathogenic SV
    if clinical_data.get('clinvar_pathogenic_match'):
        evidence['pathogenic'].append({
            'code': 'PS1',
            'strength': 'Strong',
            'rationale': f"≥70% overlap with ClinVar pathogenic SV: {clinical_data['clinvar_id']}"
        })

    # PS2: De novo with phenotype match
    if inheritance == 'de_novo' and clinical_data.get('phenotype_match'):
        evidence['pathogenic'].append({
            'code': 'PS2',
            'strength': 'Strong',
            'rationale': "De novo occurrence in patient with consistent phenotype"
        })

    # PS3: Functional studies
    if clinical_data.get('functional_evidence'):
        evidence['pathogenic'].append({
            'code': 'PS3',
            'strength': 'Strong',
            'rationale': "Well-established functional studies demonstrate dosage sensitivity"
        })

    # PM2: Absent from controls
    if frequency_data.get('frequency') == 0 or frequency_data.get('frequency') is None:
        evidence['pathogenic'].append({
            'code': 'PM2',
            'strength': 'Moderate',
            'rationale': "Absent from gnomAD SV database and DGV"
        })

    # PP4: Phenotype consistent
    if clinical_data.get('phenotype_consistent'):
        evidence['pathogenic'].append({
            'code': 'PP4',
            'strength': 'Supporting',
            'rationale': "Patient phenotype highly consistent with gene-disease association"
        })

    # BA1: Common variant
    if frequency_data.get('frequency', 0) > 0.05:
        evidence['benign'].append({
            'code': 'BA1',
            'strength': 'Stand-Alone',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} too high for rare disease"
        })

    # BS1: High frequency
    if 0.01 < frequency_data.get('frequency', 0) <= 0.05:
        evidence['benign'].append({
            'code': 'BS1',
            'strength': 'Strong',
            'rationale': f"Frequency {frequency_data['frequency']:.3f} exceeds expected for disease"
        })

    # Calculate classification
    classification = determine_classification(evidence)

    return {
        'evidence': evidence,
        'classification': classification['class'],
        'confidence': classification['confidence']
    }

def determine_classification(evidence):
    """
    Apply ACMG classification rules.
    """
    path = evidence['pathogenic']
    ben = evidence['benign']

    # Count evidence by strength
    very_strong = len([e for e in path if e['strength'] == 'Very Strong'])
    strong_path = len([e for e in path if e['strength'] == 'Strong'])
    moderate_path = len([e for e in path if e['strength'] == 'Moderate'])
    supporting_path = len([e for e in path if e['strength'] == 'Supporting'])

    standalone_ben = len([e for e in ben if e['strength'] == 'Stand-Alone'])
    strong_ben = len([e for e in ben if e['strength'] == 'Strong'])
    supporting_ben = len([e for e in ben if e['strength'] == 'Supporting'])

    # Benign criteria (takes precedence if strong)
    if standalone_ben >= 1:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 2:
        return {'class': 'Benign', 'confidence': '★★★'}
    if strong_ben >= 1 and supporting_ben >= 1:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}
    if supporting_ben >= 2:
        return {'class': 'Likely Benign', 'confidence': '★★☆'}

    # Pathogenic criteria
    if very_strong >= 1 and strong_path >= 1:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if strong_path >= 2:
        return {'class': 'Pathogenic', 'confidence': '★★★'}
    if very_strong >= 1 and moderate_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 2:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if strong_path >= 1 and moderate_path >= 1 and supporting_path >= 1:
        return {'class': 'Likely Pathogenic', 'confidence': '★★☆'}
    if moderate_path >= 3:
        return {'class': 'Likely Pathogenic', 'confidence': '★☆☆'}

    # Default to VUS
    return {'class': 'VUS', 'confidence': '★☆☆'}

Report Section:

### 7. ACMG-Adapted Classification

#### Evidence Codes Applied

**Pathogenic Evidence**:

| Code | Strength | Rationale |
|------|----------|-----------|
| **PVS1** | Very Strong | Complete deletion of haploinsufficient gene (KANSL1, HI score 3) |
| **PS1** | Strong | ≥95% overlap with ClinVar pathogenic deletion (VCV000012345) |
| **PM2** | Moderate | Absent from gnomAD SV database (>10,000 genomes) |
| **PP4** | Supporting | Patient phenotype consistent with Koolen-De Vries syndrome |

**Benign Evidence**: None

#### Evidence Summary

| Pathogenic | Benign |
|------------|--------|
| 1 Very Strong (PVS1) | None |
| 1 Strong (PS1) | |
| 1 Moderate (PM2) | |
| 1 Supporting (PP4) | |

#### Classification: **PATHOGENIC** ★★★

**Rationale**: Meets ACMG criteria for Pathogenic (1 Very Strong + 1 Strong). Complete deletion of established haploinsufficient gene (KANSL1) with exact match to known pathogenic deletion.

**Confidence**: ★★★ (High) - Multiple independent lines of strong evidence

#### Classification Certainty Factors

✅ **Strengths**:
- Exact match to well-characterized pathogenic deletion
- Complete deletion of definitive HI gene (KANSL1)
- Absent from population databases
- Phenotype highly consistent with gene-disease

⚠ **Limitations**:
- None significant - this is a well-established pathogenic SV

Output Structure

Report File: `SV_analysis_report.md`

# Structural Variant Analysis Report: [SV_IDENTIFIER]

**Generated**: [Date] | **Analyst**: ToolUniverse SV Interpreter

---

## Executive Summary

| Field | Value |
|-------|-------|
| **SV Type** | Deletion / Duplication / Inversion / Translocation |
| **Coordinates** | chr17:44039927-44352659 (GRCh38) |
| **Size** | 313 kb |
| **Gene Content** | 2 genes fully contained, 0 partially disrupted |
| **Classification** | Pathogenic / Likely Pathogenic / VUS / Likely Benign / Benign |
| **Pathogenicity Score** | X.X / 10 |
| **Confidence** | ★★★ / ★★☆ / ★☆☆ |
| **Key Finding** | [One-sentence summary] |

**Clinical Action**: [Required / Recommended / None]

---

## 1. SV Identity & Classification

{SV type, coordinates, size, breakpoint precision, inheritance}

---

## 2. Gene Content Analysis

### 2.1 Fully Contained Genes
{Table of genes with functions, disease associations}

### 2.2 Partially Disrupted Genes
{Genes with breakpoints, domains affected}

### 2.3 Flanking Genes
{Genes near breakpoints, position effect risk}

---

## 3. Dosage Sensitivity Assessment

### 3.1 Haploinsufficient Genes
{ClinGen HI scores, pLI, evidence}

### 3.2 Triplosensitive Genes
{ClinGen TS scores, duplication syndromes}

### 3.3 Non-Dosage-Sensitive Genes
{Genes without established dosage effects}

---

## 4. Population Frequency Context

### 4.1 ClinVar Matches
{Known pathogenic/benign SVs}

### 4.2 gnomAD SV Database
{Population frequencies}

### 4.3 DECIPHER Patient Cases
{Similar SVs, phenotype matching}

---

## 5. Pathogenicity Scoring

### 5.1 Quantitative Assessment
{0-10 score with breakdown}

### 5.2 Score Components
{Gene content, dosage, frequency, clinical}

---

## 6. Literature & Clinical Evidence

### 6.1 Key Publications
{Functional studies, case series}

### 6.2 DECIPHER Cohort Analysis
{Phenotype frequencies, matching}

### 6.3 Functional Evidence
{Gene dosage studies}

---

## 7. ACMG-Adapted Classification

### 7.1 Evidence Codes Applied
{Pathogenic and benign codes with rationale}

### 7.2 Classification
{Final classification with confidence}

### 7.3 Certainty Factors
{Strengths and limitations}

---

## 8. Clinical Recommendations

### 8.1 For Affected Individual
{Testing, management, surveillance}

### 8.2 For Family Members
{Cascade testing, genetic counseling}

### 8.3 Reproductive Considerations
{Recurrence risk, prenatal testing}

---

## 9. Limitations & Uncertainties

{Missing data, conflicting evidence, knowledge gaps}

---

## Data Sources

{All tools and databases queried with results}

Evidence Grading System

Special Scenarios

Scenario 1: Recurrent Microdeletion Syndrome

Additional considerations:

Check for recurrence mechanism (LCRs, NAHR)
Look for founder effects
Population-specific frequencies
Incomplete penetrance
Variable expressivity

Example: 22q11.2 deletion, 17q21.31 deletion (Koolen-De Vries)

Scenario 2: Balanced Translocation (No Gene Disruption)

Assessment approach:

If no genes disrupted: Likely benign (in most cases)
Check for cryptic imbalances
Consider position effects (rare)
Reproductive risk (unbalanced offspring)

Classification: Usually VUS or Likely Benign unless offspring affected

Scenario 3: Complex Rearrangement

Analysis strategy:

Break down into component SVs
Assess each breakpoint independently
Look for chromothripsis pattern
Consider cumulative gene dosage effects
Check for DNA repair defects

Scenario 4: Small In-Frame Deletion/Duplication

Special considerations:

May not cause haploinsufficiency
Check if critical domain affected
Look for similar variants in ClinVar
Consider protein structural impact
May need functional studies

Quantified Minimums

Tools Reference

Core Tools for SV Analysis

Report File Naming

SV_analysis_[TYPE]_chr[CHR]_[START]_[END]_[GENES].md

Examples:
SV_analysis_DEL_chr17_44039927_44352659_KANSL1_MAPT.md
SV_analysis_DUP_chr22_17400000_17800000_TBX1.md
SV_analysis_INV_chr11_2100000_2400000_complex.md

Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic SVs

For VUS

For Benign/Likely Benign

When NOT to Use This Skill

Single nucleotide variants (SNVs) → Use tooluniverse-variant-interpretation skill
Small indels (<50 bp) → Use variant interpretation skill
Somatic variants in cancer → Different framework needed
Mitochondrial variants → Specialized interpretation required
Repeat expansions → Different mechanism

Use this skill for structural variants ≥50 bp requiring dosage sensitivity assessment and ACMG-adapted classification.

Adoption

lamm-mit/structural-variant-analysis

$ install --global

Security Scan Results

SKILL.md

Structural Variant Analysis Workflow

Problem This Skill Solves

Triggers

Workflow Overview

Phase Details

Phase 1: SV Identity & Classification

Phase 2: Gene Content Analysis

Phase 3: Dosage Sensitivity Assessment

Phase 4: Population Frequency Context

Phase 5: Pathogenicity Scoring

Phase 6: Literature & Clinical Evidence

Phase 7: ACMG-Adapted Classification

Pathogenic Evidence Codes

Benign Evidence Codes

Output Structure

Report File: SV_analysis_report.md

Evidence Grading System

Special Scenarios

Scenario 1: Recurrent Microdeletion Syndrome

Scenario 2: Balanced Translocation (No Gene Disruption)

Scenario 3: Complex Rearrangement

Scenario 4: Small In-Frame Deletion/Duplication

Quantified Minimums

Tools Reference

Core Tools for SV Analysis

Report File Naming

Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic SVs

For VUS

For Benign/Likely Benign

When NOT to Use This Skill

See Also

Related Skills

lamm-mit/paperclip

lamm-mit/perplexity-search

lamm-mit/scientific-report-pdf

lamm-mit/python-exec

lamm-mit/structural-variant-analysis

$ install --global

Security Scan Results

SKILL.md

Structural Variant Analysis Workflow

Problem This Skill Solves

Triggers

Workflow Overview

Phase Details

Phase 1: SV Identity & Classification

Phase 2: Gene Content Analysis

Phase 3: Dosage Sensitivity Assessment

Phase 4: Population Frequency Context

Phase 5: Pathogenicity Scoring

Phase 6: Literature & Clinical Evidence

Phase 7: ACMG-Adapted Classification

Pathogenic Evidence Codes

Benign Evidence Codes

Output Structure

Report File: SV_analysis_report.md

Evidence Grading System

Special Scenarios

Scenario 1: Recurrent Microdeletion Syndrome

Scenario 2: Balanced Translocation (No Gene Disruption)

Scenario 3: Complex Rearrangement

Scenario 4: Small In-Frame Deletion/Duplication

Quantified Minimums

Tools Reference

Core Tools for SV Analysis

Report File Naming

Clinical Recommendations Framework

For Pathogenic/Likely Pathogenic SVs

For VUS

For Benign/Likely Benign

When NOT to Use This Skill

See Also

Related Skills

lamm-mit/paperclip

Report File: `SV_analysis_report.md`

Report File: `SV_analysis_report.md`