skills/tooluniverse-protein-lof-mechanism/SKILL.md
Propose the mechanism by which a missense variant causes loss-of-function (LoF), synthesizing evidence from 5 independent layers: AlphaMissense pathogenicity, AlphaFold structural context, ESMC sequence likelihood, SAE feature disruption, and DynaMut2 stability ΔΔG. Distinguishes 'structural stability LoF' (mis-folding) from 'direct functional disruption' (catalytic / binding / PTM site damage). Use for coding missense variants where you need a mechanistic causal model, not just a pathogenicity score.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-protein-lof-mechanismInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
For a single missense variant, integrate 5 independent computational signals to propose a specific loss-of-function mechanism. Each signal answers a different question:
| Signal | Tool | Answers |
|---|---|---|
| Pathogenicity | AlphaMissense_get_variant_score | "Is this variant damaging?" |
| Structural context | alphafold_get_prediction | "Is the mutation in a folded vs disordered region?" |
| Sequence likelihood | ESM_score_sequence | "Is the substitution evolutionarily plausible?" |
| Feature disruption | ESM_explain_variant_mechanism (or ESM_score_variant_sae_disruption + ESM_describe_sae_feature for raw control) | "Which biological feature breaks?" |
| Stability | DynaMut2_predict_stability | "Does the protein still fold correctly?" |
Apply for missense (coding) variants where:
Not for (use other skills instead):
tooluniverse-variant-to-mechanismtooluniverse-variant-interpretationtooluniverse-protein-sae-variant-interpretationtooluniverse-cancer-variant-interpretation| Input | Format | Example |
|---|---|---|
| Variant ID | {accession}_{ref_aa}{position}{alt_aa} | P04637_R175H |
| Or: UniProt accession + variant string | accession + ref/pos/alt | P04637, R175H |
| Or: gene symbol + variant | HGNC symbol + ref/pos/alt | TP53, R175H |
Parse the variant string:
import re
m = re.match(r"([A-Z])(\d+)([A-Z])", variant_str)
ref_aa, position, alt_aa = m.group(1), int(m.group(2)), m.group(3)
pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'If user gave a gene symbol:
UniProt_search(
query="gene:TP53 AND organism_id:9606 AND reviewed:true",
fields=["accession"],
)
# → P04637
Get the canonical sequence:
UniProt_get_sequence_by_accession(accession="P04637")
Validate the reference residue:
assert sequence[position - 1] == ref_aa, "Wrong sequence or wrong isoform"
AlphaMissense_get_variant_score(
uniprot_id="P04637",
position=175,
ref_aa="R",
alt_aa="H",
)
# → returns score 0..1; ≥0.564 is the "likely pathogenic" threshold per the
# AlphaMissense paper. Above 0.9 is very confident damaging.
If AlphaMissense says benign (≤0.34), the rest of the analysis is exploratory — most benign variants don't have a clear LoF mechanism.
alphafold_get_prediction(uniprot_id="P04637")
# → returns structure data including per-residue pLDDT
Check pLDDT at the mutation position:
ESM_score_sequence(
sequence=ref_sequence,
model="esmc-600m-2024-12", # or 300m for cheaper
)
# Then for the mutant sequence:
ESM_score_sequence(
sequence=mutant_sequence,
model="esmc-600m-2024-12",
)
Compute ΔlogP = mean_logP(mutant) − mean_logP(reference) at the position window:
Recommended — one call (composite tool, returns disruption + category labels + summary):
ESM_explain_variant_mechanism(
sequence=ref_sequence,
position=175,
ref_aa="R",
alt_aa="H",
window=8,
top_k_features=5,
)
# data["mechanism_summary"] e.g.:
# "Disrupted feature categories (lost): catalytic=2, ligand-binding=1"
# data["lost_feature_categories"] / data["gained_feature_categories"] give the raw counts
# data["top_features_lost"] / data["top_features_gained"] include per-feature deltas + categories
Lower-level alternative (use only if you need the raw feature_ids before labeling, e.g. to filter to one category before describe calls):
ESM_score_variant_sae_disruption(
sequence=ref_sequence, position=175, ref_aa="R", alt_aa="H",
window=8, top_k_features=10,
)
# Then for each kept feature:
ESM_describe_sae_feature(feature_id=feat["feature_id"])
# → returns category: catalytic | ligand-binding | ptm | domain |
# motif | structural-stability | secondary-structure |
# transmembrane | signal-peptide | propeptide | uncategorized
Dominant category among top lost features = the function most likely disrupted.
DynaMut2 needs a PDB structure. Two options:
Option A — use a PDB ID if available for this protein:
# Look up PDB cross-references from UniProt entry
UniProt_get_entry_by_accession(accession="P04637")
# → check `uniProtKBCrossReferences` for entries with database == "PDB"
# → pick a structure that covers the mutation position
DynaMut2_predict_stability(
pdb_id="2FEJ", # example TP53 DNA-binding domain crystal structure
chain="A",
mutation="R175H",
)
# → returns ddG in kcal/mol
Option B — if no experimental PDB covers the position, use the AlphaFold model (output of Step 2). DynaMut2 accepts AlphaFold PDBs the same way.
Apply the upstream variant_lof_mechanism decision rule:
| Signal pattern | Inferred mechanism | |---|---| | ddG > +1 kcal/mol AND ΔlogP < 0 | Structural stability LoF — mutation destabilizes the fold; protein may misfold / be degraded. Drug rescue strategy: pharmacological chaperones, refolding agents. | | ddG ≈ 0 (in [-0.5, +1]) AND SAE features lost are catalytic | Direct catalytic LoF — protein folds normally but the active site is broken. Strategy: substrate analog / cofactor supplementation. | | ddG ≈ 0 AND SAE features lost are ligand-binding | Binding LoF — fold preserved, binding pocket disrupted. Strategy: small-molecule restoration. | | ddG ≈ 0 AND SAE features lost are PTM | PTM LoF — regulatory site (phospho / glyco / ubiquitin) broken. Mechanism: dysregulation, not direct activity loss. | | ddG ≈ 0 AND SAE features lost are domain / motif | Interface LoF — protein-protein interaction surface affected. Strategy: PPI restoration. | | ddG > 0 + AlphaMissense pathogenic + ΔlogP < 0 but no clear SAE signal | Generic damaging mutation — clearly bad but mechanism unclear. Investigate via experimental assay. |
Before reporting, score the synthesis:
| Confidence | Signal requirement | |---|---| | High | ≥4 signals point the same direction (e.g. AlphaMissense pathogenic + low ΔlogP + ddG > +1 + SAE feature loss) | | Medium | 2-3 signals agree but 1+ are inconclusive | | Low | Signals conflict (e.g. AlphaMissense pathogenic but SAE shows no specific category) — flag for experimental follow-up |
Variant: {VARIANT_ID} e.g. P04637_R175H = TP53 R175H
EVIDENCE LAYERS
1. AlphaMissense: {score:.3f} ({pathogenic|ambiguous|benign})
2. AlphaFold pLDDT: {plddt:.1f} ({well-folded|flexible|disordered})
3. ESMC ΔlogP: {dlogp:+.3f} ({implausible|tolerated})
4. SAE feature loss: top 3 features lost, dominant category = {category}
Feature {f1.id}: Δ={f1.delta:+.3f}, category={cat1}
Feature {f2.id}: Δ={f2.delta:+.3f}, category={cat2}
Feature {f3.id}: Δ={f3.delta:+.3f}, category={cat3}
5. DynaMut2 ΔΔG: {ddg:+.2f} kcal/mol ({destabilizing|neutral|stabilizing})
PROPOSED MECHANISM: {one of the 6 categories from Step 6}
SUPPORTING LOGIC: {one paragraph synthesizing the signals}
CONFIDENCE: {high|medium|low}
LIMITATIONS: {any signals that conflicted, missing data, low pLDDT, etc.}
ESM_describe_sae_feature labels are best-effort aggregations from a 10-protein panel; some features stay "uncategorized" with high activation values — flag this as low-confidence interpretation.DynaMut2 is the recommended default — it's already wired into TU via a hosted academic API (BioSig lab, UQ Australia) and requires zero extra setup. But if you need the higher accuracy of the newer ThermoMPNN model (Dieckhaus et al., PNAS 2024), here are your options:
git clone https://github.com/Kuhlman-Lab/ThermoMPNN.git
cd ThermoMPNN
# Install conda env from environment.yaml — note the GitHub README warns
# the .yaml may install CPU-only PyTorch by mistake; verify GPU PyTorch
conda env create -f environment.yaml
conda activate ThermoMPNN
# The checkpoint thermoMPNN_default.pt ships in models/
python custom_inference.py --pdb <your.pdb> --mutation <e.g. R175H>
Requirements: NVIDIA GPU with CUDA 11.8, ~15 min one-time setup.
License: MIT (no commercial restrictions).
Citation: Dieckhaus et al. (2024). Transfer learning to leverage larger datasets for improved prediction of protein stability changes. PNAS 121(6):e2314853121. doi:10.1073/pnas.2314853121.
Several commercial platforms host ThermoMPNN with free tiers (specific quota varies and is not always documented):
app.tamarind.bio/api/, x-api-key authPOST /api/v3/thermompnn/predict/, Token authThese are commercial SaaS — "free tier" usually means "try a few calls then pay". Not as clean as DynaMut2's purely academic endpoint, which is why TU defaults to DynaMut2.
For LoF mechanism classification (this skill's use case), the binary distinction ddG > +1 vs ddG ≈ 0 is what drives the synthesis. Both models give this signal correctly for clear-cut cases. ThermoMPNN's edge over DynaMut2:
If your work depends on any of those three, switch to ThermoMPNN. Otherwise DynaMut2 is sufficient for the LoF mechanism decision in this skill.
tools
PCR / qPCR primer and oligo design — design forward/reverse primers for a target region (SantaLucia nearest-neighbor thermodynamics), compute melting temperature (Tm) and annealing temperature (Ta), check GC content, and screen an oligo for hairpins and primer-dimers. Use when you need primers for a sequence, want to QC an existing primer pair, or need the Tm of an oligo. Covers the primer-design rules (Tm matching, GC clamp, 3'-end, length) and the tools' constraint quirks.
tools
Pharmacokinetic (PK) analysis of concentration-time data — non-compartmental analysis (NCA) for Cmax, Tmax, AUC (0-t and 0-∞), terminal half-life, clearance (CL), volume of distribution (Vd), MRT, and absolute bioavailability (F). Also one-compartment fitting. Use when you have plasma/serum drug concentrations over time after a dose and need PK parameters, or to compute bioavailability from IV + oral AUCs. NOT for ADMET property prediction from structure (use tooluniverse-admet-prediction).
tools
Molecular cloning assembly design — Gibson Assembly (overlap design for seamless multi-fragment joining) and Golden Gate Assembly (Type IIS / BsaI / BbsI design with unique 4-bp fusion overhangs). Use when you need to plan how to join DNA fragments into a construct, design assembly overlaps/overhangs, or decide between cloning methods. Covers the domestication (internal-site removal), overhang-uniqueness, and overlap-Tm rules. For PCR primers to generate the fragments, see tooluniverse-primer-design.
tools
Meta-analysis / evidence synthesis — pool effect sizes across studies (odds ratios, risk ratios, hazard ratios, mean differences, correlations, GWAS betas) with fixed- or random-effects models, quantify heterogeneity (Q, I², τ²), and build a forest plot. Use when you have results from MULTIPLE studies and need a single pooled estimate, or to synthesize evidence from a systematic review / multiple GWAS / replicated experiments. Handles the error-prone effect-size + standard-error preparation (converting OR/HR/CI, two-group means±SD, proportions, and correlations into the (effect, SE) the pooling step needs).