plugin/skills/tooluniverse-protein-lof-mechanism/SKILL.md
Propose the mechanism by which a missense variant causes loss-of-function (LoF), synthesizing evidence from 5 independent layers: AlphaMissense pathogenicity, AlphaFold structural context, ESMC sequence likelihood, SAE feature disruption, and DynaMut2 stability ΔΔG. Distinguishes 'structural stability LoF' (mis-folding) from 'direct functional disruption' (catalytic / binding / PTM site damage). Use for coding missense variants where you need a mechanistic causal model, not just a pathogenicity score.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-protein-lof-mechanismInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
For a single missense variant, integrate 5 independent computational signals to propose a specific loss-of-function mechanism. Each signal answers a different question:
| Signal | Tool | Answers |
|---|---|---|
| Pathogenicity | AlphaMissense_get_variant_score | "Is this variant damaging?" |
| Structural context | alphafold_get_prediction | "Is the mutation in a folded vs disordered region?" |
| Sequence likelihood | ESM_score_sequence | "Is the substitution evolutionarily plausible?" |
| Feature disruption | ESM_explain_variant_mechanism (or ESM_score_variant_sae_disruption + ESM_describe_sae_feature for raw control) | "Which biological feature breaks?" |
| Stability | DynaMut2_predict_stability | "Does the protein still fold correctly?" |
Apply for missense (coding) variants where:
Not for (use other skills instead):
tooluniverse-variant-to-mechanismtooluniverse-variant-interpretationtooluniverse-protein-sae-variant-interpretationtooluniverse-cancer-variant-interpretation| Input | Format | Example |
|---|---|---|
| Variant ID | {accession}_{ref_aa}{position}{alt_aa} | P04637_R175H |
| Or: UniProt accession + variant string | accession + ref/pos/alt | P04637, R175H |
| Or: gene symbol + variant | HGNC symbol + ref/pos/alt | TP53, R175H |
Parse the variant string:
import re
m = re.match(r"([A-Z])(\d+)([A-Z])", variant_str)
ref_aa, position, alt_aa = m.group(1), int(m.group(2)), m.group(3)
pip install 'esm @ git+https://github.com/evolutionaryscale/esm@ee891c52'If user gave a gene symbol:
UniProt_search(
query="gene:TP53 AND organism_id:9606 AND reviewed:true",
fields=["accession"],
)
# → P04637
Get the canonical sequence:
UniProt_get_sequence_by_accession(accession="P04637")
Validate the reference residue:
assert sequence[position - 1] == ref_aa, "Wrong sequence or wrong isoform"
AlphaMissense_get_variant_score(
uniprot_id="P04637",
position=175,
ref_aa="R",
alt_aa="H",
)
# → returns score 0..1; ≥0.564 is the "likely pathogenic" threshold per the
# AlphaMissense paper. Above 0.9 is very confident damaging.
If AlphaMissense says benign (≤0.34), the rest of the analysis is exploratory — most benign variants don't have a clear LoF mechanism.
alphafold_get_prediction(uniprot_id="P04637")
# → returns structure data including per-residue pLDDT
Check pLDDT at the mutation position:
ESM_score_sequence(
sequence=ref_sequence,
model="esmc-600m-2024-12", # or 300m for cheaper
)
# Then for the mutant sequence:
ESM_score_sequence(
sequence=mutant_sequence,
model="esmc-600m-2024-12",
)
Compute ΔlogP = mean_logP(mutant) − mean_logP(reference) at the position window:
Recommended — one call (composite tool, returns disruption + category labels + summary):
ESM_explain_variant_mechanism(
sequence=ref_sequence,
position=175,
ref_aa="R",
alt_aa="H",
window=8,
top_k_features=5,
)
# data["mechanism_summary"] e.g.:
# "Disrupted feature categories (lost): catalytic=2, ligand-binding=1"
# data["lost_feature_categories"] / data["gained_feature_categories"] give the raw counts
# data["top_features_lost"] / data["top_features_gained"] include per-feature deltas + categories
Lower-level alternative (use only if you need the raw feature_ids before labeling, e.g. to filter to one category before describe calls):
ESM_score_variant_sae_disruption(
sequence=ref_sequence, position=175, ref_aa="R", alt_aa="H",
window=8, top_k_features=10,
)
# Then for each kept feature:
ESM_describe_sae_feature(feature_id=feat["feature_id"])
# → returns category: catalytic | ligand-binding | ptm | domain |
# motif | structural-stability | secondary-structure |
# transmembrane | signal-peptide | propeptide | uncategorized
Dominant category among top lost features = the function most likely disrupted.
DynaMut2 needs a PDB structure. Two options:
Option A — use a PDB ID if available for this protein:
# Look up PDB cross-references from UniProt entry
UniProt_get_entry_by_accession(accession="P04637")
# → check `uniProtKBCrossReferences` for entries with database == "PDB"
# → pick a structure that covers the mutation position
DynaMut2_predict_stability(
pdb_id="2FEJ", # example TP53 DNA-binding domain crystal structure
chain="A",
mutation="R175H",
)
# → returns ddG in kcal/mol
Option B — if no experimental PDB covers the position, use the AlphaFold model (output of Step 2). DynaMut2 accepts AlphaFold PDBs the same way.
Apply the upstream variant_lof_mechanism decision rule:
| Signal pattern | Inferred mechanism | |---|---| | ddG > +1 kcal/mol AND ΔlogP < 0 | Structural stability LoF — mutation destabilizes the fold; protein may misfold / be degraded. Drug rescue strategy: pharmacological chaperones, refolding agents. | | ddG ≈ 0 (in [-0.5, +1]) AND SAE features lost are catalytic | Direct catalytic LoF — protein folds normally but the active site is broken. Strategy: substrate analog / cofactor supplementation. | | ddG ≈ 0 AND SAE features lost are ligand-binding | Binding LoF — fold preserved, binding pocket disrupted. Strategy: small-molecule restoration. | | ddG ≈ 0 AND SAE features lost are PTM | PTM LoF — regulatory site (phospho / glyco / ubiquitin) broken. Mechanism: dysregulation, not direct activity loss. | | ddG ≈ 0 AND SAE features lost are domain / motif | Interface LoF — protein-protein interaction surface affected. Strategy: PPI restoration. | | ddG > 0 + AlphaMissense pathogenic + ΔlogP < 0 but no clear SAE signal | Generic damaging mutation — clearly bad but mechanism unclear. Investigate via experimental assay. |
Before reporting, score the synthesis:
| Confidence | Signal requirement | |---|---| | High | ≥4 signals point the same direction (e.g. AlphaMissense pathogenic + low ΔlogP + ddG > +1 + SAE feature loss) | | Medium | 2-3 signals agree but 1+ are inconclusive | | Low | Signals conflict (e.g. AlphaMissense pathogenic but SAE shows no specific category) — flag for experimental follow-up |
Variant: {VARIANT_ID} e.g. P04637_R175H = TP53 R175H
EVIDENCE LAYERS
1. AlphaMissense: {score:.3f} ({pathogenic|ambiguous|benign})
2. AlphaFold pLDDT: {plddt:.1f} ({well-folded|flexible|disordered})
3. ESMC ΔlogP: {dlogp:+.3f} ({implausible|tolerated})
4. SAE feature loss: top 3 features lost, dominant category = {category}
Feature {f1.id}: Δ={f1.delta:+.3f}, category={cat1}
Feature {f2.id}: Δ={f2.delta:+.3f}, category={cat2}
Feature {f3.id}: Δ={f3.delta:+.3f}, category={cat3}
5. DynaMut2 ΔΔG: {ddg:+.2f} kcal/mol ({destabilizing|neutral|stabilizing})
PROPOSED MECHANISM: {one of the 6 categories from Step 6}
SUPPORTING LOGIC: {one paragraph synthesizing the signals}
CONFIDENCE: {high|medium|low}
LIMITATIONS: {any signals that conflicted, missing data, low pLDDT, etc.}
ESM_describe_sae_feature labels are best-effort aggregations from a 10-protein panel; some features stay "uncategorized" with high activation values — flag this as low-confidence interpretation.DynaMut2 is the recommended default — it's already wired into TU via a hosted academic API (BioSig lab, UQ Australia) and requires zero extra setup. But if you need the higher accuracy of the newer ThermoMPNN model (Dieckhaus et al., PNAS 2024), here are your options:
git clone https://github.com/Kuhlman-Lab/ThermoMPNN.git
cd ThermoMPNN
# Install conda env from environment.yaml — note the GitHub README warns
# the .yaml may install CPU-only PyTorch by mistake; verify GPU PyTorch
conda env create -f environment.yaml
conda activate ThermoMPNN
# The checkpoint thermoMPNN_default.pt ships in models/
python custom_inference.py --pdb <your.pdb> --mutation <e.g. R175H>
Requirements: NVIDIA GPU with CUDA 11.8, ~15 min one-time setup.
License: MIT (no commercial restrictions).
Citation: Dieckhaus et al. (2024). Transfer learning to leverage larger datasets for improved prediction of protein stability changes. PNAS 121(6):e2314853121. doi:10.1073/pnas.2314853121.
Several commercial platforms host ThermoMPNN with free tiers (specific quota varies and is not always documented):
app.tamarind.bio/api/, x-api-key authPOST /api/v3/thermompnn/predict/, Token authThese are commercial SaaS — "free tier" usually means "try a few calls then pay". Not as clean as DynaMut2's purely academic endpoint, which is why TU defaults to DynaMut2.
For LoF mechanism classification (this skill's use case), the binary distinction ddG > +1 vs ddG ≈ 0 is what drives the synthesis. Both models give this signal correctly for clear-cut cases. ThermoMPNN's edge over DynaMut2:
If your work depends on any of those three, switch to ThermoMPNN. Otherwise DynaMut2 is sufficient for the LoF mechanism decision in this skill.
tools
Post-market safety surveillance and recall/adverse-event RETRIEVAL across the full spectrum of FDA-regulated products that are NOT covered by the drug-AE signal skills: medical devices, food / dietary supplements / cosmetics, veterinary drugs, and drug supply (shortages). Orchestrates openFDA endpoints (MAUDE device adverse events + device recalls + 510(k), CAERS food/supplement/ cosmetic adverse events, veterinary adverse events, drug shortages, and cross-product enforcement/recall reports). USE WHEN the user asks: "are there adverse events for [device / pacemaker / infusion pump / insulin pump]", "device recalls for [firm/product]", "supplement / vitamin / cosmetic adverse reactions", "is [drug] in shortage", "what injectables are on shortage", "veterinary / animal adverse events for [drug] in [dog/cat/horse]", "food recall for listeria", "MAUDE report for [device]", "CAERS reactions for [brand]". DO NOT USE for drug adverse-event SIGNAL detection or disproportionality (PRR / ROR / IC) or drug-AE association scoring — that is `tooluniverse-pharmacovigilance` / `tooluniverse-adverse-event-detection`. This skill is multi-product surveillance and retrieval, not drug-AE statistical signal mining.
tools
--- name: tooluniverse-phewas description: Cross-ancestry / cross-biobank phenome-wide association (PheWAS) and replication. Given ONE variant (rsID) or ONE gene, look up every phenotype it associates with across European/UK (UKB-TOPMed), Finnish (FinnGen), Japanese (BioBank Japan), and Taiwanese (TPMI) biobanks, plus exome-wide gene-burden PheWAS (Genebass), then judge whether an association replicates across ancestries or is population-specific. Use whenever the user asks "what else is this va
tools
Dereplicate a putative natural product and assign its chemical taxonomy. Use to answer "is [compound] a known natural product", "what microbe/organism produces [compound]", "what chemical class is [compound]", "dereplicate this metabolite (by formula/exact mass/InChIKey/SMILES)", or "classify this molecule into ChemOnt". Searches NPAtlas for known microbial natural products (producing organism + literature reference), assigns the ChemOnt kingdom→superclass→class→subclass hierarchy via ClassyFire, resolves systematic IUPAC names to structure via OPSIN, and cross-references identity in PubChem. NOT for general drug/compound identity or ADMET (use tooluniverse-chemical-compound-retrieval / tooluniverse-small-molecule-discovery) and NOT for metabolomics pathway/enrichment analysis (use tooluniverse-metabolomics skills).
tools
Genome-ASSEMBLY discovery, QC, and replicon mapping for any organism (bacteria, archaea, fungi, and beyond) using NCBI Datasets. Resolves an organism name or taxid to assemblies, picks the reference/representative or best-quality assembly, pulls assembly QC metrics (total length, contig/scaffold N50, contig count, GC%, assembly level, RefSeq category), enumerates chromosomes and plasmids via per-replicon sequence reports, and compares candidate assemblies on quality. Use for "what genomes are available for [organism]", "assembly stats / N50 / GC content for [GCF_/GCA_ accession]", "how many plasmids does [strain] have", "compare assemblies for [species]", "find the reference genome for [taxon]", "is this assembly Complete Genome or just contigs". NOT for gene-level orthology/synteny (use tooluniverse-comparative-genomics), plant gene structure (use tooluniverse-plant-genomics), de novo assembly from raw reads (no tool exists), or taxonomy-only name/lineage lookups.