plugin/skills/tooluniverse-residue-functional-mechanism-interpretation/SKILL.md
Given a set of residues in a protein, explain WHY they are functionally critical by combining structural context (binding interface, ligand pocket, core, secondary structure), UniProt features (active sites, binding sites, PTM sites, disulfides), optional SAE feature evidence, and optional DMS data. Accepts residues from any source: DMS hotspots (top-K by max effect), ClinVar recurrent variants, literature-reported hot regions, evolutionarily conserved positions, or user-curated lists. Returns a per-cluster mechanism call: catalytic / ligand-binding / interface / structural-core / PTM / regulatory / unknown.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-residue-functional-mechanism-interpretationInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
The core user question: "Why are these residues functionally critical?"
Answering needs more than one source: a residue in the ligand pocket means something different from one in a catalytic triad, an interface, the hydrophobic core, or a PTM sequon. This skill synthesizes evidence from multiple TU tools to call a mechanism for each residue (or cluster of adjacent residues).
The residues can come from any source — this skill is agnostic to where they came from:
| Residue source | Typical call pattern |
|---|---|
| DMS map hotspots | Use Step 1 (optional) to detect top-K by max effect, then continue |
| ClinVar recurrent variants | Pull recurrent positions from ClinVar, pass directly as user_provided_positions |
| Literature hot regions | Paste positions from a paper's Fig 1, pass directly |
| Evolutionarily conserved residues | Filter by conservation score, pass top-N positions |
| Druggable site residues | From a binding-site predictor, pass directly |
| Clinician's question | "Why does mutation at R175 keep showing up in tumors?" — pass [175] |
Not for:
tooluniverse-variant-predictor-dms-validationtooluniverse-protein-sae-variant-interpretationtooluniverse-protein-lof-mechanismThe single-variant skills focus on one mutation's signature; this skill focuses on which residues matter and why.
Path A — Residues supplied directly (covers ClinVar / literature / custom positions):
| Input | Notes |
|---|---|
| user_provided_positions: List[int] | 1-based canonical residue positions |
| Protein metadata | UniProt accession + PDB ID + chain |
| (optional) DMS matrix | If supplied, used to enrich each cluster with effect-size context |
| (optional) SAE tensor | If supplied, gives SAE feature evidence as a 4th layer |
Path B — Detect hotspots from a DMS map (original use case):
| Input | Source | Notes |
|---|---|---|
| DMS effect matrix (20, n_positions) | MaveDB_get_effect_matrix | NaN for unmeasured |
| disruptive_tail | DMS retrieval metadata | "top" or "bottom" |
| Protein metadata | UniProt accession + PDB ID + chain | for the multi-evidence lookups |
| (optional) SAE evidence | ESM_get_region_sae_features for one contiguous cluster (1 Forge call), OR a precomputed full DMS SAE tensor from ESM_get_sae_features per mutant | the SAE evidence layer; see Step 4 for which path |
In Path B the skill runs Step 1 to detect hotspots; in Path A it skips Step 1 entirely and goes straight to Step 2 (gather evidence).
If the user says "explain why residue/cluster X is a hotspot", do NOT take that as a given. Verify it's actually a hotspot in THIS DMS first — users import biological knowledge from other contexts that may not match what the specific assay measured.
# Per-position disruption magnitude (same formula as Step 1)
if disruptive_tail == "top":
dms_per_pos = np.nanmax(dms_matrix, axis=0)
elif disruptive_tail == "bottom":
dms_per_pos = -np.nanmin(dms_matrix, axis=0)
# Where do the user-named positions actually rank?
ranks = (-dms_per_pos).argsort().argsort() # 0 = highest
for user_pos in user_named_positions:
rank = int(ranks[pos_index[user_pos]])
pct = 100 * (1 - rank / len(dms_per_pos))
print(f" pos {user_pos}: rank {rank+1}/{len(dms_per_pos)}, "
f"top {pct:.0f}% by max effect")
Decision rule:
Concrete example: the user asks "why is KRAS G12/G13 a folding hotspot in this DMS?" The data say G12 ranks 105/187 (top 56%) and G13 ranks 124/187 (top 66%) by max ΔΔG — they are NOT folding hotspots in this AbundancePCA assay (even though they ARE famous oncogenic positions). The skill's job is to surface that contradiction up front, then proceed with a mechanism analysis for those residues (their oncogenic effect is via GTPase abolishment, not fold disruption — and that's a genuinely useful answer to the user's actual scientific question, just not the one they literally asked).
Skip this step entirely if the user already provided positions (Path A). In Path A the residue list IS the input; jump straight to clustering at the bottom of this section.
import numpy as np
if user_provided_positions:
# Path A — residues from any source (ClinVar / literature / custom)
positions_to_analyze = sorted(set(user_provided_positions))
else:
# Path B — detect from DMS matrix
if disruptive_tail == "top":
dms_per_pos = np.nanmax(dms_matrix, axis=0) # max destabilization at any allele
elif disruptive_tail == "bottom":
dms_per_pos = -np.nanmin(dms_matrix, axis=0) # flip for low-is-bad assays
K = 20
positions_to_analyze = sorted(np.argsort(-dms_per_pos)[:K].tolist())
# Chain adjacent positions (gap ≤ 2) into clusters — shared by both paths
clusters = []
current = [positions_to_analyze[0]]
for p in positions_to_analyze[1:]:
if p - current[-1] <= 2:
current.append(p)
else:
clusters.append(current)
current = [p]
clusters.append(current)
print(f"{len(clusters)} cluster(s): {clusters}")
A "cluster" of one is allowed — it just gets less statistical power in the permutation test, but multi-evidence interpretation still works.
Path A workflow contracts (what's available vs not):
| Evidence layer | Path A (user residues) | Path B (DMS hotspots) |
|---|---|---|
| Structural (Step 2) | ✓ always | ✓ always |
| UniProt features (Step 3) | ✓ always | ✓ always |
| SAE per-feature labels (Step 4, descriptive) | ✓ if SAE tensor supplied | ✓ if SAE tensor supplied |
| SAE permutation test (Step 4-alt) | ✗ — needs DMS-derived max_drop baseline; not meaningful for residues with no DMS context | ✓ if SAE tensor supplied |
| DMS effect-size context | ✓ if DMS matrix supplied (enrichment only) | ✓ always |
| Mechanism synthesis (Step 5) | ✓ always | ✓ always |
Annotate the protein structure once, then read fields for each cluster's positions:
# One-shot structural annotation (cached for the rest of the skill)
struct = Structure_annotate_per_residue(
pdb_id="6VJJ", # pick a structure with the relevant complex
target_chain="A",
partner_chains=["B"], # if there's a binding partner
ligand_resnames=["GNP", "MG"], # if there's a relevant ligand
distance_cutoff=5.0,
include_secondary_structure=True,
)
by_pos = {r["position"]: r for r in struct["data"]["annotations"]}
for cluster in clusters:
structural_summary = {
"interface_count": sum(1 for p in cluster if by_pos.get(p, {}).get("region") in ("interface", "both")),
"ligand_pocket_count": sum(1 for p in cluster if by_pos.get(p, {}).get("region") in ("ligand", "both")),
"core_count": sum(1 for p in cluster if by_pos.get(p, {}).get("is_core")),
"ss_elements": [by_pos.get(p, {}).get("ss_element") for p in cluster],
}
# UniProt features per position (active sites, binding sites, PTM sites, etc.)
up = UniProt_get_function_by_accession(accession="P01116")
# Parse up["features"] into a per-position lookup
features_by_pos = {}
for feat in up.get("features", []):
start = int(feat.get("begin", feat.get("position", -1)))
end = int(feat.get("end", start))
for p in range(start, end + 1):
features_by_pos.setdefault(p, []).append({
"type": feat.get("type"),
"description": feat.get("description", ""),
})
for cluster in clusters:
uniprot_summary = {
"active_site": [p for p in cluster if any(f["type"] == "Active site" for f in features_by_pos.get(p, []))],
"binding_site": [p for p in cluster if any(f["type"] == "Binding site" for f in features_by_pos.get(p, []))],
"modified_residue": [p for p in cluster if any(f["type"] == "Modified residue" for f in features_by_pos.get(p, []))],
"disulfide_bond": [p for p in cluster if any(f["type"] == "Disulfide bond" for f in features_by_pos.get(p, []))],
"domain": [f.get("description") for p in cluster for f in features_by_pos.get(p, []) if f.get("type") == "Domain"],
}
Two paths depending on whether you already have a full DMS SAE tensor:
Path A — no precomputed tensor (most cases): use the region tool directly.
If the cluster is a contiguous range (or you can pad to one), ESM_get_region_sae_features aggregates SAE features over the range in a single Forge call:
region = ESM_get_region_sae_features(
sequence=ref_sequence,
start_position=min(cluster_positions),
end_position=max(cluster_positions),
top_k_features=5,
)
top_features = [f["feature_id"] for f in region["data"]["top_features"]]
This is the right default — 1 Forge call vs 20 × cluster-size for the DMS-tensor path. For non-contiguous clusters, run once per contiguous sub-range and union the top-K.
Path B — you already have a precomputed SAE tensor from a DMS sweep (e.g. the variant-predictor-dms-validation pipeline left one on disk): compute drops directly without re-calling Forge.
def cluster_sae_features(sae_tensor, wt_vec, cluster_positions, top_n=5):
"""Returns top SAE features by mean drop at cluster, ready for labeling."""
drops = np.maximum(0.0, wt_vec[None, :, :] - sae_tensor)
max_drop_per_pos = np.nanmax(drops, axis=0) # (n_pos, 16384)
cluster_mean = max_drop_per_pos[cluster_positions].mean(axis=0)
return np.argsort(-cluster_mean)[:top_n].tolist()
top_features = cluster_sae_features(sae_tensor, wt_vec, cluster_cols, top_n=5)
Label each top feature via the SAE feature labeler:
for f in top_features:
label = ESM_describe_sae_feature(feature_id=int(f), n_proteins=5)
print(f" feature {f}: {label['data'].get('category')} (conf {label['data'].get('confidence')})")
The first label call for each feature is slow (~30s, ~10 Forge credits as the labeler runs SAE on a 10-protein panel); subsequent calls hit cache.
def permutation_pvalues(cluster_positions, max_drop, n_perm=10000, rng=None):
"""Per-feature: is the cluster's mean drop > random equally-sized set?"""
rng = rng or np.random.default_rng(0)
n_positions = max_drop.shape[0]
cluster_size = len(cluster_positions)
observed = max_drop[cluster_positions].mean(axis=0)
null_geq = np.zeros(max_drop.shape[1], dtype=np.int32)
for _ in range(n_perm):
idx = rng.choice(n_positions, size=cluster_size, replace=False)
null_geq += (max_drop[idx].mean(axis=0) >= observed).astype(np.int32)
return (null_geq + 1) / (n_perm + 1)
from statsmodels.stats.multitest import multipletests
p_raw = permutation_pvalues(np.array(cluster_cols), max_drop_per_pos)
_, p_adj, _, _ = multipletests(p_raw, method="fdr_bh")
significant_features = np.where(p_adj < 0.05)[0].tolist()
Use the mean of max_drop, not the max — under this null a maximum-based
statistic returns almost no significant features. Single-position clusters
return no significant features (no statistical power) — fall back to Step 4
descriptive ranking.
For each cluster, combine the 3 (or 4 with SAE) evidence streams:
def call_mechanism(structural, uniprot, sae_labels=None):
"""Return one of: catalytic | ligand-binding | interface |
structural-core | PTM | regulatory | mixed | unknown."""
# Direct UniProt evidence wins
if uniprot["active_site"]:
return "catalytic"
if uniprot["binding_site"]:
return "ligand-binding"
if uniprot["modified_residue"] and len(uniprot["modified_residue"]) >= len(cluster) // 2:
return "PTM"
# Structural evidence
if structural["ligand_pocket_count"] >= len(cluster) // 2:
return "ligand-binding"
if structural["interface_count"] >= len(cluster) // 2:
return "interface"
if structural["core_count"] >= len(cluster) // 2:
return "structural-core"
# SAE evidence as tiebreaker
if sae_labels:
from collections import Counter
cat_counts = Counter(l for l in sae_labels if l)
if cat_counts:
top_cat, n = cat_counts.most_common(1)[0]
if n >= 2: # at least 2 of top-5 SAE features agree
return top_cat # e.g. "ligand-binding"
return "unknown"
Cluster 1 — positions [12, 13]
Structural: 0/2 interface, 2/2 ligand pocket (GTP), 0/2 core, all in P-loop helix
UniProt: Binding site (GTP) at residues 12, 13
Domain: small GTPase
SAE top 5: ligand-binding (×2), secondary-structure (×3)
→ MECHANISM: ligand-binding (GTP P-loop)
Cluster 2 — positions [40, 41]
Structural: 2/2 interface (chain B = RAF1-RBD)
UniProt: no specific annotation
SAE top 5: structural-stability (×3), domain (×2)
→ MECHANISM: interface (KRAS-RAF1 binding)
The publication-style figure: DMS effect heatmap, sequence strip, structural annotation track, with per-hotspot mechanism callouts above the heatmap.
Align everything to one position axis. Heatmap column p, sequence
letter p, every annotation bar covering residue p — all share x = p.
Verify a landmark before drawing:
landmark_col = positions.index(12) # column for KRAS pos 12
assert sequence[landmark_col] == "G", f"alignment broken at col {landmark_col}"
A 1-2 residue misalignment between heatmap and annotation track is a common, visually subtle error. If you've cross-joined two coordinate systems and any join was off-by-N, the whole figure is silently wrong. Verify here.
Heatmap + sequence + annotation track + callouts:
import matplotlib.pyplot as plt
import numpy as np
vlim = max(abs(np.nanmin(dms_matrix)), abs(np.nanmax(dms_matrix)))
fig, axes = plt.subplots(
nrows=4, ncols=1, figsize=(max(8, 0.15 * len(positions)), 6),
gridspec_kw={"height_ratios": [0.5, 4, 0.3, 0.5]}, sharex=True,
)
ax_callouts, ax_heat, ax_seq, ax_anno = axes
# Heatmap — symmetric diverging (RdBu_r); center on 0 for ΔΔG-style data
im = ax_heat.imshow(
dms_matrix, aspect="auto", cmap="RdBu_r",
vmin=-vlim, vmax=vlim,
extent=(0, len(positions), 20, 0),
)
ax_heat.set_yticks(np.arange(20) + 0.5)
ax_heat.set_yticklabels(list(amino_acid_order))
ax_heat.set_ylabel("Substitution")
# Mark WT cells (box, no fill) — distinguish "WT" from "not measured"
for col, p in enumerate(positions):
wt_aa = sequence[col]
if wt_aa in amino_acid_order:
row = amino_acid_order.index(wt_aa)
ax_heat.add_patch(plt.Rectangle(
(col, row), 1, 1, fill=False, edgecolor='black', linewidth=0.5,
))
# Sequence strip — one monospace letter per column
ax_seq.set_xlim(0, len(positions))
ax_seq.set_ylim(0, 1)
ax_seq.set_yticks([])
for col, letter in enumerate(sequence):
ax_seq.text(col + 0.5, 0.5, letter, ha="center", va="center",
family="monospace", fontsize=8)
# Annotation track — region colors (top half) + core bar (bottom half)
anno_by_pos = {a["position"]: a for a in struct["data"]["annotations"]}
region_colors = {"interface": "#1f77b4", "ligand": "#ff7f0e",
"both": "#2ca02c", "other": "#cccccc"}
for col, p in enumerate(positions):
a = anno_by_pos.get(p, {})
ax_anno.add_patch(plt.Rectangle(
(col, 0.5), 1, 0.5, facecolor=region_colors.get(a.get("region", "other"), "#cccccc"),
))
if a.get("is_core"):
ax_anno.add_patch(plt.Rectangle((col, 0.0), 1, 0.5, facecolor="black"))
ax_anno.set_xlim(0, len(positions))
ax_anno.set_ylim(0, 1)
ax_anno.set_yticks([0.25, 0.75])
ax_anno.set_yticklabels(["core", "region"])
ax_anno.set_xlabel("Residue position")
# Callout row — per-hotspot mechanism boxes linked to clusters by brackets
for cluster, mechanism, top_features in hotspot_results:
cluster_cols = [positions.index(p) for p in cluster if p in positions]
if not cluster_cols:
continue
c_left, c_right = min(cluster_cols), max(cluster_cols)
center = (c_left + c_right) / 2
ax_heat.plot([c_left, c_right + 1], [0, 0], "k-", lw=2)
label_lines = [f"MECHANISM: {mechanism}"] + [f" {fl}" for fl in top_features[:3]]
ax_callouts.text(center, 0.5, "\n".join(label_lines),
ha="center", va="center", fontsize=7,
bbox=dict(facecolor="white", edgecolor="black"))
ax_callouts.plot([center, center], [0, -0.3], "k-", lw=0.5)
ax_callouts.set_xlim(0, len(positions))
ax_callouts.set_ylim(0, 1)
ax_callouts.axis("off")
fig.colorbar(im, ax=ax_heat, label="DMS effect (ΔΔG kcal/mol)")
plt.savefig("dms_hotspots_annotated.png", dpi=200, bbox_inches="tight")
Three cell-color rules to get right:
Long proteins: for >300 residues, split into multiple horizontal panels (one panel per domain) rather than shrinking column width — the per-residue detail disappears below ~3 pixels per column.
Reproducing a published panel: verify its track alignment before
treating it as ground truth. Published DMS panels do carry registration
errors (the KRAS Fig 1i in the original paper is shifted +2 relative to its
own sequence — see tooluniverse-protein-structural-annotation-pdb pitfalls).
| Mechanism | Implication | |---|---| | catalytic | Direct enzyme function — mutations abolish activity | | ligand-binding | Substrate / cofactor / ion / nucleotide binding — mutations alter substrate specificity or affinity | | interface | Protein-protein interaction surface — mutations may disrupt complex formation (consider PPI inhibitor design) | | structural-core | Fold stability — mutations destabilize protein (consider rescuing with chaperones; harder to drug) | | PTM | Regulation site (phospho, acetyl, ubiquitin, glycosylation) — mutations alter signaling rather than activity | | regulatory | Allosteric site / autoinhibitory residue — mutations bias conformational equilibrium | | mixed | Multiple evidence types disagree — needs case-by-case analysis | | unknown | No mechanism could be assigned — possibly novel function or wrong reference structure |
| Tool / Skill | Role |
|---|---|
| MaveDB_get_effect_matrix | DMS matrix input |
| tooluniverse-protein-structural-annotation-pdb (or Structure_annotate_per_residue directly) | Structural evidence |
| UniProt_get_function_by_accession | UniProt features (active sites, binding sites, PTMs, disulfides) |
| ESM_get_region_sae_features | Step 4 Path A — aggregate SAE features over a contiguous cluster in 1 Forge call (preferred) |
| ESM_get_sae_features | Step 4 Path B — only if you already have a precomputed full DMS SAE tensor |
| ESM_describe_sae_feature | Label SAE features in Step 4 |
| tooluniverse-variant-predictor-dms-validation | Sibling skill: validate a predictor before trusting its scores |
| (heatmap visualization is now Step 7 of this skill) | annotated DMS panel with per-hotspot callouts |
| alphafold_get_prediction | pLDDT context if no experimental PDB available |
tools
Post-market safety surveillance and recall/adverse-event RETRIEVAL across the full spectrum of FDA-regulated products that are NOT covered by the drug-AE signal skills: medical devices, food / dietary supplements / cosmetics, veterinary drugs, and drug supply (shortages). Orchestrates openFDA endpoints (MAUDE device adverse events + device recalls + 510(k), CAERS food/supplement/ cosmetic adverse events, veterinary adverse events, drug shortages, and cross-product enforcement/recall reports). USE WHEN the user asks: "are there adverse events for [device / pacemaker / infusion pump / insulin pump]", "device recalls for [firm/product]", "supplement / vitamin / cosmetic adverse reactions", "is [drug] in shortage", "what injectables are on shortage", "veterinary / animal adverse events for [drug] in [dog/cat/horse]", "food recall for listeria", "MAUDE report for [device]", "CAERS reactions for [brand]". DO NOT USE for drug adverse-event SIGNAL detection or disproportionality (PRR / ROR / IC) or drug-AE association scoring — that is `tooluniverse-pharmacovigilance` / `tooluniverse-adverse-event-detection`. This skill is multi-product surveillance and retrieval, not drug-AE statistical signal mining.
tools
--- name: tooluniverse-phewas description: Cross-ancestry / cross-biobank phenome-wide association (PheWAS) and replication. Given ONE variant (rsID) or ONE gene, look up every phenotype it associates with across European/UK (UKB-TOPMed), Finnish (FinnGen), Japanese (BioBank Japan), and Taiwanese (TPMI) biobanks, plus exome-wide gene-burden PheWAS (Genebass), then judge whether an association replicates across ancestries or is population-specific. Use whenever the user asks "what else is this va
tools
Dereplicate a putative natural product and assign its chemical taxonomy. Use to answer "is [compound] a known natural product", "what microbe/organism produces [compound]", "what chemical class is [compound]", "dereplicate this metabolite (by formula/exact mass/InChIKey/SMILES)", or "classify this molecule into ChemOnt". Searches NPAtlas for known microbial natural products (producing organism + literature reference), assigns the ChemOnt kingdom→superclass→class→subclass hierarchy via ClassyFire, resolves systematic IUPAC names to structure via OPSIN, and cross-references identity in PubChem. NOT for general drug/compound identity or ADMET (use tooluniverse-chemical-compound-retrieval / tooluniverse-small-molecule-discovery) and NOT for metabolomics pathway/enrichment analysis (use tooluniverse-metabolomics skills).
tools
Genome-ASSEMBLY discovery, QC, and replicon mapping for any organism (bacteria, archaea, fungi, and beyond) using NCBI Datasets. Resolves an organism name or taxid to assemblies, picks the reference/representative or best-quality assembly, pulls assembly QC metrics (total length, contig/scaffold N50, contig count, GC%, assembly level, RefSeq category), enumerates chromosomes and plasmids via per-replicon sequence reports, and compares candidate assemblies on quality. Use for "what genomes are available for [organism]", "assembly stats / N50 / GC content for [GCF_/GCA_ accession]", "how many plasmids does [strain] have", "compare assemblies for [species]", "find the reference genome for [taxon]", "is this assembly Complete Genome or just contigs". NOT for gene-level orthology/synteny (use tooluniverse-comparative-genomics), plant gene structure (use tooluniverse-plant-genomics), de novo assembly from raw reads (no tool exists), or taxonomy-only name/lineage lookups.