plugin/skills/tooluniverse-protein-structural-annotation-pdb/SKILL.md
Given a PDB structure, produce a per-residue annotation table: which residues sit at a binding interface (against a partner chain), which line a ligand pocket, which are buried (core) vs solvent-exposed (surface), and optionally their secondary structure. This is the structural track drawn under a DMS heatmap and the structural prior against which SAE feature drops are read. Use it when you need to anchor a variant-interpretation or DMS analysis to the protein's actual physical context.
For each residue of a target protein chain, classify whether it sits at a binding interface, in a ligand pocket, is buried vs solvent-exposed, and (optionally) which secondary-structure element it belongs to. This is the annotation track that anchors any DMS heatmap or per-residue interpretation to the protein's actual physical context.
Not for:

- tooluniverse-computational-biophysics

Required:

| Input | Format | Example |
|---|---|---|
| PDB ID | 4 characters | 6VJJ (KRAS-RAF1 complex with a GTP analogue) |
| Target chain | single character | A |
| Partner chain(s) | list of chain IDs | ["B"] |
| Ligand resnames | 3-letter PDB names | ["GNP", "MG"] |
Optional:

- `distance_cutoff` (default 5.0 Å)
- `core_rsa_cutoff` (default 0.25)
- `include_secondary_structure` (default false; uses PDBe REST if true)
- `pdb_content` instead of `pdb_id` for local / predicted structures

If you only have a UniProt accession or gene symbol, pick a structure first:
```python
# PDBe's curated UniProt→PDB mapping (recommended; ranks by coverage + resolution)
PDBeSIFTS_get_best_structures(uniprot_accession="P01116")
# Returns a ranked list of PDB IDs for KRAS with chain mapping

# Or full list (unranked)
PDBeSIFTS_get_all_structures(uniprot_accession="P01116")

# RCSB advanced search (free-text, when you don't have a UniProt yet)
RCSBAdvSearch_search_structures(query="KRAS GTP complex")
```
Pick a structure that contains the right complex. It should include the partner chain you care about and the relevant ligand, and its resolution should be adequate for distance-based classification (≤ 3 Å is a safe default).
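The selection rule above can be sketched as a filter over candidate entries. `Candidate` and `pick_structure` are hypothetical helpers, not a ToolUniverse return type. You would populate them yourself from whatever the SIFTS or RCSB search returns. The 3 Å default simply mirrors the guidance above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one candidate entry (not a ToolUniverse return type)."""
    pdb_id: str
    resolution: Optional[float]       # Å; None for NMR or predicted models
    has_partner: bool                 # does the entry contain the partner chain you care about?
    ligands: frozenset = frozenset()  # ligand residue names present in the entry


def pick_structure(candidates: Iterable[Candidate], ligand_resnames: Iterable[str],
                   max_resolution: float = 3.0) -> Optional[Candidate]:
    """Best-resolved candidate with the partner chain, every required ligand,
    and resolution <= max_resolution; None if nothing qualifies."""
    required = set(ligand_resnames)
    usable = [
        c for c in candidates
        if c.has_partner
        and c.resolution is not None
        and c.resolution <= max_resolution
        and required <= c.ligands
    ]
    return min(usable, key=lambda c: c.resolution, default=None)
```

Requiring every ligand, rather than any one of them, keeps a cofactor such as MG from being silently dropped when the nucleotide is present but the metal is not.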
```python
Structure_annotate_per_residue(
    pdb_id="6VJJ",
    target_chain="A",
    partner_chains=["B"],
    ligand_resnames=["GNP", "MG"],
    distance_cutoff=5.0,
    core_rsa_cutoff=0.25,
    include_secondary_structure=False,
)
```
Returns `annotations`: `List[{position, aa, dist_partner, dist_ligand, rsa, region, is_core, ss_element?}]`, with one entry for every residue of the target chain. For KRAS in 6VJJ, this yields 168 rows.
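The `dist_partner` and `dist_ligand` fields are presumably minimum atom-to-atom distances. The region table below says "ligand heavy atom" for ligands, and this sketch assumes the same heavy-atom convention for partner chains. It is a brute-force illustration, not the tool's implementation. The `(element, x, y, z)` atom tuple and `min_heavy_atom_distance` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical atom record: (element, x, y, z) with coordinates in Å.
Atom = Tuple[str, float, float, float]

HYDROGENS = {"H", "D"}  # hydrogen and deuterium are excluded from heavy-atom distances


def min_heavy_atom_distance(residue_atoms: Sequence[Atom],
                            other_atoms: Sequence[Atom]) -> Optional[float]:
    """Smallest heavy-atom distance (Å) between a residue and another atom set.

    Returns None when either side has no heavy atoms (e.g. no partner chain given).
    Checks every atom pair, which is fine for one residue at a time.
    """
    heavy_res = [a for a in residue_atoms if a[0] not in HYDROGENS]
    heavy_other = [a for a in other_atoms if a[0] not in HYDROGENS]
    if not heavy_res or not heavy_other:
        return None
    return min(math.dist(a[1:], b[1:]) for a in heavy_res for b in heavy_other)
```

For whole chains, a spatial index such as Biopython's `Bio.PDB.NeighborSearch` avoids the all-pairs loop.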
PDB residue numbers carry silent offsets — crystal constructs add N-terminal cloning residues, and published figures sometimes shift the track relative to the panel sequence. Always verify with a landmark:
```python
# Get the canonical reference sequence
UniProt_get_sequence_by_accession(accession="P01116")
# Then spot-check: KRAS canonical position 12 should be glycine.
# Look up by the position field, not by list index (PDB numbering may not start at 1).
by_pos = {a["position"]: a for a in annotations}
assert by_pos[12]["aa"] == "G"
```
If the landmark mismatches, record the offset explicitly (e.g. `pdb_pos = uniprot_pos + offset`) before any downstream join. Do not silently rebase positions.
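When a single landmark fails, one way to find the offset is to slide the structure's residues along the canonical sequence and keep the shift with the most identities. `infer_offset` is a hypothetical helper, and it assumes one constant offset with no insertions or deletions. For constructs with gaps or insertion codes, use a sequence alignment or the residue-level SIFTS mapping instead.

```python
from typing import Dict, Tuple


def infer_offset(pdb_residues: Dict[int, str], uniprot_seq: str,
                 max_shift: int = 30) -> Tuple[int, int]:
    """Find the constant offset (pdb_pos = uniprot_pos + offset) with the most identities.

    pdb_residues maps PDB residue number -> one-letter code; uniprot_seq is the
    canonical sequence (position 1 = uniprot_seq[0]). Ties go to the smaller |offset|.
    Returns (offset, number_of_matching_residues).
    """
    best_offset, best_matches = 0, -1
    for offset in sorted(range(-max_shift, max_shift + 1), key=abs):
        matches = 0
        for pdb_pos, aa in pdb_residues.items():
            uniprot_pos = pdb_pos - offset
            if 1 <= uniprot_pos <= len(uniprot_seq) and uniprot_seq[uniprot_pos - 1] == aa:
                matches += 1
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, best_matches
```

Compare `matches` with `len(pdb_residues)`. A low fraction suggests the wrong chain or a non-constant offset, and a simple shift cannot fix either.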
If you set `include_secondary_structure=True`, the tool fetches per-residue helix/strand/coil assignments from PDBe REST. Alternatively, call the dedicated PDBe secondary-structure tool on its own:
```python
pdbe_get_entry_secondary_structure(pdb_id="6VJJ")
# Returns per-chain helix + strand ranges
```
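The standalone tool returns ranges rather than per-residue labels, so you need to expand them before joining with the annotation table. This is a minimal sketch. It assumes you have already extracted inclusive `(start, end)` residue ranges for the target chain, in the same numbering as the annotation table. The exact response schema is not shown here, and `ss_per_residue` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

Range = Tuple[int, int]  # inclusive (start, end) residue numbers


def ss_per_residue(positions: Iterable[int], helices: Iterable[Range],
                   strands: Iterable[Range]) -> Dict[int, str]:
    """Label each position 'helix', 'strand', or 'coil' (anything not covered by a range)."""
    labels = {pos: "coil" for pos in positions}
    for kind, ranges in (("helix", helices), ("strand", strands)):
        for start, end in ranges:
            for pos in range(start, end + 1):
                if pos in labels:  # ignore residues outside the target chain's table
                    labels[pos] = kind
    return labels
```

Helix and strand ranges should not overlap. If they do, this sketch lets the strand label win because strands are applied second.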
The returned table is keyed by 1-based canonical residue number. Typical downstream uses:
| Use case | Field to read |
|---|---|
| Is variant X in a pocket? | `by_pos = {a["position"]: a for a in annotations}`, then `by_pos[X]["region"] in ("ligand", "both")`. Index by the position field, NOT by list index (PDB residue numbers may not start at 1 or be contiguous) |
| Build a DMS heatmap annotation track | `[(r["position"], r["region"], r["is_core"], r.get("ss_element")) for r in annotations]` |
| Filter SAE hotspot features to ligand-binding residues | filter clusters by `region == "ligand"` |
| Compare buried vs surface signal | group statistics by is_core |
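The downstream uses in the table above fit in a few lines of Python. The rows below are invented toy values shaped like the documented fields, not real output. `in_pocket` and `mean_by_core` are hypothetical helpers.

```python
from statistics import mean
from typing import Dict

# Toy rows with the documented fields; values are invented for illustration only.
annotations = [
    {"position": 10, "region": "ligand", "is_core": False},
    {"position": 11, "region": "interface", "is_core": True},
    {"position": 12, "region": "both", "is_core": False},
    {"position": 13, "region": "other", "is_core": True},
]

# Index by the position field, never by list index.
by_pos = {a["position"]: a for a in annotations}


def in_pocket(pos: int) -> bool:
    """True if the residue lines a ligand pocket (region 'ligand' or 'both')."""
    row = by_pos.get(pos)
    return row is not None and row["region"] in ("ligand", "both")


# Heatmap track: one tuple per residue; ss_element is absent unless SS was requested.
track = [(r["position"], r["region"], r["is_core"], r.get("ss_element")) for r in annotations]

# Following the table, SAE filtering keeps region == "ligand" only; add "both" if wanted.
ligand_positions = {p for p, r in by_pos.items() if r["region"] == "ligand"}


def mean_by_core(scores: Dict[int, float]) -> Dict[str, float]:
    """Mean of a per-position score (e.g. DMS fitness) for core vs surface residues."""
    groups = {"core": [], "surface": []}
    for pos, score in scores.items():
        if pos in by_pos:  # skip positions the structure does not cover
            groups["core" if by_pos[pos]["is_core"] else "surface"].append(score)
    return {k: mean(v) for k, v in groups.items() if v}
```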

Region labels:

| Region label | Biological meaning | Common functional role |
|---|---|---|
| interface | Within distance_cutoff of a partner chain | Protein-protein binding residue; variants often disrupt complex formation |
| ligand | Within distance_cutoff of a ligand heavy atom | Pocket residue; variants often disrupt substrate / cofactor / drug binding |
| both | Within distance_cutoff of both a partner chain and a ligand | Allosteric or shared-surface residue |
| other | Neither | Surface (if not is_core) or core (if is_core) — variants impact through stability or distal effects |
| is_core=true | RSA < core_rsa_cutoff (0.25 by default) | Buried residue; variants often destabilize the fold |
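The region and `is_core` rules in the table above can be written as a small classifier. This is a minimal sketch, not the tool's source. `classify_residue` is a hypothetical name. It assumes that "within distance_cutoff" is inclusive (≤), and that a null distance (for example `dist_partner` when no partner chains are given) means "not near".

```python
from typing import Optional, Tuple


def classify_residue(dist_partner: Optional[float], dist_ligand: Optional[float],
                     rsa: Optional[float], distance_cutoff: float = 5.0,
                     core_rsa_cutoff: float = 0.25) -> Tuple[str, bool]:
    """Return (region, is_core) following the region table; None distances count as 'not near'."""
    near_partner = dist_partner is not None and dist_partner <= distance_cutoff
    near_ligand = dist_ligand is not None and dist_ligand <= distance_cutoff
    if near_partner and near_ligand:
        region = "both"
    elif near_partner:
        region = "interface"
    elif near_ligand:
        region = "ligand"
    else:
        region = "other"
    # Buried if relative solvent accessibility is strictly below the cutoff.
    is_core = rsa is not None and rsa < core_rsa_cutoff
    return region, is_core
```

Region and burial are independent axes. An "other" residue can be core or surface, which is why the table reads `is_core` separately.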
partner_chains=[] is permitted, but then every dist_partner value is null, no residue can be labeled interface or both, and interface analysis is skipped entirely.

| Tool | Role | Use it for |
|---|---|---|
| Structure_annotate_per_residue | This skill's atomic tool | The annotation itself |
| PDBeSIFTS_get_best_structures | UniProt → ranked PDB list | Step 1 |
| PDBeSIFTS_get_all_structures | UniProt → full PDB list | Step 1 |
| RCSBAdvSearch_search_structures | Free-text RCSB search | Step 1 |
| UniProt_get_sequence_by_accession | Canonical sequence | Step 3 (numbering verification) |
| pdbe_get_entry_secondary_structure | SS alone | Step 4 alternative |
| tooluniverse-residue-functional-mechanism-interpretation | Downstream consumer | Use this annotation as the structural evidence layer when interpreting DMS hotspots; the skill also plots an annotated DMS heatmap in its Step 7 |