skills/tooluniverse-noncoding-rna/SKILL.md
Non-coding RNA analysis — miRNAs (miRBase, miRDB targets), lncRNAs (LNCipedia, RNAcentral), circRNAs, snoRNAs, and other ncRNA classes. Distinct mechanisms per class — miRNAs repress mRNA; lncRNAs scaffold/decoy/enhance. Use for ncRNA function prediction, miRNA-target prediction, lncRNA functional annotation, and ncRNA-disease association queries.
npx skillsauth add mims-harvard/tooluniverse tooluniverse-noncoding-rna
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Pipeline for identifying, annotating, and interpreting non-coding RNAs and their biological roles. Covers microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and other ncRNA classes.
Key principles:
Type-based reasoning — look up, don't guess: Non-coding RNA function depends on type: miRNA silences target mRNAs (look up targets in miRTarBase/TargetScan), lncRNA has diverse functions (scaffolding, guiding, decoying — check literature for the specific lncRNA), circRNA may sponge miRNAs.
For any ncRNA query: first identify the class from the name/sequence, then select the appropriate evidence source. Do not assume function based on name alone — a gene named "LINC" may have a characterized mechanism, or none at all. Always search PubMed for the specific ncRNA before interpreting. For miRNAs, validated targets (T1) from miRTarBase outweigh any computational prediction — a predicted target with no experimental support is a hypothesis, not a finding. For lncRNAs, mechanism is almost always determined by experimental studies; use PubMed_search_articles with the lncRNA name + "mechanism" or "function" to find relevant evidence. For circRNAs, miRNA sponging is the most common proposed mechanism but is frequently over-claimed — look for CLIP-seq or reporter assay evidence before asserting it.
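The "identify the class first, then pick the evidence source" rule can be sketched as a small name-based router. This is a minimal illustration, not part of the skill's tooling: `classify_ncrna` and `RULES` are hypothetical names, the patterns are only the name prefixes and suffix mentioned in this document, and the fallback to RNAcentral for unmatched names is an assumption.

```python
import re

# Name patterns taken from the identification rules in this skill.
# The RNAcentral fallback for unmatched names is an assumption.
RULES = [
    (re.compile(r"^(hsa-)?mir-", re.IGNORECASE), "miRNA", "miRBase"),
    (re.compile(r"^(LINC|MALAT|HOTAIR|XIST)|-AS1$", re.IGNORECASE), "lncRNA", "LNCipedia"),
]


def classify_ncrna(name: str) -> tuple[str, str]:
    """Return (ncRNA class, first database to query) for an ncRNA name."""
    cleaned = name.strip()
    for pattern, ncrna_class, database in RULES:
        if pattern.search(cleaned):
            return ncrna_class, database
    # Name alone is not informative: search broadly instead of guessing.
    return "unknown", "RNAcentral"


for query in ["hsa-miR-21-5p", "HOTAIR", "ZEB1-AS1", "PTEN"]:
    print(query, classify_ncrna(query))
```

A name match only selects where to look first; it says nothing about function, which still has to come from the database record and the literature.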
Not this skill: For mRNA expression analysis, use tooluniverse-rnaseq-deseq2. For CRISPR screens, use tooluniverse-crispr-screen-analysis.
| Tool | Use For |
|------|---------|
| miRBase_search_mirna | Search miRNAs by name, accession, or sequence |
| miRBase_get_mirna | Detailed miRNA info (sequence, genomic location, family), including mature miRNA sequences and annotations |
| PubMed_search_articles | Search for validated miRNA targets in literature (e.g., "miR-21 target validation") |
| LNCipedia_search_lncrna | Search lncRNAs by name, gene symbol, or transcript ID |
| LNCipedia_get_lncrna | Detailed lncRNA transcript info (sequence, structure, conservation) |
| LNCipedia_get_lncrna_xrefs | lncRNA gene info with all transcript variants |
| LNCipedia_search_ncrna_by_type | List all transcripts for a lncRNA gene |
| LNCipedia_get_lncrna_publications | lncRNA sequence (FASTA format) |
| RNAcentral_search | Search all ncRNA types across databases |
| RNAcentral_get_by_accession | Detailed ncRNA annotations from 40+ databases |
| Rfam_get_family | RNA family details (structure, alignment, species distribution) |
| Rfam_search_sequence | Search RNA families by keyword |
| DisGeNET_search_gene | ncRNA-disease associations |
| PubMed_search_articles | ncRNA literature |
| GTEx_get_median_gene_expression | Tissue expression of ncRNA genes |
Phase 0: ncRNA Identity & Classification
Name/ID → miRBase/LNCipedia/RNAcentral → class, sequence, genomic location
|
Phase 1: Target & Interaction Analysis
miRNA → target mRNAs; lncRNA → interacting proteins/RNAs/chromatin
|
Phase 2: Expression & Tissue Specificity
GTEx/GEO → where is it expressed? Tissue-specific or ubiquitous?
|
Phase 3: Disease Associations
DisGeNET/PubMed/CTD → ncRNA-disease links with evidence
|
Phase 4: Functional Interpretation
Pathway enrichment of targets → biological role → clinical significance
ncRNA classes by size and database:
Identification workflow:
- miR- or hsa-mir- → search miRBase
- LINC, MALAT, HOTAIR, XIST, or ends in -AS1 → search LNCipedia

For miRNAs, the targets determine the biology:
PRIMARY TOOL: ENCORI_get_miRNA_targets looks up miRNA-target interactions from ENCORI/starBase (CLIP-seq-supported + computationally predicted), no download needed:
1. ENCORI_get_miRNA_targets(mirna="hsa-miR-21-5p", clip_min=1): each hit reports clip_experiments (CLIP-seq support; higher = stronger experimental evidence) and predicted_by (which programs call it). Results are ranked by CLIP support, so the top rows are the best-supported targets.
2. ENCORI_get_miRNA_targets(gene="TP53"): returns the miRNAs that target a gene.

Supporting/fallback approaches:
3. Literature (for mechanism/validation context): PubMed_search_articles(query="miR-21 target validation luciferase")
4. Cross-references: miRBase_get_mirna_xrefs(accession="MIMAT0000076")
5. For novel miRNAs not in ENCORI: search PubMed for "[miRNA] target".
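The CLIP-support ranking described for ENCORI hits can be reproduced locally when results from several queries are merged. The sketch below assumes each hit is a dict carrying the `clip_experiments` and `predicted_by` fields named above; the `gene` key, the helper name `rank_targets`, and the records themselves are hypothetical and are not real ENCORI output.

```python
def rank_targets(hits, clip_min=1):
    """Keep hits with at least clip_min CLIP experiments, best-supported first.

    Ties on CLIP support are broken by how many prediction programs agree.
    """
    supported = [h for h in hits if h["clip_experiments"] >= clip_min]
    return sorted(
        supported,
        key=lambda h: (h["clip_experiments"], len(h["predicted_by"])),
        reverse=True,
    )


# Made-up records for illustration only; not real ENCORI output.
hits = [
    {"gene": "GENE_A", "clip_experiments": 0, "predicted_by": ["TargetScan"]},
    {"gene": "GENE_B", "clip_experiments": 12, "predicted_by": ["TargetScan", "miRanda"]},
    {"gene": "GENE_C", "clip_experiments": 3, "predicted_by": ["PITA"]},
]

ranked = rank_targets(hits, clip_min=1)
print([h["gene"] for h in ranked])
```

Hits with no CLIP support are dropped rather than ranked last, which matches the principle that a prediction without experimental support is a hypothesis, not a finding.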
Well-studied miRNA targets (for common oncomiRs/tumor suppressors):
Target interpretation framework:
For lncRNAs — the mechanism varies:
| lncRNA Mechanism | Example | How to Investigate |
|---|---|---|
| Chromatin modifier | HOTAIR, XIST | Check interacting proteins (PRC2, LSD1) via PubMed |
| Transcription regulator | NEAT1, MEG3 | Check nearby genes (cis-regulation) via genomic location |
| miRNA sponge | MALAT1, circRNAs | Search for miRNA binding sites |
| Scaffold | NKILA, BCAR4 | Check protein interactions |
| Enhancer RNA | eRNAs | Check ENCODE enhancer annotations |
GTEx_get_median_gene_expression(gene_symbol="MIR21") # miRNA host gene expression
# Note: GTEx measures RNA-seq; miRNA expression may need miRNA-seq data from GEO
Interpretation: Tissue-restricted ncRNAs are often functionally important in that tissue. Ubiquitous ncRNAs (like MALAT1) tend to have housekeeping roles.
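One way to put a number on "tissue-restricted versus ubiquitous" is the tau tissue-specificity index, a commonly used measure that is not part of this skill and is swapped in here only as an illustration. The sketch assumes a mapping of tissue name to median expression, which is not necessarily the shape GTEx_get_median_gene_expression returns, and the expression values are invented.

```python
def tau_index(expression):
    """Tissue-specificity index tau: 0 = uniform expression, 1 = one tissue only.

    `expression` maps tissue name -> non-negative expression value.
    """
    values = list(expression.values())
    peak = max(values)
    if len(values) < 2 or peak <= 0:
        raise ValueError("need at least two tissues and non-zero expression")
    return sum(1 - v / peak for v in values) / (len(values) - 1)


# Invented values for illustration only.
restricted = {"Liver": 50.0, "Brain": 0.0, "Heart": 0.0, "Lung": 0.0}
ubiquitous = {"Liver": 40.0, "Brain": 40.0, "Heart": 40.0, "Lung": 40.0}

print(tau_index(restricted))
print(tau_index(ubiquitous))
```

Tau is often computed on log-transformed expression; whichever scale is used, apply it consistently across the ncRNAs being compared.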
DisGeNET_search_gene(query="MIR21") # miR-21 disease associations
PubMed_search_articles(query="miR-21 biomarker cancer")
Key ncRNA-disease associations (well-established T1 examples — always verify via DisGeNET or PubMed for the specific ncRNA):
After identifying miRNA targets (Phase 1), run pathway enrichment:
# Collect validated target gene symbols
targets = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"]  # miR-21 targets
# Pathway enrichment (identifiers are space-separated)
ReactomeAnalysis_pathway_enrichment(identifiers=" ".join(targets))
# Interaction network (identifiers are carriage-return-separated)
STRING_get_network(identifiers="\r".join(targets), species=9606)
Interpretation: If miR-21 targets are enriched in apoptosis and PI3K-AKT signaling → miR-21 is an oncomiR that promotes survival by simultaneously suppressing multiple tumor suppressors.
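To make "enriched" concrete, the statistic behind an over-representation test can be sketched as a hypergeometric tail probability. This is a generic illustration, not the method ReactomeAnalysis_pathway_enrichment is confirmed to use; `hypergeom_pvalue` is a hypothetical helper, and the universe and pathway sizes in the example are placeholders rather than real counts.

```python
from math import comb


def hypergeom_pvalue(universe, pathway_size, n_targets, overlap):
    """P(X >= overlap) when n_targets genes are drawn at random from
    `universe` genes, of which `pathway_size` belong to the pathway."""
    upper = min(pathway_size, n_targets)
    total = comb(universe, n_targets)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Placeholder numbers: 5 targets, 3 of them in a 300-gene pathway,
# against a 20,000-gene background.
p = hypergeom_pvalue(universe=20000, pathway_size=300, n_targets=5, overlap=3)
print(f"p = {p:.2e}")
```

With only a handful of targets, a small p-value still rests on very few genes, so the enrichment result should be read alongside the evidence for each individual target.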
Report structure:
TargetScan is among the most widely used sources of computational miRNA target predictions, but it has no REST API. Download and process the data locally:
# Step 1: Download TargetScan predicted targets (one-time, ~10MB zipped)
# URL: https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip
import io
import zipfile

import pandas as pd
import requests

url = "https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # stop early if the download failed
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    fname = z.namelist()[0]
    df = pd.read_csv(z.open(fname), sep='\t')
# Column names below are as written in this skill; check df.columns against the downloaded file.
# Step 2: Query for a specific miRNA family
mirna = "miR-21-5p"  # or "miR-21/590-5p" (TargetScan uses family names)
# (?!\d) keeps "miR-21" from also matching miR-210, miR-214, etc.
targets = df[df['miRNA Family'].str.contains(r"miR-21(?!\d)", case=False, na=False)]
# Step 3: Rank by cumulative weighted context++ score (most negative first)
targets_ranked = targets.sort_values('Cumulative weighted context++ score', ascending=True)
print(f"Top 20 predicted targets of {mirna}:")
for _, row in targets_ranked.head(20).iterrows():
    print(f"  {row['Target Gene']:10s} score={row['Cumulative weighted context++ score']:.3f} "
          f"sites={row['Total num conserved sites']}")
Interpretation: A more negative context++ score means stronger predicted repression. Targets with more than one conserved site are higher confidence.
miRTarBase has Cloudflare protection blocking programmatic access. Use the R/Bioconductor data package or bulk download:
# Option 1: Download from miRTarBase bulk export (requires browser download first)
# Go to: https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/
# Download: hsa_MTI.xlsx (human miRNA-target interactions)
# Option 2: Use the GitHub data dump
# https://github.com/jorainer/mirtarbase — R package with cached data
# Once you have the file:
import pandas as pd
mti = pd.read_excel("hsa_MTI.xlsx")  # or read_csv if TSV
# Filter for your miRNA; (?!\d) keeps miR-210, miR-214, etc. out of the match
mir21_targets = mti[mti['miRNA'].str.contains(r'hsa-miR-21(?!\d)', case=False, na=False)]
print(f"miR-21 validated targets: {len(mir21_targets)}")
# Filter by evidence strength.
# Assumption: assay names are in the 'Experiments' column, while 'Support Type'
# holds labels such as 'Functional MTI'. Check mti.columns for your release.
strong = mir21_targets[mir21_targets['Experiments'].str.contains(
    'Luciferase|Reporter|Western|CLIP', case=False, na=False
)]
print(f"  Strong evidence (reporter/CLIP): {len(strong)}")
for _, row in strong.head(10).iterrows():
    print(f"  {row['Target Gene']:10s} | {row['Experiments']}")
When download is not available: Use the built-in reference table in Phase 1 for well-studied miRNAs, or search PubMed for validated targets.
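Whichever source the targets come from, the rule that validated targets (T1) outweigh computational predictions can be applied when the lists are merged. A minimal sketch follows, with a hypothetical helper `tier_targets`; the validated symbols are the miR-21 examples used earlier in this skill, and the predicted-only entries are placeholders.

```python
def tier_targets(validated, predicted):
    """Split targets into experimentally validated (T1) and predicted-only."""
    validated, predicted = set(validated), set(predicted)
    return {
        "T1_validated": sorted(validated),
        # A predicted target with no experimental support stays a hypothesis.
        "predicted_only": sorted(predicted - validated),
    }


validated = ["PTEN", "PDCD4"]  # miR-21 examples from this skill
predicted = ["PTEN", "GENE_X", "GENE_Y"]  # GENE_X / GENE_Y are placeholders

tiers = tier_targets(validated, predicted)
print(tiers)
```

Keeping the two tiers separate in the report makes it clear which claims rest on experiments and which still need validation.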