plugin/skills/tooluniverse-noncoding-rna/SKILL.md
Non-coding RNA analysis — miRNAs (miRBase, miRDB targets), lncRNAs (LNCipedia, RNAcentral), circRNAs, snoRNAs, and other ncRNA classes. Distinct mechanisms per class — miRNAs repress mRNA; lncRNAs scaffold/decoy/enhance. Use for ncRNA function prediction, miRNA-target prediction, lncRNA functional annotation, and ncRNA-disease association queries.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add mims-harvard/tooluniverse tooluniverse-noncoding-rna
Pipeline for identifying, annotating, and interpreting non-coding RNAs and their biological roles. Covers microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and other ncRNA classes.
Key principles:
Type-based reasoning — look up, don't guess: Non-coding RNA function depends on type: miRNA silences target mRNAs (look up targets in miRTarBase/TargetScan), lncRNA has diverse functions (scaffolding, guiding, decoying — check literature for the specific lncRNA), circRNA may sponge miRNAs.
For any ncRNA query: first identify the class from the name/sequence, then select the appropriate evidence source. Do not assume function based on name alone — a gene named "LINC" may have a characterized mechanism, or none at all. Always search PubMed for the specific ncRNA before interpreting. For miRNAs, validated targets (T1) from miRTarBase outweigh any computational prediction — a predicted target with no experimental support is a hypothesis, not a finding. For lncRNAs, mechanism is almost always determined by experimental studies; use PubMed_search_articles with the lncRNA name + "mechanism" or "function" to find relevant evidence. For circRNAs, miRNA sponging is the most common proposed mechanism but is frequently over-claimed — look for CLIP-seq or reporter assay evidence before asserting it.
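The "identify the class first, then pick the evidence source" step can be sketched as a small name-based router. This is only an illustration: the patterns come from the identification workflow in this skill, while the function name `suggest_database`, the generalization of `hsa-` to any three-letter species prefix, the generalization of `-AS1` to `-AS<number>`, and the fallback to RNAcentral are assumptions of the sketch. A name match only selects where to look first; it says nothing about function.

```python
import re

# Patterns follow the identification workflow in this skill.
# Assumptions of this sketch: any 3-letter species prefix (not just "hsa-"),
# "-AS<number>" suffixes (not just "-AS1"), and RNAcentral as the fallback.
MIRNA_RE = re.compile(r"^(?:[a-z]{3}-)?mir-", re.IGNORECASE)
LNCRNA_RE = re.compile(r"^(?:LINC|MALAT|HOTAIR|XIST)|-AS\d+$", re.IGNORECASE)


def suggest_database(name: str) -> str:
    """Return the database to query first for a given ncRNA name."""
    cleaned = name.strip()
    if MIRNA_RE.search(cleaned):
        return "miRBase"
    if LNCRNA_RE.search(cleaned):
        return "LNCipedia"
    # Anything unrecognized goes to the cross-database search.
    return "RNAcentral"


for example in ["hsa-mir-21", "miR-21-5p", "HOTAIR", "LINC00473", "SNORD3A"]:
    print(f"{example:12s} -> {suggest_database(example)}")
```

Names that match no pattern (snoRNAs, circRNAs, unfamiliar symbols) fall through to RNAcentral, which is listed in the tool table as the search across all ncRNA types.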
Not this skill: For mRNA expression analysis, use tooluniverse-rnaseq-deseq2. For CRISPR screens, use tooluniverse-crispr-screen-analysis.
| Tool | Use For |
|------|---------|
| miRBase_search_mirna | Search miRNAs by name, accession, or sequence |
| miRBase_get_mirna | Detailed miRNA info (sequence, genomic location, family) |
| miRBase_get_mature_mirna | Mature miRNA sequences and annotations |
| PubMed_search_articles | Search for validated miRNA targets in literature (e.g., "miR-21 target validation") |
| LNCipedia_search_lncrna | Search lncRNAs by name, gene symbol, or transcript ID |
| LNCipedia_get_lncrna | Detailed lncRNA transcript info (sequence, structure, conservation) |
| LNCipedia_get_lncrna_xrefs | Cross-references from a lncRNA entry to external databases |
| LNCipedia_search_ncrna_by_type | Search ncRNA entries by RNA type |
| LNCipedia_get_lncrna_publications | Publications associated with a lncRNA |
| RNAcentral_search | Search all ncRNA types across databases |
| RNAcentral_get_by_accession | Detailed ncRNA annotations from 40+ databases |
| Rfam_get_family | RNA family details (structure, alignment, species distribution) |
| Rfam_search_sequence | Search RNA families by sequence |
| DisGeNET_search_gene | ncRNA-disease associations |
| PubMed_search_articles | ncRNA literature |
| GTEx_get_median_gene_expression | Tissue expression of ncRNA genes |
Phase 0: ncRNA Identity & Classification
Name/ID → miRBase/LNCipedia/RNAcentral → class, sequence, genomic location
|
Phase 1: Target & Interaction Analysis
miRNA → target mRNAs; lncRNA → interacting proteins/RNAs/chromatin
|
Phase 2: Expression & Tissue Specificity
GTEx/GEO → where is it expressed? Tissue-specific or ubiquitous?
|
Phase 3: Disease Associations
DisGeNET/PubMed/CTD → ncRNA-disease links with evidence
|
Phase 4: Functional Interpretation
Pathway enrichment of targets → biological role → clinical significance
ncRNA classes by size and database:
Identification workflow:
- miR- or hsa-mir- → search miRBase
- LINC, MALAT, HOTAIR, XIST, or ends in -AS1 → search LNCipedia
For miRNAs, the targets determine the biology:
NOTE: There is no dedicated miRNA target lookup tool in ToolUniverse. To find miRNA targets:
- PubMed_search_articles(query="miR-21 target validation luciferase")
- miRBase_get_mirna_xrefs(accession="MIMAT0000076"): may link to external target databases
Well-studied miRNA targets (for common oncomiRs/tumor suppressors):
Target interpretation framework:
For lncRNAs — the mechanism varies:
| lncRNA Mechanism | Example | How to Investigate |
|---|---|---|
| Chromatin modifier | HOTAIR, XIST | Check interacting proteins (PRC2, LSD1) via PubMed |
| Transcription regulator | NEAT1, MEG3 | Check nearby genes (cis-regulation) via genomic location |
| miRNA sponge | MALAT1, circRNAs | Search for miRNA binding sites |
| Scaffold | NKILA, BCAR4 | Check protein interactions |
| Enhancer RNA | eRNAs | Check ENCODE enhancer annotations |
GTEx_get_median_gene_expression(gene_symbol="MIR21") # miRNA host gene expression
# Note: GTEx measures RNA-seq; miRNA expression may need miRNA-seq data from GEO
Interpretation: Tissue-restricted ncRNAs are often functionally important in that tissue. Ubiquitous ncRNAs (like MALAT1) tend to have housekeeping roles.
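One way to put a number on "tissue-restricted versus ubiquitous" is the tau specificity index, a commonly used metric that is not part of this skill and is swapped in here purely as an illustration. The sketch below assumes you already have per-tissue expression values (for example medians returned by GTEx); the values shown are hypothetical, and the function name `tau_index` is made up for this example.

```python
def tau_index(expression):
    """Tau tissue-specificity index: 0 = uniform across tissues, 1 = single tissue.

    tau = sum(1 - x_i / x_max) / (n - 1), computed on non-negative values.
    """
    values = [max(float(v), 0.0) for v in expression]
    if len(values) < 2:
        raise ValueError("need expression in at least two tissues")
    x_max = max(values)
    if x_max == 0:
        raise ValueError("tau is undefined when expression is zero everywhere")
    return sum(1.0 - v / x_max for v in values) / (len(values) - 1)


# Hypothetical expression profiles across five tissues (illustrative only).
ubiquitous = [40.0, 38.0, 42.0, 41.0, 39.0]
restricted = [0.1, 0.0, 55.0, 0.2, 0.0]

print(f"ubiquitous profile: tau = {tau_index(ubiquitous):.2f}")
print(f"restricted profile: tau = {tau_index(restricted):.2f}")
```

Tau is often computed on log-transformed expression, so a cut-off chosen on raw values will not transfer directly. For miRNAs, the note above still applies: standard RNA-seq mostly captures the host gene, so the index describes the host gene rather than the mature miRNA.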
DisGeNET_search_gene(query="MIR21") # miR-21 disease associations
PubMed_search_articles(query="miR-21 biomarker cancer")
Key ncRNA-disease associations (well-established T1 examples — always verify via DisGeNET or PubMed for the specific ncRNA):
After identifying miRNA targets (Phase 1), run pathway enrichment:
# Collect validated target gene symbols
targets = ["PTEN", "PDCD4", "TPM1", "RECK", "SPRY1"] # miR-21 targets
# Pathway enrichment
ReactomeAnalysis_pathway_enrichment(identifiers="PTEN PDCD4 TPM1 RECK SPRY1")
STRING_get_network(identifiers="PTEN\rPDCD4\rTPM1\rRECK\rSPRY1", species=9606)
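Interpretation: If miR-21 targets are enriched in apoptosis and PI3K-AKT signaling, this supports miR-21 acting as an oncomiR that promotes survival by suppressing multiple tumor suppressors at once. Enrichment over a handful of targets is suggestive rather than conclusive, so weigh it against the validation evidence for each target.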
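To make "enriched" concrete, the following is a minimal sketch of the over-representation idea behind pathway enrichment, written as a plain hypergeometric tail probability. It is not the statistic ReactomeAnalysis_pathway_enrichment reports, and every number in the example (background size, pathway size, overlap) is a hypothetical placeholder.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance of seeing at least k pathway genes among n targets.

    N = genes in the background, K = background genes annotated to the pathway,
    n = size of the target list, k = targets that fall in the pathway.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
# 5 validated targets of which 2 sit in the pathway.
p_value = hypergeom_sf(k=2, N=20_000, K=100, n=5)
print(f"over-representation p-value: {p_value:.2e}")
```

With a list as short as five targets, a single extra overlapping gene moves the p-value a lot, and testing many pathways requires multiple-testing correction before any one hit is interpreted.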
Report structure:
TargetScan is a widely used source of computational miRNA target predictions, but it has no REST API. Download and process locally:
# Step 1: Download TargetScan predicted targets (one-time, ~10MB zipped)
# URL: https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip
import io
import zipfile

import pandas as pd
import requests

url = "https://www.targetscan.org/vert_80/vert_80_data_download/Summary_Counts.default_predictions.txt.zip"
resp = requests.get(url, timeout=60)
resp.raise_for_status()  # fail early on an HTTP error instead of unzipping an error page
with zipfile.ZipFile(io.BytesIO(resp.content)) as z:
    fname = z.namelist()[0]
    df = pd.read_csv(z.open(fname), sep='\t')
# Column names below are as written in this skill; check them against df.columns
# Step 2: Query for a specific miRNA family
mirna = "miR-21-5p"  # or "miR-21/590-5p" (TargetScan uses family names)
# (?!\d) keeps "miR-21" from also matching miR-210, miR-214, etc.
targets = df[df['miRNA Family'].str.contains(r"miR-21(?!\d)", case=False, na=False, regex=True)]
# Step 3: Rank by cumulative weighted context++ score (most negative first)
targets_ranked = targets.sort_values('Cumulative weighted context++ score', ascending=True)
print(f"Top 20 predicted targets of {mirna}:")
for _, row in targets_ranked.head(20).iterrows():
    print(f"  {row['Target Gene']:10s} score={row['Cumulative weighted context++ score']:.3f} "
          f"sites={row['Total num conserved sites']}")
Interpretation: More negative context++ score = stronger predicted repression. Conserved sites (>1) are higher confidence.
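A practical pitfall when filtering either prediction or validation tables is that a plain substring search for "miR-21" also matches miR-210, miR-214, and similar names. The sketch below shows one way to guard against that with a negative lookahead; the helper name `mirna_pattern` and the example name list are assumptions made for illustration.

```python
import re


def mirna_pattern(base: str) -> re.Pattern:
    """Match `base` only when it is not immediately followed by another digit."""
    return re.compile(re.escape(base) + r"(?!\d)", re.IGNORECASE)


# Illustrative names in the styles used by miRBase and TargetScan families.
names = [
    "miR-21-5p",
    "miR-21-3p",
    "miR-210-3p",
    "miR-214-5p",
    "miR-21-5p/590-5p",
    "hsa-miR-21-5p",
]

pattern = mirna_pattern("miR-21")
matched = [name for name in names if pattern.search(name)]
print(matched)
```

The same pattern string can be passed to pandas `str.contains(..., regex=True)` when filtering a DataFrame column.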
miRTarBase has Cloudflare protection blocking programmatic access. Use the R/Bioconductor data package or bulk download:
# Option 1: Download from miRTarBase bulk export (requires browser download first)
# Go to: https://mirtarbase.cuhk.edu.cn/~miRTarBase/miRTarBase_2025/
# Download: hsa_MTI.xlsx (human miRNA-target interactions)
# Option 2: Use the GitHub data dump
# https://github.com/jorainer/mirtarbase (R package with cached data)
# Once you have the file:
import pandas as pd

mti = pd.read_excel("hsa_MTI.xlsx")  # or read_csv if TSV
# Filter for your miRNA; (?!\d) avoids also matching hsa-miR-210, hsa-miR-214, etc.
mir21_targets = mti[mti['miRNA'].str.contains(r'hsa-miR-21(?!\d)', case=False, na=False, regex=True)]
print(f"miR-21 validated targets: {len(mir21_targets)}")
# Filter by evidence strength. The assay names are in the 'Experiments' column;
# 'Support Type' only holds labels such as "Functional MTI".
strong = mir21_targets[mir21_targets['Experiments'].str.contains(
    'Luciferase|Reporter|Western|CLIP', case=False, na=False
)]
print(f"  Strong evidence (reporter/CLIP): {len(strong)}")
for _, row in strong.head(10).iterrows():
    print(f"  {row['Target Gene']:10s} - {row['Experiments']}")
When download is not available: Use the built-in reference table in Phase 1 for well-studied miRNAs, or search PubMed for validated targets.
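The rule from the key principles, that validated targets (T1) outweigh computational predictions, can be sketched as a simple merge of the two gene lists. This is only an illustration: the function name `tier_targets` and the label wording other than "T1" are assumptions, and `GENE_X` is a hypothetical placeholder for a predicted-only gene.

```python
def tier_targets(validated, predicted):
    """Label each gene by the strongest evidence available for it."""
    validated_set = {gene.upper() for gene in validated}
    predicted_set = {gene.upper() for gene in predicted}
    tiers = {}
    for gene in sorted(validated_set | predicted_set):
        if gene in validated_set and gene in predicted_set:
            tiers[gene] = "T1 (validated, also predicted)"
        elif gene in validated_set:
            tiers[gene] = "T1 (validated)"
        else:
            # Prediction without experimental support: a hypothesis, not a finding.
            tiers[gene] = "predicted only (hypothesis)"
    return tiers


# PTEN and PDCD4 are the miR-21 targets used earlier in this skill;
# GENE_X is a hypothetical predicted-only gene.
report = tier_targets(validated=["PTEN", "PDCD4"], predicted=["PDCD4", "GENE_X"])
for gene, tier in report.items():
    print(f"{gene:8s} {tier}")
```

Keeping the tier next to each gene in the final report makes it harder for a predicted-only target to be read as an established one.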