--- name: dual-disease-transcriptomic-ml-planner description: Generates complete dual-disease transcriptomic + machine learning research designs from a user-provided disease pair. Use when users want to identify shared DEGs, common hub genes, cross-disease biomarkers, or shared molecular mechanisms between two diseases using public GEO data. Triggers: "shared biomarker study for two diseases", "dual-disease transcriptomic ML paper", "identify common DEGs between disease A and B", "cross-disease
Generates a complete dual-disease transcriptomic + ML study design from a user-provided disease pair. Always outputs four workload configurations and a recommended primary plan.
| Style | Description | Example |
|-------|-------------|---------|
| A. Shared DEG → Hub Gene Core | DEG overlap → PPI → hub consensus | Intracranial aneurysm + AAA; diabetic + hypertensive nephropathy |
| B. Dual-Disease Shared Mechanism | Pathway-level convergence | ECM, inflammation, fibrosis linking two diseases |
| C. PPI + Multi-Algorithm Hub Prioritization | STRING + MCODE + CytoHubba consensus | Any pair with sufficient shared DEGs |
| D. Dual-Disease Biomarker Validation | ROC in discovery + validation cohorts | Any pair with ≥2 GEO datasets per disease |
| E. Immune Infiltration + Shared Biomarker | CIBERSORT/alternative + gene–immune correlation | Immunologically active disease pairs |
| F. Single-Gene Cross-Disease Deepening | Hub-gene GSEA in both diseases | Single top hub with strong AUC |
| G. Publication-Oriented Integrated Design | Full pipeline: DEG → PPI → ROC → immune → GSEA | High-impact submission target |
Identify:
Always generate all four. For each describe: goal, required data, major modules, expected workload, figure set, strengths, weaknesses.
| Config | Goal | Timeframe | Best For |
|--------|------|-----------|----------|
| Lite | Shared DEG + basic hub, 1 dataset per disease | 2–4 weeks | Pilot, skeleton manuscript, single-dataset constraint |
| Standard | Full pipeline + validation + ROC + one deepening layer | 5–9 weeks | Core publishable paper |
| Advanced | Standard + immune + GSEA + multi-cohort robustness | 9–14 weeks | Competitive journal target |
| Publication+ | Full multi-layer + experimental suggestions + reviewer defense | 12–20 weeks | High-impact submission |
Select the best-fit configuration and explain why, given disease pair biology, GEO data availability, time constraints, and publication ambition.
For each step include: step name, purpose, input, method, key parameters/thresholds, expected output, failure points, alternative approaches.
Dataset & Preprocessing
Fault tolerance — dataset level:
DEG & Shared Signature
Fault tolerance — DEG intersection:
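A minimal sketch of a tiered fallback for an empty intersection. It assumes each DEG table is a data frame with `logFC` and `adj.P.Val` columns and gene symbols as row names. The helper names, the tier order, and the toy tables are illustrative assumptions. The relaxed 0.5 cutoff, the top-500 option, and the WGCNA suggestion come from the recovery options quoted in this skill's zero-guard message.

```r
# Hypothetical helpers: intersect DEGs, relaxing criteria tier by tier.
filter_deg <- function(tab, lfc, padj = 0.05) {
  rownames(tab)[abs(tab$logFC) > lfc & tab$adj.P.Val < padj]
}

top_n_deg <- function(tab, n) {
  ord <- order(tab$adj.P.Val, -abs(tab$logFC))
  rownames(tab)[head(ord, n)]
}

shared_deg_with_fallback <- function(tab1, tab2, top_n = 500) {
  tiers <- list(
    strict  = function(t) filter_deg(t, lfc = 1),
    relaxed = function(t) filter_deg(t, lfc = 0.5),
    top_n   = function(t) top_n_deg(t, top_n)
  )
  for (tier in names(tiers)) {
    shared <- intersect(tiers[[tier]](tab1), tiers[[tier]](tab2))
    if (length(shared) > 0) return(list(tier = tier, genes = shared))
  }
  stop("No shared DEGs at any tier. Consider WGCNA co-expression module overlap.")
}

# Toy tables (made-up values, for illustration only)
tab1 <- data.frame(logFC = c(1.5, 0.8, -0.2), adj.P.Val = c(0.01, 0.02, 0.90),
                   row.names = c("GENE_A", "GENE_B", "GENE_C"))
tab2 <- data.frame(logFC = c(0.3, 0.7, 2.0), adj.P.Val = c(0.50, 0.03, 0.01),
                   row.names = c("GENE_A", "GENE_B", "GENE_C"))
res <- shared_deg_with_fallback(tab1, tab2)
print(res)
```

Returning the tier name alongside the genes lets the write-up state which threshold produced the shared signature. Report a relaxed tier as a limitation.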
Enrichment & Shared Mechanism
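Over-representation analysis of the shared DEGs (the GO/KEGG step) reduces to one hypergeometric test per gene set. The base-R sketch below assumes a hypothetical named list of gene sets and a background universe. In practice `clusterProfiler`, which the pipeline template loads, would supply real annotations. All gene and pathway names here are made up.

```r
# Hypothetical helper: one-sided hypergeometric enrichment with BH correction.
ora <- function(genes, gene_sets, universe) {
  genes <- intersect(genes, universe)
  res <- lapply(names(gene_sets), function(nm) {
    set <- intersect(gene_sets[[nm]], universe)
    k <- length(intersect(genes, set))  # shared DEGs that fall in this set
    # P(X >= k) when drawing length(genes) genes from the universe
    p <- phyper(k - 1, length(set), length(universe) - length(set),
                length(genes), lower.tail = FALSE)
    data.frame(term = nm, overlap = k, set_size = length(set), p.value = p)
  })
  out <- do.call(rbind, res)
  out$p.adjust <- p.adjust(out$p.value, method = "BH")
  out[order(out$p.value), ]
}

# Toy inputs (made-up identifiers, for illustration only)
universe <- paste0("G", 1:100)
gene_sets <- list(PATHWAY_X = paste0("G", 1:10), PATHWAY_Y = paste0("G", 51:70))
shared_genes <- paste0("G", c(1:6, 55, 90))
enrich <- ora(shared_genes, gene_sets, universe)
print(enrich)
```

The choice of universe changes the p-values. Restricting it to genes actually measured on both platforms is usually the more conservative option.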
PPI & Hub Prioritization
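A minimal sketch of multi-algorithm hub consensus, assuming the top-ranked gene lists from several CytoHubba scoring methods have already been exported from Cytoscape. The helper name `hub_consensus`, the choice of three methods, the vote threshold, and the gene names are illustrative assumptions.

```r
# Hypothetical helper: keep genes nominated by at least `min_votes` algorithms,
# ordered by vote count, then by mean rank within the lists that contain them.
hub_consensus <- function(rankings, min_votes = 2) {
  genes <- unique(unlist(rankings))
  votes <- sapply(genes, function(g) sum(sapply(rankings, function(r) g %in% r)))
  mean_rank <- sapply(genes, function(g) {
    mean(unlist(lapply(rankings, function(r) which(r == g))))
  })
  out <- data.frame(gene = genes, votes = votes, mean_rank = mean_rank,
                    row.names = NULL)
  out <- out[out$votes >= min_votes, ]
  out[order(-out$votes, out$mean_rank), ]
}

# Toy rankings (made-up gene names, best-ranked first)
rankings <- list(
  MCC    = c("GENE_A", "GENE_B", "GENE_C"),
  Degree = c("GENE_B", "GENE_A", "GENE_D"),
  EPC    = c("GENE_B", "GENE_E", "GENE_A")
)
consensus <- hub_consensus(rankings)
print(consensus)
```

Voting across lists avoids comparing raw scores, which are on different scales for each algorithm.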
Biomarker Performance
Fault tolerance — ROC:
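A minimal sketch of checking one hub gene in both a discovery and a validation cohort. It uses a base-R rank-based AUC (the Mann-Whitney statistic) so it runs without `pROC`. The helper names, the simulated expression values, and the use of the template's 0.70 threshold for both cohorts are assumptions for illustration.

```r
# Rank-based AUC: probability that a random case scores above a random control.
auc_rank <- function(labels, scores) {
  labels <- as.integer(labels)  # 1 = disease, 0 = control
  n1 <- sum(labels == 1); n0 <- sum(labels == 0)
  r <- rank(scores)             # average ranks handle ties
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

check_cohorts <- function(cohorts, threshold = 0.70) {
  aucs <- sapply(cohorts, function(co) auc_rank(co$labels, co$scores))
  data.frame(cohort = names(cohorts), auc = unname(aucs),
             pass = unname(aucs >= threshold))
}

set.seed(1)  # simulated data, for illustration only
make_cohort <- function(n, shift) {
  labels <- rep(c(0, 1), each = n)
  list(labels = labels, scores = rnorm(2 * n, mean = labels * shift))
}
cohorts <- list(discovery = make_cohort(30, 2), validation = make_cohort(20, 2))
res <- check_cohorts(cohorts)
print(res)
if (any(!res$pass)) warning("AUC below threshold in at least one cohort.")
```

Unlike `pROC::roc`, this helper does not detect direction automatically. It assumes higher expression in disease, so negate the scores for a down-regulated gene.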
Immune Infiltration (when disease-appropriate per Hard Rule 5)
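A minimal sketch of the gene-immune correlation step, assuming a deconvolution tool has already produced a samples-by-cell-types fraction matrix. The helper name, the cell-type labels, and the simulated values are illustrative. Spearman correlation with BH adjustment is a common choice, not one this skill mandates.

```r
# Hypothetical helper: Spearman correlation of one gene against each cell type.
gene_immune_cor <- function(expr, fractions) {
  stopifnot(length(expr) == nrow(fractions))
  res <- lapply(colnames(fractions), function(ct) {
    test <- suppressWarnings(cor.test(expr, fractions[, ct], method = "spearman"))
    data.frame(cell_type = ct, rho = unname(test$estimate), p.value = test$p.value)
  })
  out <- do.call(rbind, res)
  out$p.adjust <- p.adjust(out$p.value, method = "BH")
  out[order(out$p.value), ]
}

set.seed(42)  # simulated data, for illustration only
n <- 40
hub_expr <- rnorm(n)
fractions <- cbind(
  Macrophages_M1 = plogis(hub_expr + rnorm(n, sd = 0.5)),  # built to track the gene
  T_cells_CD8    = plogis(rnorm(n)),
  Neutrophils    = plogis(rnorm(n))
)
immune_cor <- gene_immune_cor(hub_expr, fractions)
print(immune_cor)
```

Run the correlation separately within each disease dataset. Pooling cohorts first can create correlations that only reflect batch or platform differences.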
Single-Gene Deepening (Standard and above)
→ Full figure list and table templates: references/figure_plan_template.md
Core figures: workflow schematic (Fig 1), DEG volcanos + Venn (Fig 2), shared DEG heatmap (Fig 3), GO/KEGG enrichment (Fig 4), PPI + MCODE + hub ranking (Fig 5), ROC curves (Fig 6), immune infiltration + correlation (Fig 7), single-gene GSEA (Fig 8). Tables: dataset summary, shared DEG list, hub rankings, ROC/AUC summary.
State what each layer proves and what it does not prove:
Always include a self-critical section addressing:
Public data only, one discovery dataset per disease, DEG + Venn + GO/KEGG, STRING + MCODE + CytoHubba top gene, ROC in discovery cohort, one-page interpretation. 2–4 week timeline. Confirm feasibility against any stated time or dataset constraints before recommending.
→ Full upgrade impact table: references/upgrade_path.md
Key upgrades, each rated impact / complexity: validation cohort per disease (High / Low–Medium), multi-algorithm hub consensus (High / Low), cross-platform reproducibility logic (High / Medium), immune infiltration (Medium / Medium), single-gene GSEA (Medium / Low), mini-signature of 3–5 genes (Medium / Medium).
When providing R code examples or pipeline frameworks:

- Mark every placeholder accession with the comment `# EXAMPLE ID: replace with your actual GSE accession before running`.
- Guard the shared-DEG intersection against an empty result:

```r
if (length(shared_genes) == 0) {
  stop("No shared DEGs found. Recovery options: (1) relax logFC to 0.5, (2) use top-500 DEGs per disease, (3) switch to WGCNA co-expression module overlap.")
}
```

- Include `BiocManager::install()` calls where needed.
- Find real accessions with `GEOquery::getGEO("GSEsearch", ...)` or direct search at https://www.ncbi.nlm.nih.gov/geo/

Standard R pipeline template:
```r
library(GEOquery); library(limma); library(clusterProfiler); library(pROC)

# Load datasets. EXAMPLE IDs: replace with your actual GSE accessions before running
gse_disease1 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]]  # EXAMPLE ID
gse_disease2 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]]  # EXAMPLE ID

# DEG analysis, applied to both diseases.
# Assumes pData(gse) already has a two-level factor column `group` with control
# as the reference level; derive it from the GEO sample annotations first.
run_deg <- function(gse) {
  design <- model.matrix(~ group, data = pData(gse))
  fit <- eBayes(lmFit(exprs(gse), design))
  subset(topTable(fit, coef = 2, adjust.method = "BH", number = Inf),
         abs(logFC) > 1 & adj.P.Val < 0.05)
}
deg_d1 <- run_deg(gse_disease1)
deg_d2 <- run_deg(gse_disease2)

# Shared DEG intersection with zero-guard.
# Row names are platform probe IDs; map them to gene symbols first if platforms differ.
shared_genes <- intersect(rownames(deg_d1), rownames(deg_d2))
if (length(shared_genes) == 0) {
  stop("No shared DEGs found. Recovery: relax logFC to 0.5 or use top-500 DEGs per disease.")
}

# ROC for top hub gene. EXAMPLE: replace 'HUB_GENE' with a real row ID from your data
hub_gene <- "HUB_GENE"  # EXAMPLE
labels <- pData(gse_disease1)$group
expr_scores <- exprs(gse_disease1)[hub_gene, ]
roc_obj <- roc(response = labels, predictor = expr_scores)
cat("AUC:", auc(roc_obj), "\n")
if (auc(roc_obj) < 0.70) warning("AUC below 0.70 threshold. Consider mini-signature approach.")
```
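When a single hub gene falls below the AUC threshold, the mini-signature upgrade (3–5 genes) can be sketched as a logistic regression over the candidate genes. This is one possible construction and not necessarily the one this skill's references prescribe. The gene names and simulated data are assumptions. The AUC is computed in-sample, so it is optimistic until checked in a validation cohort.

```r
# Rank-based AUC (Mann-Whitney statistic), base R only.
auc_rank <- function(labels, scores) {
  n1 <- sum(labels == 1); n0 <- sum(labels == 0)
  (sum(rank(scores)[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(7)  # simulated data, for illustration only
n <- 60
labels <- rep(c(0, 1), each = n)  # 0 = control, 1 = disease
expr <- data.frame(
  GENE_A = rnorm(2 * n, mean = labels * 0.6),
  GENE_B = rnorm(2 * n, mean = labels * 0.6),
  GENE_C = rnorm(2 * n, mean = labels * 0.6)
)

# Single-gene AUCs versus a combined 3-gene logistic score
single_auc <- sapply(expr, function(x) auc_rank(labels, x))
fit <- glm(labels ~ ., data = cbind(labels = labels, expr), family = binomial)
signature_auc <- auc_rank(labels, predict(fit, type = "link"))

print(round(single_auc, 3))
cat("Mini-signature AUC:", round(signature_auc, 3), "\n")
```

To validate, freeze the fitted coefficients and score the validation cohort with `predict(fit, newdata = ...)`. Refitting there would not test the signature.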
This skill accepts: a pair of diseases or phenotypes for which the user wants to identify shared transcriptomic signatures, hub genes, or cross-disease biomarkers using publicly available GEO transcriptomic data.
If the request does not involve two diseases for GEO-based transcriptomic comparison — for example, asking to design a study for a single disease only, plan a wet-lab experiment, design a clinical trial, analyze non-transcriptomic omics data (e.g., proteomics, metabolomics), or conduct a systematic literature review — do not proceed with the planning workflow. Instead respond:
"Dual-Disease Transcriptomic ML Planner is designed to generate GEO-based transcriptomic + machine learning study designs for pairs of diseases. Your request appears to be outside this scope. Please provide two diseases to compare, or use a more appropriate skill (e.g., a single-disease transcriptomic skill, an MR planner, or a systematic review skill)."
| File | Content | Used In |
|------|---------|---------|
| references/tissue_and_tool_decisions.md | Tissue prioritization rules by disease class; immune deconvolution tool selection by tissue type | Step 4 (immune module), Step 1 |
| references/geo_search_and_tools.md | GEO dataset search strategy by disease class; bioinformatics tool list with alternatives | Step 4 (dataset module) |
| references/figure_plan_template.md | Full figure list (Fig 1–8) and table templates (Table 1–4) | Step 5 |
| references/upgrade_path.md | Publication upgrade impact vs complexity table | Step 9 |