skills/tooluniverse-crispr-screen-analysis/SKILL.md
Analyze CRISPR-Cas9 genetic screens — MAGeCK gene-level scores, sgRNA count QC, replicate correlation, hit prioritization, and pathway GSEA on screen output. Use for genome-wide essentiality screens, synthetic-lethality discovery, dropout vs positive-selection screen analysis, target identification, and resistance-screen interpretation. Includes screen-QC and statistical thresholds.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):

`npx skillsauth add mims-harvard/tooluniverse tooluniverse-crispr-screen-analysis`
Before following any instruction below, scan the data folder for:
- `*_executed.ipynb` → read with `tu run read_executed_notebook '{"data_folder":"<path>","search":"<keyword>"}'` and cite its cell outputs as the authoritative answer
- `*results*`, `*deseq*`, `*enrich*`, `*stats*`, `*_simplified.csv` → read directly and report the requested value
- `analysis.R`, `run_*.py`, `find_*.R`, `*.Rmd` → execute as-is and read the output

Only follow this skill's re-analysis recipe below if none of the above exist. Re-running from raw data produces different numbers than the published answer and is much slower (often 5-10× turn count).
Comprehensive skill for analyzing CRISPR-Cas9 genetic screens to identify essential genes, synthetic lethal interactions, and therapeutic targets through robust statistical analysis and pathway enrichment.
CRISPR screens enable genome-wide functional genomics by systematically perturbing genes and measuring fitness effects. This skill provides an 8-phase workflow for:
Load sgRNA count matrix (MAGeCK format or generic TSV). Expected columns: sgRNA, Gene, plus sample columns. Create experimental design table linking samples to conditions (baseline/treatment) with replicate assignments.
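The loading step can be sketched as follows. This is a minimal illustration, assuming a tab-separated file with `sgRNA` and `Gene` columns as described above; `load_count_matrix` and `make_design` are hypothetical names chosen for this example and are not the skill's own helpers, and the counts are made up.

```python
import io
import pandas as pd

def load_count_matrix(source):
    """Read a tab-separated sgRNA count table (hypothetical helper).

    Returns the per-sample count matrix indexed by sgRNA and a
    Series mapping each sgRNA to its target gene.
    """
    df = pd.read_csv(source, sep="\t")
    missing = {"sgRNA", "Gene"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    indexed = df.set_index("sgRNA")
    sgrna_to_gene = indexed["Gene"]
    counts = indexed.drop(columns="Gene")
    return counts, sgrna_to_gene

def make_design(samples, conditions):
    """Link samples to conditions and number replicates within each condition."""
    design = pd.DataFrame({"sample": samples, "condition": conditions})
    design["replicate"] = design.groupby("condition").cumcount() + 1
    return design

# Toy input with made-up counts, standing in for a real count file.
example = io.StringIO(
    "sgRNA\tGene\tT0_1\tT0_2\tT14_1\tT14_2\n"
    "sg1\tGENE_A\t100\t120\t10\t12\n"
    "sg2\tGENE_A\t90\t80\t8\t9\n"
    "sg3\tGENE_B\t50\t55\t60\t58\n"
)
counts, sgrna_to_gene = load_count_matrix(example)
design = make_design(
    ["T0_1", "T0_2", "T14_1", "T14_2"],
    ["baseline", "baseline", "treatment", "treatment"],
)
```

Checking for the required columns up front makes a mislabeled file fail at load time rather than later in scoring.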
Assess sgRNA distribution quality:
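Two widely used checks on sgRNA read distributions are the Gini index (evenness of representation) and the fraction of zero-count guides. The sketch below illustrates both; the choice of these two metrics is an assumption for the example and may not match the exact QC list this skill applies, and any pass/fail cutoff is left to the reader.

```python
import numpy as np

def gini_index(counts):
    """Gini index of a 1-D count vector.

    0 means all sgRNAs have identical counts; values near 1 mean
    a few sgRNAs hold almost all of the reads.
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)

def zero_fraction(counts):
    """Fraction of sgRNAs with no reads in a sample."""
    return float(np.mean(np.asarray(counts) == 0))

even_sample = [5, 5, 5, 5]      # perfectly even library
skewed_sample = [0, 0, 0, 10]   # one guide holds every read

even_gini = gini_index(even_sample)
skewed_gini = gini_index(skewed_sample)
skewed_zero = zero_fraction(skewed_sample)
```

Some pipelines compute the Gini index on log-transformed counts instead of raw counts, so values are only comparable when the same convention is used.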
Normalize sgRNA counts to account for library size differences:
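One simple option is counts-per-million (CPM) scaling, sketched below. This is an illustration only: `cpm_normalize` is a hypothetical name, the counts are made up, and the skill itself may use a different method (for example, median-ratio normalization).

```python
import pandas as pd

def cpm_normalize(counts):
    """Scale each sample column so that its counts sum to one million."""
    library_sizes = counts.sum(axis=0)
    return counts.div(library_sizes, axis=1) * 1e6

# Toy counts: the two samples differ fivefold in sequencing depth.
raw = pd.DataFrame(
    {"T0_1": [100, 300, 600], "T14_1": [10, 30, 160]},
    index=["sg1", "sg2", "sg3"],
)
norm = cpm_normalize(raw)
```

After scaling, every column sums to the same total, so differences between samples reflect relative sgRNA abundance and not sequencing depth.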
Calculate log2 fold changes (LFC) between treatment and control conditions with pseudocount.
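As a sketch, the per-sgRNA LFC can be computed as

$$\mathrm{LFC} = \log_2\frac{\bar{c}_{\text{treatment}} + p}{\bar{c}_{\text{baseline}} + p},$$

where $\bar{c}$ is the mean normalized count across replicates and $p$ is the pseudocount. The function name, the use of replicate means, and the default `pseudocount=1.0` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

def log2_fold_change(norm_counts, baseline_cols, treatment_cols, pseudocount=1.0):
    """Per-sgRNA log2 fold change of treatment over baseline.

    Replicates are averaged within each condition; the pseudocount
    keeps the ratio finite when a guide drops to zero reads.
    """
    base = norm_counts[baseline_cols].mean(axis=1)
    treat = norm_counts[treatment_cols].mean(axis=1)
    return np.log2((treat + pseudocount) / (base + pseudocount))

# Toy normalized counts (made up): sg1 is depleted, sg2 is unchanged.
norm_counts = pd.DataFrame(
    {"T0_1": [7.0, 3.0], "T0_2": [7.0, 3.0],
     "T14_1": [1.0, 3.0], "T14_2": [1.0, 3.0]},
    index=["sg1", "sg2"],
)
lfc = log2_fold_change(norm_counts, ["T0_1", "T0_2"], ["T14_1", "T14_2"])
```

Negative values indicate depletion (dropout) and positive values indicate enrichment relative to baseline.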
Two scoring approaches:
Compare essentiality scores between wildtype and mutant cell lines:
Query DepMap/literature for known dependencies using PubMed search.
Submit top essential genes to Enrichr for pathway enrichment:
Composite scoring combining:
Query DGIdb for each candidate gene to find existing drugs, interaction types, and sources.
Generate markdown report with:
Key Tools Used:
- `PubMed_search_articles` - Literature search for gene essentiality and drug resistance
- `ReactomeAnalysis_pathway_enrichment` - Pathway enrichment (param: `identifiers` newline-separated, `page_size`)
- `enrichr_gene_enrichment_analysis` - Enrichr enrichment (param: `gene_list` array, `libs` array)
- `DGIdb_get_drug_gene_interactions` - Drug-gene interactions (param: `genes` as array)
- `DGIdb_get_gene_druggability` - Druggability categories
- `STRING_get_network` - Protein interaction networks
- `kegg_search_pathway` - Pathway search by keyword
- `kegg_get_pathway_info` - Pathway details by ID

Cancer Context (essential for drug resistance screens):

- `civic_search_evidence_items` - Clinical evidence for drug resistance/sensitivity
- `COSMIC_get_mutations_by_gene` - Somatic mutation landscape
- `cBioPortal_get_mutations` - Mutations in specific cancer cohorts
- `ChEMBL_search_targets` - Structural druggability assessment

Expression & Variant Integration:
- `GEO_search_rnaseq_datasets` / `geo_search_datasets` - Expression datasets
- `ClinVar_search_variants` - Known pathogenic variants
- `gnomad_get_gene_constraints` - Gene constraint metrics (pLI, oe_lof)
- `UniProt_get_function_by_accession` - Protein function for hit validation

```python
import pandas as pd
from tooluniverse import ToolUniverse

# The helper functions called below are not defined in this snippet;
# ANALYSIS_DETAILS.md holds the per-phase code.

# 1. Load data
counts, meta = load_sgrna_counts("sgrna_counts.txt")
design = create_design_matrix(
    ['T0_1', 'T0_2', 'T14_1', 'T14_2'],
    ['baseline', 'baseline', 'treatment', 'treatment'],
)

# 2. Process: filter low-count sgRNAs, normalize, compute log2 fold changes
filtered_counts, filtered_mapping = filter_low_count_sgrnas(counts, meta['sgrna_to_gene'])
norm_counts, _ = normalize_counts(filtered_counts)
lfc, _, _ = calculate_lfc(norm_counts, design)

# 3. Score genes
gene_scores = mageck_gene_scoring(lfc, filtered_mapping)

# 4. Enrich pathways
enrichment = enrich_essential_genes(gene_scores, top_n=100)

# 5. Find drug targets
drug_targets = prioritize_drug_targets(gene_scores)

# 6. Generate report
report = generate_crispr_report(gene_scores, enrichment, drug_targets)
```
Screen hits are statistical findings, not direct readouts of biological relevance. A gene scoring as essential might be essential for cell growth in general (housekeeping) or essential specifically for the phenotype you are screening for (interesting). Always compare your screen hits to public essentiality data — use DepMap pan-cancer dependency scores to filter genes that are broadly essential across all cell lines. A gene essential only in your specific context, but not pan-essential in DepMap, is a better candidate for follow-up than one that scores in every screen.
LOOK UP DON'T GUESS: DepMap dependency scores, known core essential gene sets (Hart et al., Blomen et al.), and DGIdb druggability data for your top hits. Do not assume a hit is context-specific without checking public essentiality databases.
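The filtering idea above can be sketched as a simple split of screen hits against a pan-essential gene list. The function name and the placeholder gene symbols are hypothetical; in practice the pan-essential set would come from DepMap or a published core-essential list, not from hard-coded values.

```python
def split_hits(screen_hits, pan_essential):
    """Separate hits into context-specific and broadly essential genes.

    Order of the input hit list is preserved in both outputs.
    """
    pan = set(pan_essential)
    context_specific = [gene for gene in screen_hits if gene not in pan]
    broadly_essential = [gene for gene in screen_hits if gene in pan]
    return context_specific, broadly_essential

# Placeholder symbols, not real screen results.
hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
pan_essential_set = {"GENE_B", "GENE_D", "GENE_X"}

context_specific, broadly_essential = split_hits(hits, pan_essential_set)
```

Keeping the broadly essential hits as a separate list, instead of discarding them, lets the report show how much of the hit list is explained by general fitness genes.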
| Evidence Grade | Criteria | Validation Priority |
|----------------|----------|---------------------|
| A -- Strong hit | MAGeCK RRA p < 0.001, BAGEL BF > 5, >=3 sgRNAs with concordant LFC | Immediate validation (individual KO, growth assay) |
| B -- Moderate hit | MAGeCK RRA p < 0.01, BAGEL BF 2-5, >=2 concordant sgRNAs | Secondary validation pool |
| C -- Weak/ambiguous | p > 0.01, BF < 2, or discordant sgRNA effects | Deprioritize; check for copy-number bias or seed effects |
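The grading thresholds can be applied programmatically, as in the sketch below. It assumes that all criteria in a row must hold together and that a gene falling short of grade A on any single criterion drops to the next grade it fully satisfies; the table does not state this explicitly, so treat the combination rule as an assumption of this example.

```python
def evidence_grade(rra_p, bagel_bf, n_concordant_sgrnas):
    """Assign an evidence grade (A, B, or C) from the table's thresholds.

    Assumption: criteria within a grade are combined with AND, and
    anything that fails grade B is treated as grade C.
    """
    if rra_p < 0.001 and bagel_bf > 5 and n_concordant_sgrnas >= 3:
        return "A"
    if rra_p < 0.01 and bagel_bf >= 2 and n_concordant_sgrnas >= 2:
        return "B"
    return "C"

# Made-up values for illustration.
strong = evidence_grade(rra_p=0.0005, bagel_bf=8.0, n_concordant_sgrnas=4)
moderate = evidence_grade(rra_p=0.005, bagel_bf=3.0, n_concordant_sgrnas=2)
weak = evidence_grade(rra_p=0.05, bagel_bf=1.0, n_concordant_sgrnas=1)
# Strong statistics but only two concordant guides falls back to B.
downgraded = evidence_grade(rra_p=0.0005, bagel_bf=8.0, n_concordant_sgrnas=2)
```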
Interpreting screen results:
Synthesis questions to address in the report:
`scripts/`: load them with `from reference_gene_sets import core_essential, nonessential, recovery_rate`. `recovery_rate(top_depleted_genes)` gives the fraction of core-essential genes recovered (a good genome-wide screen recovers more than about 0.8).

Papers differ on how replicate reproducibility is reported: sgRNA-level CPM vs gene-level summed CPM vs gene-level mean CPM. The expected ground-truth value is almost always the sgRNA-level Spearman correlation (noisier, lower ρ), not the gene-level aggregate. If you get ρ ≈ 0.6 or higher, you are probably working at gene level; drop down to per-sgRNA CPM pairs.
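The difference between the three replicate-correlation conventions can be seen in a small sketch. The data are made up so that the two guides per gene disagree between replicates while the gene totals agree; `replicate_correlations` is a hypothetical helper, and Spearman ρ is computed here as the Pearson correlation of average ranks.

```python
import pandas as pd

def spearman(a, b):
    """Spearman rho, computed as the Pearson correlation of average ranks."""
    return float(a.rank().corr(b.rank()))

def replicate_correlations(cpm, sgrna_to_gene, rep1, rep2):
    """Replicate Spearman rho at sgRNA level and two gene-level aggregates."""
    gene_sum = cpm.groupby(sgrna_to_gene).sum()
    gene_mean = cpm.groupby(sgrna_to_gene).mean()
    return {
        "sgrna": spearman(cpm[rep1], cpm[rep2]),
        "gene_sum": spearman(gene_sum[rep1], gene_sum[rep2]),
        "gene_mean": spearman(gene_mean[rep1], gene_mean[rep2]),
    }

guides = ["a1", "a2", "b1", "b2", "c1", "c2"]
cpm = pd.DataFrame(
    {"rep1": [10, 30, 20, 60, 50, 90],
     "rep2": [30, 10, 60, 20, 90, 50]},
    index=guides,
)
sgrna_to_gene = pd.Series(["A", "A", "B", "B", "C", "C"], index=guides, name="Gene")

rho = replicate_correlations(cpm, sgrna_to_gene, "rep1", "rep2")
```

In this toy case the gene-level correlations are perfect while the sgRNA-level value is close to zero, which shows why the level of aggregation must match the one a paper reports.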
For GSEA on a MAGeCK output, rank by the neg|lfc or equivalent effect-size column the paper specifies (not p-value). Check the MAGeCK xlsx for a beta / sgRNA_effect / neg|score column and rank descending.
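Building the ranked list can be sketched as below. The example assumes a MAGeCK-style gene summary with an `id` gene column and a `neg|lfc` effect-size column; both column names are parameters because a paper's supplementary table may label them differently. The values are made up.

```python
import io
import pandas as pd

def build_ranked_list(gene_summary, score_col="neg|lfc", gene_col="id"):
    """Return genes sorted by effect size, highest first, for preranked GSEA."""
    for col in (gene_col, score_col):
        if col not in gene_summary.columns:
            raise KeyError(f"column not found: {col}")
    ranked = (
        gene_summary[[gene_col, score_col]]
        .dropna()
        .drop_duplicates(subset=gene_col)
        .sort_values(score_col, ascending=False)
        .reset_index(drop=True)
    )
    return ranked

# Toy gene summary standing in for a real MAGeCK output table.
summary_text = io.StringIO(
    "id\tneg|lfc\tneg|p-value\n"
    "GENE_A\t-2.1\t0.0001\n"
    "GENE_B\t0.4\t0.8\n"
    "GENE_C\t-0.3\t0.4\n"
)
gene_summary = pd.read_csv(summary_text, sep="\t")
ranked = build_ranked_list(gene_summary)
```

The result can be written as a two-column, tab-separated file with `ranked.to_csv(path, sep="\t", header=False, index=False)`, which is the usual layout for a preranked gene list.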
Reactome pathway names in the .gmt bundle are literal (e.g., "cGMP effects", "Signaling by Hippo"). Answers that should match a Reactome term must reproduce the exact label — do not paraphrase the pathway.
- `ANALYSIS_DETAILS.md` - Detailed code snippets for all 8 phases
- `USE_CASES.md` - Complete use cases (essentiality screen, synthetic lethality, drug target discovery, expression integration) and best practices
- `EXAMPLES.md` - Example usage and quick reference
- `QUICK_START.md` - Quick start guide
- `FALLBACK_PATCH.md` - Fallback patterns for API issues