crispr-screens/in-vivo-screens/SKILL.md
Designs and analyzes in vivo CRISPR screens in animal tumor models, organoids, and immune-cell adoptive transfers. Covers bottleneck math (250x cells/sgRNA requires ~25M cells implanted; impossible for most syngeneic models, forcing focused libraries), focused library design (Manguso 2017 Nature 547:413 immune screen; Chen 2015 tumor screens), CRISPR-StAR intrinsic-control screening (Uijttewaal 2025 Nat Biotechnol 43:1848), clonal-dynamics-limited detection, tumor-explant DNA recovery, syngeneic vs xenograft vs PDX considerations, and the relationship to downstream MAGeCK / drugZ analysis. Use when designing in vivo CRISPR screens for tumor / immune / metastasis biology, choosing focused vs genome-wide for animal models, addressing bottleneck-induced clonal collapse, picking the syngeneic / xenograft / PDX model, integrating in vivo with in vitro results, or applying CRISPR-StAR for animal experiments.
Reference examples tested with: MAGeCK 0.5.9+, MAGeCK-VISPR 0.5.6+, pandas 2.2+, numpy 1.26+.
Before using code patterns, verify that installed versions match. If versions differ, check the installed version:
`mageck --version`
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Design or analyze an in vivo CRISPR screen" -> Account for the dramatic bottleneck during animal implantation and tumor growth; use focused libraries; recover DNA from tumor explants; analyze with bottleneck-adjusted hit calling.
`mageck count` + `mageck test` for standard analysis.

Why in vivo screens differ from in vitro:
| Constraint | In vitro | In vivo |
|------------|----------|---------|
| Cells per condition | 10M-100M (unlimited) | Limited by injection volume (1-5M cells typical) |
| Implant -> early tumor cell count | N/A | 10-100x drop typical (~94% library complexity may survive in CD45+ TILs) |
| Late tumor cell count | N/A | Further 5-10x reduction; final ~3.93 sgRNAs/gene |
| Bottleneck per animal | None | Tens of millions of cells fail to engraft |
| Library coverage achievable | 500-1000x | Often 50-100x effective at endpoint |
| sgRNAs survivable | Full library | 80-94% in early tumors; 60-80% in late tumors |
Math: A 70,000-sgRNA library at 500x coverage requires 35M cells in pool. Most syngeneic models can implant 1-5M cells. Result: real coverage is 70x at best; effective coverage at endpoint is even lower after bottleneck.
Solution: Use focused libraries (500-3,000 genes; ~3,000-15,000 sgRNAs) to maintain reasonable coverage despite the bottleneck.
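The coverage arithmetic above can be sketched in a few lines. This is a minimal illustration, not part of any screening toolkit: the function names are hypothetical, and the 10x engraftment loss used in the example is an assumption taken from the low end of the "10-100x drop" range quoted in the table.

```python
def cells_required(n_sgrnas: int, coverage: int) -> int:
    """Cells needed in the pool to represent each sgRNA `coverage` times."""
    return n_sgrnas * coverage


def coverage_at_implant(cells_implanted: int, n_sgrnas: int) -> float:
    """Average cells per sgRNA at the moment of implantation."""
    return cells_implanted / n_sgrnas


def coverage_at_endpoint(cells_implanted: int, n_sgrnas: int, fold_loss: float) -> float:
    """Average cells per sgRNA after an assumed fold loss during engraftment."""
    return cells_implanted / fold_loss / n_sgrnas


# Genome-wide library: 70,000 sgRNAs at 500x needs 35M cells.
print(cells_required(70_000, 500))
# Implanting the 5M-cell maximum gives roughly 71x before any bottleneck.
print(round(coverage_at_implant(5_000_000, 70_000), 1))
# Focused library (10,000 sgRNAs), 5M cells, assumed 10x engraftment loss.
print(coverage_at_endpoint(5_000_000, 10_000, fold_loss=10))
```

Running the same numbers for a candidate library before ordering animals shows quickly whether the endpoint coverage target is reachable at all.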
Manguso et al 2017 Nature 547:413 established the canonical in vivo CRISPR screen methodology with a focused library:
Standard focused-library principles:
Public focused libraries:
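Uijttewaal et al 2025 Nat Biotechnol 43(11):1848 (published online Dec 2024) introduced CRISPR-StAR (Stochastic Activation by Recombination). Library sgRNAs stay inactive while cells engraft and re-expand in the animal; inducible recombination then activates the sgRNA in only part of each clone's progeny, leaving the rest of the clone as an inactive internal control.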
How it works:
Quantified gain: CRISPR-StAR enables genome-scale in vivo screens (vs focused libraries) by generating intrinsic per-clone controls; outperforms conventional in vivo screens in therapy-resistant mouse melanoma models (Uijttewaal 2025).
| Model | Immune system | Use case |
|-------|---------------|----------|
| Syngeneic (e.g., B16 melanoma in C57BL/6) | Intact mouse immunity | Tumor-immune interaction; checkpoint biology |
| Xenograft (human cancer line in NSG) | Absent / impaired | Tumor cell-intrinsic biology; drug response in human cells |
| PDX (patient-derived xenograft) | Absent / impaired | Patient-specific biology; therapy testing |
| Humanized mouse | Reconstituted human immunity | Tumor-immune in human context (limited) |
| Organoid in vivo | None (in vitro) | Tumor cell-intrinsic in 3D structure |
Decision rule: For immune-targeting drug screens, use syngeneic. For human-cancer cell-intrinsic biology, use xenograft. For patient-specific drug screens, use PDX. Each requires different cell numbers and bottleneck planning.
Goal: Recover sufficient sgRNA-containing DNA from tumor explants for sequencing.
Approach: Dissect tumor; lyse with proteinase K; extract genomic DNA; amplify the sgRNA locus by PCR; sequence on MiSeq / NextSeq / NovaSeq.
```bash
# Typical PCR + sequencing parameters for in vivo screens
# Per-tumor DNA: 0.5-5 mg yield from a typical syngeneic tumor
# Per-sample sequencing depth: >=500 reads/sgRNA at endpoint (compare with 300+ for in vitro screens)
# Multiple animals per condition (n=5-10) to account for clonal variation

# mageck count for in vivo
mageck count \
    --list-seq library.csv \
    --sample-label Plasmid,Animal1,Animal2,Animal3,Animal4,Animal5 \
    --fastq Plasmid.fq.gz A1.fq.gz A2.fq.gz A3.fq.gz A4.fq.gz A5.fq.gz \
    --norm-method median \
    --output-prefix in_vivo_screen
```
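Before hit calling, it helps to check how much of the library each tumor actually retained. The sketch below assumes the usual MAGeCK count-table layout (tab-separated, `sgRNA` and `Gene` columns followed by one column per sample); the function name and the toy table are hypothetical, so adapt the parsing if your file differs.

```python
import csv
import io
from statistics import median


def summarize_counts(handle):
    """Per-sample fraction of sgRNAs detected (count > 0) and median count."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[2:]  # assumes the first two columns are sgRNA and Gene
    columns = {sample: [] for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[2:]):
            columns[sample].append(float(value))
    return {
        sample: {
            "detected_fraction": sum(c > 0 for c in counts) / len(counts),
            "median_count": median(counts),
        }
        for sample, counts in columns.items()
    }


# Toy table standing in for in_vivo_screen.count.txt (made-up numbers).
toy_table = (
    "sgRNA\tGene\tPlasmid\tAnimal1\n"
    "sg1\tGeneA\t520\t610\n"
    "sg2\tGeneA\t480\t0\n"
    "sg3\tGeneB\t510\t35\n"
    "sg4\tGeneB\t495\t0\n"
)
summary = summarize_counts(io.StringIO(toy_table))
for sample, stats in summary.items():
    print(sample, stats)
```

For a real run, pass `open("in_vivo_screen.count.txt")` instead of the `StringIO` object. A tumor whose detected fraction is far below the plasmid's is a candidate for exclusion or for separate handling.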
Goal: Identify per-gene fitness effects despite high inter-animal variability.
Approach: Each animal is a "replicate" with high variance due to clonal dynamics. Use MAGeCK MLE with animal-as-batch covariate, or run MAGeCK RRA per animal and meta-analyze.
```bash
# Option A: MAGeCK MLE with batch covariate
cat > in_vivo_design.txt <<EOF
Samples baseline tumor animal_2 animal_3 animal_4 animal_5
Plasmid 1 0 0 0 0 0
Animal1 1 1 0 0 0 0
Animal2 1 1 1 0 0 0
Animal3 1 1 0 1 0 0
Animal4 1 1 0 0 1 0
Animal5 1 1 0 0 0 1
EOF

mageck mle \
    --count-table in_vivo_screen.count.txt \
    --design-matrix in_vivo_design.txt \
    --output-prefix in_vivo_mle
```
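Writing the design matrix by hand gets error-prone as the number of animals grows. The following sketch generates the same layout as the matrix above for any number of animals; `build_design_matrix` is a hypothetical helper, and the tab-separated output is an assumption about what your MAGeCK version accepts, so confirm it against the installed documentation.

```python
def build_design_matrix(baseline_sample, animal_samples):
    """Return design-matrix text: a baseline column, a shared tumor effect,
    and one indicator column per animal except the first (the reference)."""
    n_animals = len(animal_samples)
    covariates = [f"animal_{j}" for j in range(2, n_animals + 1)]
    header = ["Samples", "baseline", "tumor"] + covariates
    # The plasmid/baseline sample carries only the baseline term.
    rows = [[baseline_sample, "1", "0"] + ["0"] * len(covariates)]
    for idx, sample in enumerate(animal_samples, start=1):
        indicators = ["1" if idx == j else "0" for j in range(2, n_animals + 1)]
        rows.append([sample, "1", "1"] + indicators)
    return "\n".join("\t".join(row) for row in [header] + rows) + "\n"


design_text = build_design_matrix(
    "Plasmid", ["Animal1", "Animal2", "Animal3", "Animal4", "Animal5"]
)
print(design_text)
```

Writing `design_text` to `in_vivo_design.txt` reproduces the matrix shown above, with the first animal acting as the reference level for the per-animal covariates.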
Per-animal RRA + meta-analysis:
```python
import pandas as pd
from scipy.stats import norm

# Run mageck test on each animal vs plasmid, then combine the per-animal
# p-values with Stouffer's Z method.


def meta_analyze_animals(per_animal_results):
    '''per_animal_results: list of DataFrames, one per animal, each loaded from a MAGeCK gene_summary.txt.'''
    merged = pd.concat([df.assign(animal=i) for i, df in enumerate(per_animal_results)])
    grouped = merged.groupby('id')
    meta = grouped.apply(lambda g: pd.Series({
        'mean_neg_score': g['neg|score'].mean(),
        # Clip p-values away from 0 and 1 so norm.ppf never returns +/-inf
        'stouffer_z': norm.ppf(g['neg|p-value'].clip(1e-300, 1 - 1e-16)).sum() / (len(g) ** 0.5),
        'animals_significant': (g['neg|fdr'] < 0.05).sum(),
        'n_animals': len(g)
    }))
    # Most negative combined Z (strongest evidence of depletion) first
    return meta.sort_values('stouffer_z')
```
Trigger: Implanted cells lack sufficient library complexity; a few clones dominate the tumor. Mechanism: Inter-animal stochasticity in cell engraftment creates founder effects. Symptom: Per-animal hit lists vary dramatically; no genes appear across all animals. Fix: Use a focused library to maintain coverage; increase animals per condition (n=10+); use CRISPR-StAR so that editing starts after the engraftment bottleneck and each surviving clone carries its own internal control.
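Clonal dominance can be quantified per animal before comparing hit lists. The sketch below computes two simple skew measures from a list of raw sgRNA counts: the share of reads held by the top few sgRNAs, and a Gini coefficient. Both helpers are hypothetical, and this Gini is computed on raw counts, so it is not guaranteed to match the value reported in a MAGeCK count summary.

```python
def top_fraction(counts, top_n=10):
    """Fraction of all reads carried by the `top_n` most abundant sgRNAs."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(sorted(counts, reverse=True)[:top_n]) / total


def gini(counts):
    """Gini coefficient of raw counts: 0 = perfectly even, near 1 = one clone dominates."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values.
    weighted = sum((rank + 1) * value for rank, value in enumerate(values))
    return (2 * weighted) / (n * total) - (n + 1) / n


even_tumor = [5, 5, 5, 5]        # toy counts: every sgRNA equally represented
collapsed_tumor = [0, 0, 0, 100]  # toy counts: a single clone took over

print(gini(even_tumor), top_fraction(even_tumor, top_n=1))
print(gini(collapsed_tumor), top_fraction(collapsed_tumor, top_n=1))
```

Comparing these values between the plasmid sample and each tumor gives a per-animal view of how much complexity was lost to founder effects.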
Trigger: Wrong library or library not amplified well from tumor DNA. Mechanism: PCR primers don't match the sgRNA flanking; or insufficient DNA template. Symptom: Low mapping rate (<10%); few sgRNAs detected per tumor. Fix: Verify library plasmid sequence; design primers specific to lentiviral cassette; use 10-100 ng input DNA + 25 PCR cycles.
Trigger: Tumor biology differs from in vitro CEGv2 calibration; not all essentials are essential in animal context. Mechanism: Cells in vivo have different growth conditions (nutrients, hypoxia, immune pressure) than in vitro; CEGv2 calibration assumes in vitro context. Symptom: CEGv2 PR-AUC <0.5 in vivo despite high in vitro PR-AUC. Fix: Use cell-type-and-context-specific essentialome (e.g., a corresponding in vitro screen of the same cell type) as a baseline; in vivo essentialome is biology-dependent.
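To compare in vivo and in vitro recovery of a reference essential set, the PR-AUC can be estimated from a ranked gene list. The sketch below uses average precision, one common PR-AUC estimator; the source does not say which estimator it has in mind. It also assumes that every ranked gene outside the reference set counts as a negative, which is a simplification relative to benchmarking against a curated non-essential set. The function name and gene labels are hypothetical.

```python
def average_precision(ranked_genes, reference_essentials):
    """Average precision of a ranking (most depleted first) against a reference set."""
    reference = set(reference_essentials) & set(ranked_genes)
    if not reference:
        raise ValueError("no reference genes are present in the ranking")
    hits = 0
    precision_sum = 0.0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in reference:
            hits += 1
            # Precision at the rank where this reference gene is recovered.
            precision_sum += hits / rank
    return precision_sum / len(reference)


# Toy ranking: E* are reference essentials, N* are other genes.
ranking = ["E1", "N1", "E2", "N2"]
score = average_precision(ranking, {"E1", "E2"})
print(round(score, 3))
```

Computing the same score for the matched in vitro screen of the same cell line gives the context-specific baseline recommended above.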
Trigger: Cas9-positive cells were not selected before implantation; library has Cas9-negative escapers. Mechanism: Cas9-negative cells carry sgRNA but no editing; persist in tumor without biological perturbation. Symptom: Specific essentiality signals weak; PR-AUC low. Fix: Always select Cas9-positive cells (FACS or selection) before infection; verify by Cas9 IHC or flow.
Trigger: Limited animals per condition (n=3-5); each has high variance. Mechanism: Per-animal clonal dynamics produce different sgRNA distributions; no consistent signal across few animals. Symptom: MAGeCK p-values inflated; FDR uncalibrated. Fix: Increase animals per condition to 10+; use meta-analysis across animals (Stouffer); validate top hits in arrayed format with n=10 mice each.
Trigger: Spontaneously arising mutations in some tumor regions create non-clonal heterogeneity. Mechanism: Tumor heterogeneity is genuine biology; not all cells in tumor are descendants of original engrafted cells. Symptom: Per-region sequencing shows different sgRNA distributions within same tumor. Fix: Sample multiple tumor regions; or use whole-tumor genomic DNA pooling (averages out heterogeneity).
| Threshold | Value | Source / Rationale |
|-----------|-------|--------------------|
| Cells per animal | 1-5M typical for syngeneic; 5-10M for xenograft | Tumor model dependent |
| sgRNAs per gene in library (focused) | 4-6 | Standard convention |
| Library size for in vivo focused | 3,000-15,000 sgRNAs | Maintainable coverage |
| Coverage at endpoint | ≥50x, ideally 100-200x | Lower than in vitro 500x |
| Animals per condition | 10+ for hit-calling; 5 minimum | Inter-animal variability |
| Animals per condition for arrayed validation | 10 | Tighter signal needed |
| In vivo CEGv2 PR-AUC | >0.4 (context-dependent) | Lower than in vitro 0.7 |
| Late tumor sgRNA-per-gene | ~3.93 typical | Empirical from literature |
| Days to harvest (tumor) | 12-21 days post-implant | Time for selection to manifest |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| No hits | Library complexity collapsed | Use focused library or CRISPR-StAR |
| Per-animal hit lists differ | Clonal dominance | Use focused library; increase animals |
| Low CEGv2 PR-AUC | Context-specific essentialome | Use in vivo-specific reference set |
| Low mapping rate | Wrong sequencing primers | Verify library lentiviral architecture |
| Coverage at endpoint <50x | Implantation bottleneck | Increase cells implanted; focused library |