crispr-screens/combinatorial-screens/SKILL.md
Designs and analyzes combinatorial CRISPR screens covering paired-Cas9 (Big Papi, Najm 2018), enhanced AsCas12a multiplex (enCas12a, DeWeirdt 2021), in4mer 4-guide-array Cas12a (Esmaeili Anvar N et al 2024 Nat Commun 15:3577) and the Inzolia paralog-pair library, paralog-buffering detection (Dede 2020 Genome Biol; Thompson 2021 Cell Reports 36:109597), genetic-interaction (GI) scoring as observed_double_LFC minus expected_additive_double_LFC, synthetic-lethal and synthetic-rescue interaction interpretation, the half-of-essentiality buffered by paralogs phenomenon, multiplex screen statistical analysis with MAGeCK MLE interaction terms, and the relationship to single-cell combinatorial Perturb-seq. Use when designing a paralog or pathway-pair screen, choosing between paired-Cas9 (Big Papi) and Cas12a multiplex (Inzolia), interpreting genetic interaction scores, identifying synthetic-lethal targets for drug development, or scaling beyond single-gene CRISPR screens.
Reference examples tested with: MAGeCK 0.5.9+ (for MLE with interaction terms), Inzolia library annotation (Esmaeili Anvar 2024), pandas 2.2+, numpy 1.26+, scipy 1.12+, matplotlib 3.8+.
Before using code patterns, verify installed versions match. If versions differ:
`mageck --version; mageck mle --help`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Run a combinatorial CRISPR screen to find synthetic-lethal interactions" -> Design a paired or multiplex library, screen for double-knockout fitness, score per-pair genetic interaction (GI = observed_double - expected_additive), and identify synthetic-lethal (negative GI) and synthetic-rescue (positive GI) interactions.
mageck mle with explicit interaction terms for paired-Cas9 (Big Papi-style)

| Goal | Architecture | Library | Why |
|------|--------------|---------|-----|
| Paralog buffering, identify synthetic lethal paralog pairs | enCas12a single-array 4-guide multiplex | Inzolia (Esmaeili Anvar 2024) | Cas9 single-KO misses ~half of paralog-buffered essentials |
| Test specific pathway pair (e.g., DNA repair branches) | Big Papi (paired-Cas9 dual sgRNA cassette) | Custom | Mature methodology; SpCas9 well-characterized |
| Combinatorial 3-way / 4-way knockout | in4mer (4-guide single Cas12a array) | Custom (in4mer) | Single transcript processed by Cas12a; multi-gene |
| Single-cell Perturb-seq with multi-pert per cell | Combinatorial Perturb-seq + Cas9 multiplex | Custom | Single-cell readout of multi-perturbation effects |
| Drug-modifier + KO interaction | Cas9 KO + drug treatment | Standard libraries | Drug as second "perturbation" |
Fails when:
| Property | Cas9 paired (Big Papi) | Cas12a multiplex (in4mer / Inzolia) |
|----------|------------------------|--------------------------------------|
| Multiplex capacity per cassette | 2 sgRNAs (paired) | 4 (in4mer); 2 (standard Cas12a) |
| sgRNA processing | Two separate U6 promoters | Single transcript processed by Cas12a itself |
| sgRNA inhibition with multiple targets | None | None (Cas12a's intrinsic processing handles all) |
| Library size for 1,000 pairs | ~2,000 paired cassettes | ~250 4-guide cassettes (in4mer) |
| Validated libraries | Limited (mostly custom) | Inzolia: 18k 4-guide arrays for 4k paralog pairs |
| Per-perturbation editing efficiency | High (each sgRNA independently) | Variable (Cas12a less efficient on some targets) |
| Best for | Pairwise GI of specific interest | Genome-scale paralog buffering; multi-gene perturbation |
Recommendation: For modern paralog screens, use Cas12a multiplex with the Inzolia library. Its compact design (reported as roughly 30% smaller than a typical genome-scale CRISPR/Cas9 library) makes it more cost-effective at genome scale.
Dede et al 2020 (Genome Biol 21:262) and Thompson et al 2021 demonstrated that approximately half of constitutively expressed essential genes are never detected in Cas9 single-KO screens. The reason: paralogs perform redundant essential functions. Loss of one paralog is buffered by the other; only loss of both reveals the essentiality phenotype.
Quantified impact: Dede 2020 identified roughly 24 synthetic-lethal paralog pairs across 3 cell lines; 79% reproduce in ≥2 lines and 58% in all 3. Single-gene Cas9 screens could not detect these pairs; finding them required combinatorial methodology.
Examples:
Each is buffered: loss of one is tolerated; loss of both is lethal.
Goal: Identify pairs where the double-knockout fitness differs from the additive expectation.
Approach: From per-pair and per-singleton fitness data, compute GI = observed_double_LFC - (single_A_LFC + single_B_LFC). Synthetic lethal: GI < threshold (more depleted than additive). Synthetic rescue: GI > threshold (less depleted than additive).
```python
import pandas as pd
import numpy as np
from scipy.stats import zscore


def gi_score(paired_lfc_df, single_lfc_df):
    '''Score genetic interactions from paired vs single LFCs.

    paired_lfc_df: rows = paired-KO; columns = ['gene_A', 'gene_B', 'paired_lfc']
    single_lfc_df: rows = single-KO; columns = ['gene', 'single_lfc']
    '''
    # Look up each gene's single-knockout LFC
    single = dict(zip(single_lfc_df['gene'], single_lfc_df['single_lfc']))
    df = paired_lfc_df.copy()
    df['single_A_lfc'] = df['gene_A'].map(single)
    df['single_B_lfc'] = df['gene_B'].map(single)
    # Additive null in log space: expected double LFC is the sum of the single LFCs
    df['expected_additive'] = df['single_A_lfc'] + df['single_B_lfc']
    df['gi_score'] = df['paired_lfc'] - df['expected_additive']
    # Pairs lacking a singleton LFC yield NaN; omit them when standardizing
    df['gi_z'] = zscore(df['gi_score'], nan_policy='omit')
    # NaN z-scores fail both comparisons and fall through to 'no_interaction'
    df['gi_class'] = np.where(df['gi_z'] < -2, 'synthetic_lethal',
                              np.where(df['gi_z'] > 2, 'synthetic_rescue', 'no_interaction'))
    return df.sort_values('gi_z')
```
Interpretation:
Goal: Use MAGeCK MLE to estimate the effect of each gene independently and the additional effect when both genes are simultaneously perturbed.
Approach: The design matrix encodes single-A, single-B, and double-AB conditions; the interaction column is set to 1 only for double-KO samples, so its beta captures the extra effect beyond the sum of the single-gene betas. Note: MAGeCK MLE reports the significance of each beta against zero, so the interaction|beta and interaction|fdr columns serve as a GI estimate, but it does not test the additive null explicitly. For formal interaction testing, compute GI = observed_double_lfc - (single_A_lfc + single_B_lfc) explicitly (see the GI scoring section).
```bash
# Design matrix encoding double-KO as a separate "interaction" indicator
# Conditions: NT (control), A_KO, B_KO, A_B_KO
cat > combo_design.txt <<EOF
Samples baseline geneA geneB interaction
NT_r1 1 0 0 0
NT_r2 1 0 0 0
A_r1 1 1 0 0
A_r2 1 1 0 0
B_r1 1 0 1 0
B_r2 1 0 1 0
AB_r1 1 1 1 1
AB_r2 1 1 1 1
EOF

mageck mle \
  --count-table combo_counts.txt \
  --design-matrix combo_design.txt \
  --output-prefix combo_mle

# Output: per-gene beta scores per design column
# The "interaction" column beta captures additional joint effect beyond additive
```
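The design matrix can also be generated programmatically, which may help avoid hand-editing mistakes when the number of replicates changes. The following is a minimal sketch, assuming sample labels follow the `<condition>_r<N>` pattern used in the heredoc above; `build_design` and `COND_TO_FLAGS` are hypothetical helpers for illustration and are not part of MAGeCK.

```python
import pandas as pd

# Assumed mapping from condition prefix to (geneA, geneB) indicator values
COND_TO_FLAGS = {
    'NT': (0, 0),
    'A': (1, 0),
    'B': (0, 1),
    'AB': (1, 1),
}


def build_design(samples):
    """Build a design-matrix table from sample labels such as 'AB_r1'."""
    rows = []
    for sample in samples:
        cond = sample.rsplit('_r', 1)[0]  # strip the replicate suffix
        a, b = COND_TO_FLAGS[cond]
        rows.append({
            'Samples': sample,
            'baseline': 1,          # baseline column is always 1
            'geneA': a,
            'geneB': b,
            'interaction': a * b,   # 1 only when both genes are perturbed
        })
    return pd.DataFrame(rows)


samples = ['NT_r1', 'NT_r2', 'A_r1', 'A_r2', 'B_r1', 'B_r2', 'AB_r1', 'AB_r2']
design = build_design(samples)

# Tab-separated text; write it to combo_design.txt before calling mageck mle
design_text = design.to_csv(sep='\t', index=False)
print(design_text)
```

Deriving the interaction column as the product of the two single-gene indicators keeps it consistent with the other columns by construction.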
Interpretation of MAGeCK MLE output:
| Column | Meaning |
|--------|---------|
| geneA\|beta | Single-A effect |
| geneB\|beta | Single-B effect |
| interaction\|beta | Additional effect under joint perturbation beyond sum of singles |
| interaction\|p-value, interaction\|fdr | Significance vs zero |
A significantly negative interaction|beta indicates a synthetic-lethal interaction; a significantly positive one indicates synthetic rescue. For formal GI hypothesis testing, prefer the explicit GI scoring approach (GI scoring section) over interpreting MAGeCK MLE output alone, since MAGeCK MLE does not validate the additive null.
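One way to test explicit GI scores formally is to treat the cassettes targeting a pair as replicate measurements and test their mean GI against zero. The sketch below is one possible approach, not the method of any cited paper: it assumes a cassette-level table with hypothetical columns `gene_A`, `gene_B`, `gi_score`, uses a one-sample t-test, and applies a plain NumPy Benjamini-Hochberg correction. Gene names and values are toy data.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp


def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0, 1)
    return adjusted


def pair_gi_significance(cassette_gi):
    """One row per cassette in, one row per gene pair out."""
    rows = []
    for (gene_a, gene_b), grp in cassette_gi.groupby(['gene_A', 'gene_B']):
        res = ttest_1samp(grp['gi_score'], popmean=0.0)
        rows.append({'gene_A': gene_a, 'gene_B': gene_b,
                     'mean_gi': grp['gi_score'].mean(),
                     'n_cassettes': len(grp),
                     'p_value': res.pvalue})
    out = pd.DataFrame(rows)
    out['fdr'] = bh_fdr(out['p_value'])
    return out.sort_values('p_value').reset_index(drop=True)


# Toy data: G1/G2 consistently negative, G3/G4 centered on zero
toy = pd.DataFrame({
    'gene_A': ['G1'] * 4 + ['G3'] * 4,
    'gene_B': ['G2'] * 4 + ['G4'] * 4,
    'gi_score': [-1.9, -2.1, -2.0, -1.8, 0.3, -0.2, 0.1, -0.1],
})
result = pair_gi_significance(toy)
print(result)
```

With only a handful of cassettes per pair the t-test has limited power, so this is best read as a ranking aid alongside the z-score thresholds rather than a replacement for them.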
Esmaeili Anvar et al 2024 Nat Commun 15:3577 introduced in4mer, a Cas12a multiplex platform in which each cassette contains 4 guides processed by Cas12a's intrinsic crRNA-processing activity. The Inzolia library is the canonical implementation covering ~4,000 paralog pairs.
Library design:
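```python
import numpy as np

# Per-pair analysis from in4mer screen
def in4mer_pair_analysis(paired_counts_df, gene_pairs, t0_col='t0', final_col='final'):
    '''Aggregate cassette-level counts to per-pair LFCs.
    paired_counts_df: rows = cassettes; columns = 'cassette_id' plus sample counts.
    gene_pairs: DataFrame with cassette_id -> [gene_A, gene_B, gene_C, gene_D]
    t0_col, final_col: assumed names of the reference and endpoint count columns.
    '''
    df = paired_counts_df.merge(gene_pairs, on='cassette_id')
    # Normalize to reads per million, then take log2 fold change with a pseudocount of 1
    rpm = df[[t0_col, final_col]] / df[[t0_col, final_col]].sum() * 1e6
    df['lfc'] = np.log2((rpm[final_col] + 1) / (rpm[t0_col] + 1))
    # Aggregate by (gene_A, gene_B) pair if 2-gene pair, or by all 4 genes if multi-gene
    return df.groupby(['gene_A', 'gene_B'])['lfc'].agg(['mean', 'std', 'count'])
```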
Trigger: Improperly designed dual sgRNA cassette where two sgRNAs are read as one fused sequence. Mechanism: Without spacer or terminator between sgRNAs, transcription doesn't terminate correctly. Symptom: Cassette appears as single perturbation (the first sgRNA dominates); GI scoring fails. Fix: Use validated Big Papi or paired-sgRNA cloning protocols; verify by amplicon sequencing of clones.
Trigger: Cas12a less efficient than Cas9 at some loci; some guides in the 4-guide array don't cut. Mechanism: Cas12a editing rate varies by sequence context; some loci edit at <30%. Symptom: Specific pairs missing expected effects despite cassette presence. Fix: Pilot Cas12a efficiency at the loci before full screen; use enCas12a (enhanced) variant; for known low-efficiency loci, supplement with Cas9.
Trigger: Library lacks single-gene controls (only paired knockouts). Mechanism: GI = paired - expected_additive requires single-gene LFC; without them, expected cannot be computed. Symptom: Cannot score GI; only paired LFCs available. Fix: Design library to include singletons (place gene A with 3 placeholder guides; gene B with 3 placeholders); re-run with full design.
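Missing singletons can be caught at design time, before any cloning. The following is a minimal sketch, assuming the library design is available as a pair table with `gene_A` / `gene_B` columns plus a list of genes that have single-gene constructs; `missing_singletons` is a hypothetical helper and the gene names are placeholders.

```python
import pandas as pd


def missing_singletons(pairs_df, singleton_genes):
    """Return genes that appear in a pair but have no single-gene construct."""
    paired_genes = set(pairs_df['gene_A']) | set(pairs_df['gene_B'])
    return sorted(paired_genes - set(singleton_genes))


# Toy design: G4 is paired with G3 but was never included as a singleton
pairs = pd.DataFrame({'gene_A': ['G1', 'G3'], 'gene_B': ['G2', 'G4']})
singletons = ['G1', 'G2', 'G3']

missing = missing_singletons(pairs, singletons)
print(missing)  # ['G4']
```

Any gene returned here would leave `expected_additive` undefined for every pair it participates in.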
Trigger: Using public single-gene LFCs (e.g., DepMap) as the baseline for paired-screen GI scoring. Mechanism: Single-gene effects are cell-line specific; using HCT116 single-gene LFCs to score K562 paired-screen GIs is invalid. Symptom: GI scores look noisy; many false positives. Fix: Include singleton controls in the screen; or use cell-line-matched DepMap data.
Trigger: Paired KO of two cell-cycle-impacting genes; the double effect saturates proliferation. Mechanism: If A_KO and B_KO each cause 50% growth arrest, a linear additive model predicts 100% arrest, whereas two independent effects combine to only 75%; the additive expectation therefore overestimates the double effect and generates false "synthetic-rescue" calls. Symptom: GI scores positive for pairs of essential cell-cycle genes; biologically unexpected. Fix: Use log-space (LFC) GI scoring rather than linear; saturation is less severe in log-space. Alternative: model with a logistic / saturable response curve.
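The arithmetic behind this artifact can be checked directly. The sketch below uses toy relative-fitness values (not data from any screen) and assumes the double knockout behaves as two independent effects; it compares the GI obtained under a linear additive null with the GI obtained under a log-space additive null.

```python
import math

fit_a, fit_b = 0.5, 0.5      # relative fitness after each single knockout (toy values)
fit_ab_observed = 0.25       # double knockout acting as two independent effects

# Linear-space additive null: defects add, so expected fitness = 1 - (0.5 + 0.5) = 0
expected_linear = 1 - ((1 - fit_a) + (1 - fit_b))
gi_linear = fit_ab_observed - expected_linear

# Log-space additive null: LFCs add, which is equivalent to multiplying fitnesses
expected_log = math.log2(fit_a) + math.log2(fit_b)
gi_log = math.log2(fit_ab_observed) - expected_log

print(f"linear GI = {gi_linear:+.2f}, log2 GI = {gi_log:+.2f}")
# linear GI = +0.25, log2 GI = +0.00
```

The same double-knockout phenotype scores as an apparent rescue in linear space and as no interaction in log space, which is why LFC-based scoring is the safer default here.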
Trigger: Inzolia library has uneven cassette representation; some pairs at 10x lower coverage than others. Mechanism: Standard library QC (Gini, skew) applies; low-coverage cassettes yield noisier LFCs. Symptom: GI z-scores vary 2-3x across cassettes targeting the same pair. Fix: Standard library QC; for low-coverage pairs, aggregate fewer cassettes but with more sequencing depth; or drop low-coverage pairs from analysis.
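Representation skew can be quantified before GI scoring. The following is a minimal sketch, assuming per-cassette read counts from a plasmid or early-timepoint sample; `gini` and `low_coverage_mask` are hypothetical helpers, the 10x cutoff mirrors the trigger described above, and the Gini index is computed on raw counts here (some pipelines compute it on log-transformed counts instead).

```python
import numpy as np


def gini(counts):
    """Gini index of read counts: 0 = perfectly even, values near 1 = highly skewed."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return float('nan')
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)


def low_coverage_mask(counts, fold=10):
    """Flag cassettes whose count is more than `fold` times below the library median."""
    x = np.asarray(counts, dtype=float)
    return x < np.median(x) / fold


even = [100, 100, 100, 100]     # toy counts, evenly represented
skewed = [5, 100, 120, 900]     # toy counts, one dropout and one jackpot

print(gini(even), gini(skewed))
print(low_coverage_mask(skewed))
```

Cassettes flagged by the mask are candidates for exclusion or deeper sequencing before their LFCs feed into per-pair aggregation.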
For high-stakes synthetic-lethal hits (drug-target nomination), validate by:
| Threshold | Value | Source / Rationale |
|-----------|-------|--------------------|
| Synthetic lethal GI z-score | <-2 | Standard convention |
| Synthetic rescue GI z-score | >2 | Standard convention |
| No interaction | -1 to +1 | Within additive expectation |
| Cas9 paired-screen cassette count per pair | 4-6 | Standard library convention |
| Cas12a 4-guide arrays per pair (Inzolia) | 4 | Esmaeili Anvar 2024 |
| Singletons in combinatorial library | At least 4-6 per single gene | For stable expected_additive |
| Cells per cassette for stable GI | 500+ at infection | Standard pooled-screen coverage |
| Cas12a editing efficiency for inclusion | >50% | Below = unreliable signal |
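The coverage threshold translates into a cell-number requirement at infection. The following is a rough sketch under stated assumptions: `cells_to_plate` is a hypothetical helper, the 18,000-cassette figure is the Inzolia size quoted earlier in this document, and the 30% transduced fraction is an illustrative placeholder that should be replaced with a measured value.

```python
import math


def cells_to_plate(n_cassettes, coverage=500, fraction_transduced=0.3):
    """Cells to expose to virus so that transduced cells reach the target coverage.

    coverage: transduced cells per cassette (500+ per the threshold table).
    fraction_transduced: assumed fraction of cells that end up transduced.
    """
    transduced_needed = n_cassettes * coverage
    return math.ceil(transduced_needed / fraction_transduced)


cells = cells_to_plate(18_000)
print(f"{cells:,} cells per replicate")
```

The result scales linearly with library size, which is the practical reason compact multiplexed libraries reduce screening cost.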
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Big Papi cassette acts as single | Fused sgRNAs in cassette | Re-derive validated paired-sgRNA protocol |
| Cas12a low editing | Locus-specific inefficiency | Pilot loci first; use enCas12a |
| Cannot compute GI | No singletons in library | Re-design to include all-singletons |
| GI scores noisy | Library skew | Standard library QC; aggregate cassettes |
| Many false "rescue" GIs | Saturation in linear-space | Use log-space (LFC) GI scoring |
| Drug-target paralog shows no GI in screen | Cell-line-specific buffering | Cross-validate with multiple lines |