crispr-screens/drugz-chemogenomic/SKILL.md
Analyzes CRISPR drug-modifier (chemogenomic) screens with drugZ (Li & Hart 2019 Genome Med), a bidirectional Z-score method that identifies synthetic-lethal sensitizing genes and resistance-conferring suppressor genes from vehicle vs drug comparisons. Covers vehicle-anchored design (not Day-0), the bidirectional Z math giving 2-3x sensitivity over MAGeCK / STARS / edgeR / RIGER on drug screens, per-gene sumZ and normZ, synth (sensitizer) vs supp (suppressor) FDR, multi-dose handling, integration with control sgRNAs, and comparison with MAGeCK MLE with dose covariate. Use when running a drug-modifier CRISPR screen, identifying sensitizing or resistance genes for a drug candidate, choosing drugZ vs MAGeCK MLE for chemogenomic analysis, troubleshooting low-effect drug screens where MAGeCK lacks sensitivity, or designing a drug-screen layout (vehicle vs drug arms).
Reference examples tested with: drugZ Aug-2019+ (hart-lab/drugz; Python 3.6+), MAGeCK 0.5.9+, pandas 2.2+, numpy 1.26+, scipy 1.12+, statsmodels 0.14+, matplotlib 3.8+.
Before using code patterns, verify installed versions match. If versions differ:
- CLI: `drugz --version; python drugz.py --help`
- Install: `git clone https://github.com/hart-lab/drugz`

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Identify genes that sensitize or confer resistance to my drug in a CRISPR screen" -> Compare drug-treated vs vehicle-treated arms (NOT Day-0 baseline) using bidirectional Z-scores per sgRNA, sum to per-gene normalized Z, and rank genes for sensitizer (synthetic lethal) vs suppressor (resistance) phenotype.
- CLI: `python drugz.py -i counts.txt -o drugz.txt -c Vehicle_r1,Vehicle_r2 -x Drug_r1,Drug_r2`
- Python: `drugz.drugz_analysis()` (internal Python module)

| Property | drugZ | MAGeCK RRA | MAGeCK MLE |
|----------|-------|------------|-------------|
| Bidirectional sensitivity | YES (sensitizer + resistance same scale) | Asymmetric (neg/pos separately) | Asymmetric |
| Drug-anchored baseline | YES (drug vs vehicle) | Either (drug vs vehicle or vs Day 0) | Either |
| Sensitivity to small effects | 2-3x higher (Li & Hart 2019 benchmark) | Lower | Lower |
| Statistical framework | Z-score from per-sgRNA NB residuals | NB + alpha-RRA | NB GLM with design matrix |
| Handles guide-level noise | sgRNA-level z aggregation | Rank-based aggregation | Built-in guide-efficacy term (optional) |
| Best for | Drug-modifier / chemogenomic screens | General essentiality / standard 2-condition | Time course / multi-condition |
Why MAGeCK is suboptimal for drug screens: MAGeCK's RRA was designed for two-condition essentiality; drug-vs-vehicle screens often have small effect sizes (10-30% sgRNA shift) that RRA rank-based aggregation under-detects. drugZ uses parametric Z-scoring tuned for these small effects.
Quantified gain (Li & Hart 2019): On DNA damage response chemogenomic screens, drugZ identified 2-3x more hits than STARS, MAGeCK, edgeR, or RIGER at the same FDR threshold; the additional hits were enriched in the expected pathway (DDR).
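1. Per-sgRNA log fold change, drug vs vehicle: `LFC_drug_vs_veh`
2. Per-sgRNA Z-score: `Z = (LFC - median(LFC)) / MAD(LFC)`
3. Per-gene sum: `sumZ = sum(Z_sgRNA)`
4. Per-gene normalized Z: `normZ = sumZ / sqrt(N_sgrna)`

Critical: Vehicle vs drug, NOT Day 0 vs drug. A Day-0 baseline conflates proliferation effects with drug effects.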
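The Z-score steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration of the formulas exactly as written here, not the drugZ implementation: the released script may estimate the per-guide variance differently, and the guide names, gene mapping and fold-change values are hypothetical toy data.

```python
import math
import statistics
from collections import defaultdict


def guide_z_scores(lfc_by_guide):
    """Z = (LFC - median(LFC)) / MAD(LFC), taken across all guides."""
    values = list(lfc_by_guide.values())
    median_lfc = statistics.median(values)
    mad = statistics.median([abs(v - median_lfc) for v in values])
    if mad == 0:
        raise ValueError("MAD of the fold changes is zero; cannot compute Z")
    return {guide: (lfc - median_lfc) / mad for guide, lfc in lfc_by_guide.items()}


def gene_scores(z_by_guide, gene_of_guide):
    """Return {gene: (sumZ, normZ)} with normZ = sumZ / sqrt(N_sgrna)."""
    z_lists = defaultdict(list)
    for guide, z in z_by_guide.items():
        z_lists[gene_of_guide[guide]].append(z)
    return {
        gene: (sum(zs), sum(zs) / math.sqrt(len(zs)))
        for gene, zs in z_lists.items()
    }


# Hypothetical per-guide log2 fold changes (drug vs vehicle), two guides per gene.
toy_lfc = {"A_1": -1.0, "A_2": -0.8, "B_1": 0.0, "B_2": 0.2, "C_1": 0.9, "C_2": 1.1}
toy_gene_of_guide = {guide: guide.split("_")[0] for guide in toy_lfc}

toy_scores = gene_scores(guide_z_scores(toy_lfc), toy_gene_of_guide)
for gene, (sum_z, norm_z) in sorted(toy_scores.items()):
    print(f"{gene}\tsumZ={sum_z:.2f}\tnormZ={norm_z:.2f}")
```

In this toy set, gene A ends up with a negative normZ (sensitizer direction), gene C with a positive normZ (suppressor direction), and gene B sits near zero, which is the bidirectional behaviour the method relies on.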
Goal: Quantify per-gene sensitizing and suppressor effects from a chemogenomic screen.
Approach: Run drugz.py with vehicle and drug sample columns; output per-gene sumZ, normZ, and direction-specific p-values + FDR.
git clone https://github.com/hart-lab/drugz
cd drugz
# Standard drug screen comparison:
# Vehicle (DMSO or carrier) replicates: Veh_r1, Veh_r2, Veh_r3
# Drug-treated replicates: Drug_r1, Drug_r2, Drug_r3
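# -i  input read-count file (tab-separated)
# -o  output file
# -c  control (vehicle) samples, comma-separated
# -x  treated (drug) samples, comma-separated
# -r  OPTIONAL: genes to exclude (comma-delimited list in the hart-lab script; confirm with --help)
# -p  pseudocount (default 5)
# Comments are kept above the command: a comment placed after a trailing backslash breaks the line continuation.
python drugz.py \
  -i counts.txt \
  -o drugz_output.txt \
  -c Veh_r1,Veh_r2,Veh_r3 \
  -x Drug_r1,Drug_r2,Drug_r3 \
  -r RPS3,RPL11 \
  -p 5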
# Output: drugz_output.txt with columns:
# GENE, numObs, sumZ, normZ, pval_synth, rank_synth, fdr_synth, pval_supp, rank_supp, fdr_supp
Output columns:
| Column | Meaning |
|--------|---------|
| GENE | Gene symbol |
| numObs | Number of sgRNAs contributing |
| sumZ | Summed per-sgRNA Z-score |
| normZ | Normalized Z = sumZ / sqrt(N) |
| pval_synth | One-sided p-value for sensitizer (negative effect; gene KO sensitizes to drug) |
| rank_synth | Rank for sensitizers |
| fdr_synth | BH-corrected FDR for sensitizers |
| pval_supp | One-sided p-value for suppressor (positive effect; gene KO confers resistance) |
| rank_supp | Rank for suppressors |
| fdr_supp | BH-corrected FDR for suppressors |
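To make the relationship between `normZ`, the two one-sided p-values and the BH-corrected FDR columns concrete, here is a minimal sketch. It assumes that `normZ` is compared against a standard normal distribution and that the FDR is a standard Benjamini-Hochberg adjustment; both are assumptions for illustration, the function names are hypothetical, and the values drugZ writes may differ in detail.

```python
from statistics import NormalDist


def one_sided_pvalues(norm_z):
    """Return (p_synth, p_supp) for one gene under a standard normal null.

    p_synth is small when normZ is strongly negative (sensitizer),
    p_supp is small when normZ is strongly positive (suppressor).
    """
    lower_tail = NormalDist().cdf(norm_z)
    return lower_tail, 1.0 - lower_tail


def benjamini_hochberg(pvalues):
    """Standard BH step-up adjustment; returns values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical normZ values for four genes.
toy_norm_z = {"GENE_A": -4.2, "GENE_B": -1.0, "GENE_C": 0.3, "GENE_D": 3.8}
genes = list(toy_norm_z)
fdr_synth = benjamini_hochberg([one_sided_pvalues(toy_norm_z[g])[0] for g in genes])
fdr_supp = benjamini_hochberg([one_sided_pvalues(toy_norm_z[g])[1] for g in genes])
for gene, synth, supp in zip(genes, fdr_synth, fdr_supp):
    print(f"{gene}\tfdr_synth={synth:.3g}\tfdr_supp={supp:.3g}")
```

Because the two tests are one-sided and mirror each other, a gene can be significant in at most one direction, which is why the output carries separate `synth` and `supp` columns.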
Interpretation:
- `fdr_synth < 0.05` -- loss of these genes makes cells more sensitive to the drug. Examples: BRCA1/2 loss sensitizes to PARP inhibitors; loss of ERCC-family repair genes sensitizes to cisplatin.
- `fdr_supp < 0.05` -- loss of these genes confers resistance. Examples: drug-uptake transporters; sometimes, paradoxically, the drug target itself.

Why this matters: Drug screen analysis can compare drug to:
counts at Day 0 (no perturbation; cloning baseline)
|
v
counts at Day 7 - Vehicle (proliferation only; what survives in normal culture)
counts at Day 7 - Drug (proliferation + drug effect)
|
v
Drug effect = LFC(Drug vs Vehicle) # CORRECT
Wrong: LFC(Drug vs Day 0) # confounds drug with general proliferation
The control samples passed to drugZ (`-c`) must be the vehicle arm, not Day 0. Always include matched vehicle controls in drug screens.
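A toy calculation shows why the choice of baseline matters. The counts below are hypothetical and assumed to be depth-normalized already; the helper name is illustrative, and the pseudocount of 5 mirrors the default mentioned in this document.

```python
import math


def log2_fold_change(treated, control, pseudocount=5):
    """log2 ratio of pseudocount-adjusted read counts."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical guides: one targets a core-essential gene (drops out in any
# culture), the other targets a gene that matters only when the drug is present.
counts = {
    "essential_guide": {"day0": 1000, "vehicle": 250, "drug": 240},
    "sensitizer_guide": {"day0": 1000, "vehicle": 1000, "drug": 400},
}

vs_day0 = {g: log2_fold_change(c["drug"], c["day0"]) for g, c in counts.items()}
vs_vehicle = {g: log2_fold_change(c["drug"], c["vehicle"]) for g, c in counts.items()}

for guide in counts:
    print(f"{guide}\tdrug_vs_day0={vs_day0[guide]:.2f}\tdrug_vs_vehicle={vs_vehicle[guide]:.2f}")
```

Against Day 0 the essential guide looks like the stronger hit, because its proliferation defect is counted as a drug effect. Against vehicle that defect cancels out and only the sensitizer guide shows a large negative fold change.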
drugZ for dose-response: Not natively designed for dose; instead, run drugZ separately at each dose vs vehicle, then look for genes with consistent direction across doses.
for DOSE in low mid high; do
python drugz.py \
-i counts.txt \
-o drugz_${DOSE}.txt \
-c Veh_r1,Veh_r2 \
-x Drug${DOSE}_r1,Drug${DOSE}_r2
done
# Then aggregate: genes significant at high dose AND consistent direction at mid/low dose
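The aggregation step described in the comment above could look like the following sketch. It assumes each per-dose drugZ output is a tab-separated file with the `GENE`, `normZ` and `fdr_synth` columns listed in this document; the function names, the dose labels and the toy rows are hypothetical. Only the sensitizer direction is shown, and the suppressor case mirrors it with `fdr_supp` and a positive `normZ`.

```python
import csv


def load_drugz(path):
    """Read one drugZ output table into {gene: row_dict}."""
    with open(path, newline="") as handle:
        return {row["GENE"]: row for row in csv.DictReader(handle, delimiter="\t")}


def consistent_sensitizers(results_by_dose, anchor_dose="high", fdr_cutoff=0.05):
    """Genes significant at the anchor dose with negative normZ at every other dose."""
    other_doses = [dose for dose in results_by_dose if dose != anchor_dose]
    hits = []
    for gene, row in results_by_dose[anchor_dose].items():
        if float(row["fdr_synth"]) >= fdr_cutoff:
            continue
        same_direction = all(
            gene in results_by_dose[dose]
            and float(results_by_dose[dose][gene]["normZ"]) < 0
            for dose in other_doses
        )
        if same_direction:
            hits.append(gene)
    return sorted(hits)


# Toy in-memory stand-in for load_drugz("drugz_low.txt") and so on.
toy_results = {
    "low": {"G1": {"normZ": "-0.5", "fdr_synth": "0.6"},
            "G2": {"normZ": "0.4", "fdr_synth": "0.9"}},
    "mid": {"G1": {"normZ": "-2.0", "fdr_synth": "0.2"},
            "G2": {"normZ": "-0.2", "fdr_synth": "0.8"}},
    "high": {"G1": {"normZ": "-4.0", "fdr_synth": "0.001"},
             "G2": {"normZ": "-3.5", "fdr_synth": "0.01"},
             "G3": {"normZ": "0.1", "fdr_synth": "0.8"}},
}
print(consistent_sensitizers(toy_results))
```

With real files, `results_by_dose` would be built as `{dose: load_drugz(f"drugz_{dose}.txt") for dose in ("low", "mid", "high")}`, matching the output names used in the loop above. In the toy data G2 is significant at the high dose but flips sign at the low dose, so it is excluded.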
For multi-condition drug-screens (time × drug × cell-line), use MAGeCK MLE with explicit design matrix instead -- MLE handles multi-factorial; drugZ does not.
Goal: When to use each method.
| Question | drugZ | MAGeCK MLE |
|----------|-------|-------------|
| Single drug, single dose, vehicle vs drug | YES (preferred) | Acceptable |
| Multiple doses, drug response curve | Per-dose drugZ + meta | YES (preferred with dose covariate) |
| Time course at single dose | Per-timepoint drugZ + meta | YES (preferred with time covariate) |
| Drug + cell-line panel | Per-line drugZ + meta | YES (or Chronos) |
| Combinatorial drug pairs | Per-pair drugZ + meta | YES (preferred with interaction) |
| Synergy / antagonism detection | Limited (per-drug calling only) | YES (interaction term in MLE) |
| Small effect sizes (LFC <0.5) | Highest sensitivity | Lower sensitivity |
| Heavy selection (>40% guides change) | OK | Norm needs control sgRNAs |
Reconciliation: For simple drug-modifier screens with one drug and one vehicle, run both drugZ and MAGeCK MLE; hits called by both are high confidence; drugZ-only hits at low LFC need orthogonal validation (drug + arrayed validation).
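The reconciliation rule can be expressed as simple set logic. This sketch assumes the two hit lists have already been extracted at a chosen FDR cutoff from the drugZ and MAGeCK MLE outputs; the function name and the gene lists are hypothetical.

```python
def reconcile_hits(drugz_hits, mageck_hits):
    """Split hits into consensus calls and method-specific calls."""
    drugz_set, mageck_set = set(drugz_hits), set(mageck_hits)
    return {
        "high_confidence": sorted(drugz_set & mageck_set),
        "drugz_only": sorted(drugz_set - mageck_set),    # candidates for orthogonal validation
        "mageck_only": sorted(mageck_set - drugz_set),
    }


# Hypothetical sensitizer calls from each method.
toy_calls = reconcile_hits(
    drugz_hits=["BRCA1", "BRCA2", "GENE_X"],
    mageck_hits=["BRCA1", "BRCA2", "GENE_Y"],
)
for label, genes in toy_calls.items():
    print(f"{label}: {', '.join(genes)}")
```

Sensitizers and suppressors should be reconciled separately, since a gene called in opposite directions by the two methods is a disagreement and not a consensus hit.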
Goal: Exclude reference essential or control genes from the Z-score null distribution.
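Approach: Pass `-r` the gene symbols whose sgRNA-level Z scores should not influence the null. Useful when CEGv2 essentials would otherwise inflate the null distribution. The hart-lab `drugz.py` help text describes `-r` as a comma-delimited gene list, not a file path (confirm with `python drugz.py --help`), so a one-gene-per-line file has to be joined with commas first.
# Keep the exclusion list in a file with one gene per line
cat > remove_essential.txt <<EOF
RPS3
RPL11
EIF3A
POLR2A
CDK1
EOF
# Join the lines with commas before handing them to -r
python drugz.py \
  -i counts.txt \
  -o drugz_clean.txt \
  -c Veh_r1,Veh_r2 \
  -x Drug_r1,Drug_r2 \
  -r "$(paste -sd, remove_essential.txt)"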
When to use: If pilot drugZ runs show many essential genes appearing as "sensitizers" purely because they drop out under any condition, removing them gives a cleaner drug-specific signal.
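One way to decide whether a pilot run has this problem is to measure how much of the top of the sensitizer ranking is made up of reference essentials. The sketch below assumes rows parsed from a drugZ output table with `GENE` and `normZ` columns; the function name, the `top_n` default and the toy rows are hypothetical, and no particular fraction is prescribed by the source as a cutoff.

```python
def essential_fraction(rows, essential_genes, top_n=100):
    """Fraction of the top_n most negative normZ genes that are reference essentials."""
    ranked = sorted(rows, key=lambda row: float(row["normZ"]))
    top = ranked[:top_n]
    if not top:
        return 0.0
    return sum(row["GENE"] in essential_genes for row in top) / len(top)


# Hypothetical pilot output and a small reference-essential set.
toy_rows = [
    {"GENE": "RPS3", "normZ": "-5.1"},
    {"GENE": "RPL11", "normZ": "-4.7"},
    {"GENE": "BRCA1", "normZ": "-4.5"},
    {"GENE": "GENE_X", "normZ": "-0.2"},
    {"GENE": "GENE_Y", "normZ": "1.3"},
]
toy_essentials = {"RPS3", "RPL11", "EIF3A", "POLR2A", "CDK1"}

fraction = essential_fraction(toy_rows, toy_essentials, top_n=3)
print(f"{fraction:.0%} of the top 3 sensitizers are reference essentials")
```

A high fraction in a pilot run suggests re-running with the essentials excluded, or filtering them from the output afterwards.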
Trigger: Comparing drug vs Day-0 instead of drug vs vehicle.
Mechanism: Day-0 comparison conflates drug effect with normal-culture proliferation; essential genes drop in both conditions, masking drug-specific sensitization.
Symptom: PARPi screen shows no sensitization at BRCA1/BRCA2 despite expected biology.
Fix: Re-run with the vehicle samples passed as the control samples (`-c`). Drug vs vehicle is the canonical comparison.
Trigger: Essential genes drop out in both vehicle and drug arms; small relative shift gives misleadingly high Z.
Mechanism: drugZ's Z-score is symmetric; essential genes drop in both arms but slightly more in drug -> "synthetic lethal" call.
Symptom: Hit list dominated by RPS, RPL, EIF essentials.
Fix: Use `-r` to exclude CEGv2 essentials; or filter the output post-hoc.
Trigger: Insufficient sgRNAs per gene; small effect sizes.
Mechanism: drugZ's per-gene sumZ depends on enough sgRNAs to be stable; with 3-4 sgRNAs/gene, single-guide noise drives variation.
Symptom: Top hits are not reproducible when a replicate or a single guide is dropped, or between repeat screens.
Fix: Use a 6+ sgRNAs/gene library (Avana, Dolcetto); or check hit stability by re-running drugZ on resampled replicates or guides; or use MAGeCK MLE for stability.
Trigger: Multi-dose screen analyzed at highest dose only.
Mechanism: drugZ doesn't model dose; running at one dose loses the dose-response information.
Symptom: Hits at high dose may be dose-specific (not true responders).
Fix: Run drugZ at each dose; require consistency across doses for high-confidence hits.
Trigger: Loss of drug target reduces drug binding, increasing drug resistance.
Mechanism: Real biology -- drug target itself is a resistance gene from a KO perspective.
Symptom: Drug-target gene like PARP1 appears in suppressor list for PARPi screen.
Fix: Expected biology. Annotate the drug target separately. The suppressor list is correct.
| Threshold | Value | Source / Rationale |
|-----------|-------|--------------------|
| Sensitizer hit | fdr_synth < 0.05 | Li & Hart 2019; BH-corrected |
| Suppressor hit | fdr_supp < 0.05 | Same |
| High-confidence sensitizer | fdr_synth < 0.01 AND normZ < -3 | Conservative |
| Pseudocount default | 5 | Li & Hart 2019 |
| Min sgRNAs per gene for stable Z | 4-6 | Below this, Z varies between runs |
| Vehicle replicates needed | 3+ | For stable Z null distribution |
| Drug replicates needed | 3+ | For per-gene sumZ stability |
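The hit-calling thresholds in the table above can be applied to each output row as in this sketch. The cutoffs are the ones listed in the table; the function name, the label strings and the toy rows are hypothetical, and the rows are assumed to come from a drugZ output table with `normZ`, `fdr_synth` and `fdr_supp` columns.

```python
def classify_gene(row, fdr_cutoff=0.05, strict_fdr=0.01, strict_norm_z=-3.0):
    """Label one drugZ output row using the thresholds from the table."""
    norm_z = float(row["normZ"])
    fdr_synth = float(row["fdr_synth"])
    fdr_supp = float(row["fdr_supp"])
    if fdr_synth < strict_fdr and norm_z < strict_norm_z:
        return "high-confidence sensitizer"
    if fdr_synth < fdr_cutoff:
        return "sensitizer"
    if fdr_supp < fdr_cutoff:
        return "suppressor"
    return "not significant"


# Hypothetical rows covering each outcome.
toy_output = [
    {"GENE": "G1", "normZ": "-4.5", "fdr_synth": "0.001", "fdr_supp": "1.0"},
    {"GENE": "G2", "normZ": "-2.4", "fdr_synth": "0.03", "fdr_supp": "1.0"},
    {"GENE": "G3", "normZ": "3.9", "fdr_synth": "1.0", "fdr_supp": "0.004"},
    {"GENE": "G4", "normZ": "0.2", "fdr_synth": "0.9", "fdr_supp": "0.7"},
]
labels = {row["GENE"]: classify_gene(row) for row in toy_output}
for gene, label in labels.items():
    print(f"{gene}\t{label}")
```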
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| No hits | Wrong control samples (Day-0 instead of vehicle) | Re-run with vehicle |
| Hits dominated by essentials | Essentials inflate null | Exclude CEGv2 essentials with `-r` |
| Unstable hits across runs | Too few sgRNAs/gene | Use 6+ sgRNAs/gene library |
| Drug-target appears in suppressor | Real biology | Annotate separately |
| MAGeCK and drugZ disagree | Different statistical sensitivity | drugZ more sensitive; trust for chemogenomic |
| Inconsistent between doses | Real dose effect | Require consistency across doses |