genome-engineering/hdr-template-design/SKILL.md
Designs donor/repair templates for precise CRISPR knock-ins -- choosing the format (ssODN, long-ssDNA/Easi-CRISPR, dsDNA/plasmid, AAV6), sizing homology arms, placing the cut within ~10 bp of the edit, and adding a mandatory codon-checked blocking (PAM/seed) mutation so the edited allele is not re-cut. Frames the HDR-vs-NHEJ-vs-MMEJ pathway competition, the MMEJ (PITCh) and homology-independent (HITI/HMEJ) alternatives for post-mitotic cells, ssODN strand/asymmetry choice, phosphorothioate end-protection, and ranked HDR enhancers. Use when designing a donor for a point mutation, epitope/fluorophore tag, allele replacement, or knock-in, or when HDR efficiency is low. Guide design and base/prime editing are separate skills.
Reference examples tested with: BioPython 1.83+, primer3-py 2.0+.
Before using code patterns, verify installed versions match. If versions differ:
- `pip show <package>` then `help(module.function)` to check signatures

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
primer3-py designs PRIMERS (use primer3.bindings.design_primers(seq_args, global_args); the camelCase designPrimers is deprecated since 1.0.0), not homology arms -- arm extraction and codon-aware blocking are the skill's own BioPython code. Design arms and guide against the actual cell line's sequence, not GRCh38 -- a SNP in an arm reduces annealing and a SNP in the PAM/seed can mean the guide does not cut.
"Design a donor for my CRISPR knock-in" -> Decide the format by edit size and whether the cell cycles, size the arms, confirm a guide cuts within ~10 bp of the edit, and add a codon-checked blocking mutation so the corrected allele cannot be re-cut.
Bio.Seq; primer3.bindings.design_primers() for arm-amplification and junction-validation primers

A Cas9 double-strand break is repaired by whichever pathway wins a kinetic race, and in most cells the winner is classical NHEJ (fast, all cell-cycle phases). MMEJ (microhomology, S/G2) and HDR (template-dependent, S/G2 only) are minority players, so unenhanced HDR knock-in is typically single-digit to low-double-digit percent -- that is normal, not a failure. The donor is not "the sequence to insert"; it is the toolkit for tilting a race NHEJ is structurally favored to win.

The corollary, and the field's most expensive misread: a donor with perfect arms but no blocking mutation gets its successful edit erased -- the corrected allele still has an intact protospacer + PAM, so Cas9 re-cuts it and NHEJ scars it, and the indel reads out as "HDR failed" (indistinguishable from low HDR). So when someone reports "low HDR, lots of indels," the first question is not "how long are the arms?" -- it is "does the donor disrupt the PAM or seed?" A blocking mutation is mandatory and must be codon-checked; without it the readout is re-cutting, not HDR.
| Pathway | Cell cycle | Template | Signature | Relevance |
|---------|-----------|----------|-----------|-----------|
| c-NHEJ | all phases (dominant) | none | indels | the competitor; the engine HITI exploits |
| MMEJ / alt-EJ (Pol theta) | S/G2 | 5-25 bp microhomology | microhomology-flanked deletions | the PITCh route |
| HDR / HR | S/G2 only | sister chromatid or exogenous donor | precise, scarless | the classic knock-in route; minority |
| SSA | S/G2 | repeats | deletion between repeats | nuisance |
End resection (cell-cycle-gated, licensed in S/G2; 53BP1-RIF1 protects ends/pro-NHEJ, BRCA1 antagonizes it/pro-HDR) decides the fork. Consequences: post-mitotic cells barely do HDR -> for neurons/muscle/in-vivo tissue, HITI/HMEJ (NHEJ-based) is the correct first choice, not a fallback. Timing RNP+donor delivery into S/G2 raises HDR (Lin 2014, up to ~38% in HEK293T) -- the donor is necessary but the cell-cycle state gates it.
| Format | Insert | Arms | Best for | Caveat |
|--------|--------|------|----------|--------|
| ssODN | <= ~50 bp edits | ~30-60 nt each (total ~120-200 nt) | point mutations, small tags, loxP | synthesis ceiling ~200 nt; strand choice contested |
| long ssDNA (Easi-CRISPR) | ~0.2-2 kb | ~50-100 nt | cassettes, floxed/conditional alleles, zygote KI | less toxic & less random integration than dsDNA; harder to make |
| dsDNA (PCR/linear) | ~0.1-1.5 kb | ~200-800 bp | medium cassettes | dsDNA is toxic (innate sensing) + random integration |
| plasmid / HMEJ | up to several kb | ~500-2000 bp | large insertions, conditional alleles | backbone integration risk; slowest |
| AAV6 | <= ~4.5 kb (ITR-to-ITR, arms included) | ~400 bp-1 kb | hard-to-transfect primary cells (HSPC, T, iPSC), in vivo | manufacturing cost; cargo cap is a hard wall |
Heuristics: point/small edit -> ssODN; 0.2-2 kb -> lssDNA over dsDNA (cleaner) for animal/zygote work; large cassette -> plasmid/HMEJ (lines) or AAV6+RNP (primary cells); post-mitotic -> HITI.
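The heuristics above can be sketched as a small lookup. This is a minimal illustration, not part of the skill's own code: the function name `choose_donor_format` and its parameters are hypothetical, and the thresholds are the approximate values from the format table. Inserts between ~50 bp and ~0.2 kb are folded into the lssDNA branch here, which is an assumption; the table leaves that range open.

```python
def choose_donor_format(insert_bp, post_mitotic=False, primary_cells=False):
    """Map edit size and cell state to a donor format (hypothetical helper).

    Thresholds are soft guides taken from the format table, not hard rules.
    """
    if insert_bp < 0:
        raise ValueError("insert_bp must be non-negative")
    if post_mitotic:
        # HDR runs only in S/G2, so non-dividing cells go to the NHEJ-based route.
        return "HITI"
    if insert_bp <= 50:
        return "ssODN"
    if insert_bp <= 2000:
        # lssDNA is preferred over dsDNA as the cleaner option (assumption:
        # also used for the 50-200 bp gap the table does not cover).
        return "lssDNA"
    return "AAV6 + RNP" if primary_cells else "plasmid/HMEJ"


print(choose_donor_format(1))                         # point mutation
print(choose_donor_format(750))                       # fluorophore-sized tag
print(choose_donor_format(3000, primary_cells=True))  # large cassette, primary cells
```

The order of the checks matters: cell-cycle state is tested before size because a post-mitotic target rules out HDR regardless of how small the edit is.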
Richardson 2016 proposed an ssODN complementary to the non-target strand, asymmetric with the longer arm PAM-proximal (~91 nt) and the shorter PAM-distal (~36 nt). Subsequent systematic work could not reproduce this as universal: the optimal strand flips by locus and the asymmetric advantage often vanishes once both arms are >=30 nt. Treat it as a prior to test, not a law -- generate both strands and symmetric+asymmetric variants and test them. By contrast, phosphorothioate (PS) end-protection (2-3 terminal bases each end) is a near-universal cheap win (exonuclease resistance) -- encode these at opposite confidence levels.
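Generating both strands in symmetric and asymmetric forms is mechanical, so a sketch may help. The helper names (`revcomp`, `add_ps`, `ssodn_variants`) are hypothetical, the edit is modeled as a simple insertion at the cut (a point mutation would replace bases instead), and the `*` marker for a phosphorothioate bond is an assumed ordering notation; check your oligo supplier's format. The 45 nt symmetric arm is an arbitrary pick inside the ~30-60 nt range.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_ps(oligo, n=2):
    """Mark n phosphorothioate bonds at each end with '*' (assumed notation)."""
    if len(oligo) <= 2 * n:
        raise ValueError("oligo too short for end protection")
    head = "".join(base + "*" for base in oligo[:n])
    tail = "".join("*" + base for base in oligo[-n:])
    return head + oligo[n:-n] + tail


def ssodn_variants(seq, cut, edit, pam_on_right=True, sym=45, long_arm=91, short_arm=36):
    """Return symmetric and asymmetric ssODNs on both strands.

    seq:  top-strand sequence of the actual cell line
    cut:  index of the first base to the right of the cut
    edit: bases inserted at the cut (simplification)
    """
    seq = seq.upper()
    # Asymmetric prior: longer arm on the PAM-proximal side.
    asym = (short_arm, long_arm) if pam_on_right else (long_arm, short_arm)
    variants = {}
    for name, (left, right) in {"symmetric": (sym, sym), "asymmetric": asym}.items():
        if cut - left < 0 or cut + right > len(seq):
            raise ValueError("sequence too short for the requested arms")
        top = seq[cut - left:cut] + edit + seq[cut:cut + right]
        variants[name + "_top"] = add_ps(top)
        variants[name + "_bottom"] = add_ps(revcomp(top))
    return variants


locus = "ACGT" * 60  # placeholder sequence, not a real locus
for name, oligo in ssodn_variants(locus, cut=120, edit="GAATTC").items():
    print(name, len(oligo.replace("*", "")))
```

All four oligos carry the same end protection, so any difference seen between them in a test reflects strand and arm geometry only.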
| Situation | Route |
|-----------|-------|
| Point/small edit, cycling cells | ssODN + HDR (with blocking mutation) |
| Medium/large cassette, cycling line | HDR (lssDNA/plasmid) or HMEJ |
| Large cassette, primary cells (HSPC/T/iPSC) | AAV6 donor + RNP + HDR |
| Clean zygote/animal KI, <=2 kb | lssDNA Easi-CRISPR + HDR |
| Trivial donor construction wanted | PITCh (MMEJ) |
| Non-dividing / post-mitotic / in vivo | HITI (or HMEJ) |
| Edit far from any cut / single base | -> base-editing-design or prime-editing-design (donor-free) |
Most enhancers are marginal, cell-type-specific, and frequently non-reproducible; the published fold-changes are line-specific maxima. A blocking mutation and a cut near the edit matter more than any small molecule.
Goal: Assemble a donor that incorporates the edit AND survives re-cutting, with primers to amplify the arms and genotype the junction.
Approach: Extract arms flanking the cut, insert the edit, then add a blocking mutation -- disrupt the PAM synonymously if a wobble option exists, else introduce silent seed mutations -- verifying the change does not alter the encoded amino acid. Use primer3-py for arm-amplification/junction primers. (See examples/hdr_template_design.py for codon-aware blocking and a primer3 call.)
```python
from Bio.Seq import Seq

def synonymous_pam_block(codon_table, pam_codon, alt_codon):
    '''Return True only if a PAM-disrupting codon swap keeps the same amino acid (silent).'''
    original_aa = codon_table.get(pam_codon)
    # never mutate the PAM without this check; a codon missing from the table must not pass as silent
    return original_aa is not None and original_aa == codon_table.get(alt_codon)
```
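To show how such a check can drive the search for a blocking mutation, here is a minimal sketch that enumerates single-base changes to either G of an NGG PAM and keeps only the silent ones. It is not the skill's `examples/hdr_template_design.py`. The function name `silent_pam_blocks` is hypothetical, the guide is assumed to lie on the coding strand, and the sequence is assumed to be in frame from index 0. The standard codon table is built inline to keep the block dependency-free; `Bio.Data.CodonTable` could supply the same mapping.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AAS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def silent_pam_blocks(cds, pam_start):
    """List silent single-base changes that destroy an NGG PAM.

    cds:       in-frame coding sequence, codon 0 starting at index 0
    pam_start: index of the N in NGG on the coding strand
    Returns (position, original codon, new codon, amino acid) tuples.
    """
    cds = cds.upper()
    if cds[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("expected NGG at pam_start")
    options = []
    for pos in (pam_start + 1, pam_start + 2):  # the two PAM guanines
        codon_start = pos - pos % 3
        codon = cds[codon_start:codon_start + 3]
        if len(codon) < 3:
            continue  # incomplete trailing codon, cannot be checked
        offset = pos - codon_start
        for base in "ACT":
            alt = codon[:offset] + base + codon[offset + 1:]
            if CODON_TABLE[alt] == CODON_TABLE[codon]:
                options.append((pos, codon, alt, CODON_TABLE[codon]))
    return options


# PAM 'TGG' at index 4 overlaps the wobble base of a Leu codon: silent options exist.
print(silent_pam_blocks("ATGCTGGGTAAA", 4))
# PAM 'TGG' is itself the Trp codon: no silent option, fall back to seed mutations.
print(silent_pam_blocks("ATGTGGAAA", 3))
```

An empty result is the signal to move to silent seed mutations rather than to force a PAM change.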
Trigger: low edit, mostly indels, no blocking mutation. Mechanism: the corrected allele keeps an intact PAM -> Cas9 re-cuts -> NHEJ scar. Symptom: indels indistinguishable from no-HDR. Fix: add a codon-checked PAM/seed blocking mutation; the readout was re-cutting, not HDR.
Trigger: best-cutting guide is 25 bp from the edit. Mechanism: HDR incorporation falls with distance. Symptom: only the blocking mutation is incorporated (useless silent-only allele) or no edit. Fix: choose a guide cutting within ~10 bp; if none, switch to base/prime editing.
Trigger: blindly changing the NGG's second G to A. Mechanism: the PAM may be in a coding frame. Symptom: an unintended missense/nonsense change. Fix: verify the swap is synonymous; else use silent seed mutations.
Trigger: a plasmid/PCR donor in iPSC/primary/zygotes. Mechanism: dsDNA toxicity + random integration. Symptom: low viability, random integrants. Fix: use lssDNA (Easi-CRISPR) or AAV6.
Trigger: ssODN/plasmid for neurons/in-vivo tissue. Mechanism: HDR runs only in S/G2. Symptom: essentially no knock-in. Fix: use HITI (NHEJ-based) or HMEJ.
Trigger: GRCh38 arms for a passaged/cancer line. Mechanism: line-specific SNPs in the arm or PAM/seed. Symptom: poor annealing or no cut. Fix: design against the cell line's actual sequence; account for ploidy/zygosity.
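The edit-to-cut distance failure mode can be screened before any donor is built. The sketch below scans both strands for NGG PAMs and reports SpCas9 cut sites within a distance limit of the edit. The function name `cut_sites_near_edit` and the 0-based coordinate convention are assumptions for illustration, and the cut is placed 3 bp upstream of the PAM. It does not score guide activity or off-targets, which belong to the separate guide-design skill.

```python
import re


def cut_sites_near_edit(seq, edit_pos, max_dist=10):
    """Find SpCas9 (NGG) cut sites within max_dist bp of edit_pos.

    seq:      top-strand sequence of the actual cell line
    edit_pos: 0-based index of the intended edit
    Returns (strand, pam_index, cut_index, distance) tuples, nearest first.
    cut_index is the first top-strand base to the right of the cut.
    """
    seq = seq.upper()
    hits = []
    # Top-strand guides: PAM reads NGG, 20 nt protospacer lies to its left.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        p = m.start()
        if p >= 20:
            hits.append(("+", p, p - 3))
    # Bottom-strand guides: PAM reads CCN on the top strand, protospacer to its right.
    for m in re.finditer(r"(?=CC[ACGT])", seq):
        q = m.start()
        if q + 23 <= len(seq):
            hits.append(("-", q, q + 6))
    near = [(s, pam, cut, abs(cut - edit_pos)) for s, pam, cut in hits
            if abs(cut - edit_pos) <= max_dist]
    return sorted(near, key=lambda h: h[3])


demo = "A" * 30 + "TGG" + "A" * 30  # placeholder sequence with one top-strand PAM
print(cut_sites_near_edit(demo, edit_pos=30))
print(cut_sites_near_edit(demo, edit_pos=50))  # too far from the only cut
```

An empty list for the intended edit position is the cue to consider base or prime editing instead.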
| Parameter | Value | Source |
|-----------|-------|--------|
| ssODN total length | ~120-200 nt | synthesis ceiling |
| ssODN arm | ~30-60 nt each | below ~30 HDR drops; above ~60 diminishing returns |
| dsDNA/plasmid arm | ~200-800 bp (up to ~2 kb) | ~500-800 bp common sweet spot |
| lssDNA insert | ~0.2-2 kb | Easi-CRISPR range |
| AAV cargo | <= ~4.5 kb (arms included) | packaging limit (hard wall) |
| PITCh microhomology | ~5-25 bp | MMEJ working range |
| Edit-to-cut distance | <= ~10 bp | HDR incorporation falls with distance (Paquet 2016) |
| Phosphorothioate | 2-3 terminal bases each end | exonuclease resistance |
| Cold shock | 32 C, 24-48 h | G2/M accumulation (Guo 2018) |
| Typical raw HDR | single-digit to ~20% (up to ~38-60% optimized) | minority pathway |
| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Low HDR, mostly indels | no blocking mutation (re-cutting) | add codon-checked PAM/seed block |
| Only the silent mutation incorporated | edit too far from cut | cut within ~10 bp or switch to base/prime editing |
| Toxicity / random integration | dsDNA in sensitive cells | lssDNA or AAV6 |
| No knock-in in neurons/in vivo | HDR donor in post-mitotic cells | HITI/HMEJ |
| AAV donor will not package | arms + insert exceed ~4.5 kb | shorten arms/insert; budget against the cap |
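Budgeting a design against the size limits is simple arithmetic, and a sketch may make it concrete. The function name `check_donor` and the format labels are hypothetical. The limits are the approximate values quoted in this document (~200 nt ssODN ceiling, ~30 nt minimum arm, ~4.5 kb AAV cap with arms included) and should be treated as soft guides except for the AAV cap. Promoters, polyA signals, and other cargo between the ITRs also count toward that cap and are assumed to be included in `insert_bp`.

```python
def check_donor(fmt, insert_bp, left_arm, right_arm):
    """Return a list of warnings for a donor design (hypothetical helper)."""
    warnings = []
    total = insert_bp + left_arm + right_arm
    if fmt == "ssODN":
        if total > 200:
            warnings.append(f"ssODN is {total} nt, above the ~200 nt synthesis ceiling")
        if min(left_arm, right_arm) < 30:
            warnings.append("ssODN arm below ~30 nt, HDR is expected to drop")
    elif fmt == "AAV6":
        if total > 4500:
            over = total - 4500
            warnings.append(f"AAV cargo is {total} bp, {over} bp over the ~4.5 kb cap")
    return warnings


print(check_donor("ssODN", insert_bp=3, left_arm=40, right_arm=40))
print(check_donor("AAV6", insert_bp=3000, left_arm=800, right_arm=800))
```

For an AAV design that fails the check, the overage figure shows how much must come out of the arms or the insert.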