read-alignment/hisat2-alignment/SKILL.md
Align RNA-seq reads with HISAT2, a memory-efficient splice-aware aligner. Use when STAR's memory requirements are too high or for general RNA-seq alignment.
Reference examples tested with: samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
<tool> --version then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
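The version and flag check described above can be scripted. The following is a minimal sketch, not part of HISAT2 or samtools: `has_flag` is a hypothetical helper name, and it assumes the tool prints its option list when called with `--help`.

```shell
# Sketch: confirm a tool is available and that its help text mentions a flag.
# `has_flag` is a hypothetical helper, not shipped with any aligner.
has_flag() {
    local tool=$1 flag=$2 help_text
    # Return 2 if the tool is not on PATH (or defined as a shell function).
    command -v "$tool" >/dev/null 2>&1 || return 2
    # Some tools print help to stderr or exit non-zero, so capture both
    # streams and ignore the exit status.
    help_text=$("$tool" --help 2>&1) || true
    # Fixed-string, substring match: "--dta" also matches "--dta-cufflinks".
    printf '%s\n' "$help_text" | grep -F -e "$flag" >/dev/null
}
```

A call such as `has_flag hisat2 --rna-strandness` returns 0 when the flag appears in the help text, 1 when it does not, and 2 when the tool is missing.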
"Align RNA-seq reads with HISAT2" -> Map RNA-seq reads to a reference genome with splice-aware alignment. Suitable for gene expression quantification workflows.
hisat2 -x index -1 R1.fq -2 R2.fq | samtools sort -o aligned.bam
# Basic index (no annotation)
hisat2-build -p 8 reference.fa hisat2_index
# Index with splice sites and exons (recommended)
hisat2_extract_splice_sites.py annotation.gtf > splice_sites.txt
hisat2_extract_exons.py annotation.gtf > exons.txt
hisat2-build -p 8 \
--ss splice_sites.txt \
--exon exons.txt \
reference.fa hisat2_index
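Index building is slow, so a script may want to skip it when the index is already present. This is a minimal sketch under an assumption about file naming: a HISAT2 index is taken to consist of files named `<basename>.1.ht2` and so on (or `.1.ht2l` for large indexes). `index_exists` is a hypothetical helper name.

```shell
# Sketch: test whether a HISAT2 index with the given basename already exists.
# Assumes the usual naming <basename>.1.ht2 (or <basename>.1.ht2l for large
# indexes); only the first index file is checked.
index_exists() {
    [ -e "$1.1.ht2" ] || [ -e "$1.1.ht2l" ]
}
```

It could be used as `index_exists hisat2_index || hisat2-build -p 8 reference.fa hisat2_index`. Checking only the first file does not detect a partially written index.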
# Paired-end reads
hisat2 -p 8 -x hisat2_index \
-1 reads_1.fq.gz -2 reads_2.fq.gz \
-S aligned.sam
# Single-end reads
hisat2 -p 8 -x hisat2_index \
-U reads.fq.gz \
-S aligned.sam
# Pipe to samtools
hisat2 -p 8 -x hisat2_index \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
samtools index aligned.sorted.bam
# Forward stranded (e.g., Ligation)
hisat2 -p 8 -x hisat2_index \
--rna-strandness FR \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Reverse stranded (e.g., dUTP, TruSeq - most common)
hisat2 -p 8 -x hisat2_index \
--rna-strandness RF \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Single-end stranded (F for forward, R for reverse)
hisat2 -p 8 -x hisat2_index \
--rna-strandness F \
-U reads.fq.gz -S aligned.sam
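The four strandness values above follow a simple pattern: paired-end libraries take FR or RF, single-end libraries take F or R. A small lookup can keep a pipeline from mixing them up. This is a sketch only; `strandness_value` and its argument vocabulary (`paired`/`single`, `forward`/`reverse`/`unstranded`) are hypothetical, not HISAT2 terms.

```shell
# Sketch: map a library description to the --rna-strandness value shown above.
# `strandness_value` and its argument names are hypothetical.
strandness_value() {
    local layout=$1    # "paired" or "single"
    local protocol=$2  # "forward", "reverse", or "unstranded"
    case "$layout:$protocol" in
        paired:forward) echo "FR" ;;
        paired:reverse) echo "RF" ;;
        single:forward) echo "F" ;;
        single:reverse) echo "R" ;;
        paired:unstranded|single:unstranded) echo "" ;;  # omit the flag
        *) echo "unknown library type: $layout $protocol" >&2; return 1 ;;
    esac
}
```

An empty result means the `--rna-strandness` option should be left off the command line, which matches the unstranded default.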
# Output novel splice junctions
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-outfile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Use known + novel junctions for subsequent alignments
hisat2 -p 8 -x hisat2_index \
--novel-splicesite-infile novel_splices.txt \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
Goal: Improve splice junction sensitivity by discovering novel junctions across all samples in a first pass, then realigning with the combined junction set.
Approach: Run HISAT2 on each sample to extract novel splice sites, merge and deduplicate junctions across samples, then realign all samples using the combined junction catalog.
# Pass 1: Discover junctions from all samples
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename "$r1" _R1.fq.gz)
    # Alignments are discarded in pass 1; only the junction file is kept
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-outfile "${base}_splices.txt" \
        -1 "$r1" -2 "$r2" -S /dev/null
done
# Combine and deduplicate junctions
cat *_splices.txt | sort -u > combined_splices.txt
# Pass 2: Realign with all junctions
for r1 in *_R1.fq.gz; do
    r2=${r1/_R1/_R2}
    base=$(basename "$r1" _R1.fq.gz)
    hisat2 -p 8 -x hisat2_index \
        --novel-splicesite-infile combined_splices.txt \
        -1 "$r1" -2 "$r2" | \
        samtools sort -@ 4 -o "${base}.sorted.bam" -
done
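The combine step above keeps every junction seen in any sample. A stricter variant could keep only junctions supported by several samples. The following is a sketch of that idea, not a HISAT2 feature: `filter_junctions` is a hypothetical helper, and it assumes each per-sample file is tab-separated with four columns (chromosome, left flank, right flank, strand) and that chromosome names contain no spaces.

```shell
# Sketch: keep junctions reported by at least MIN_SAMPLES per-sample files.
# `filter_junctions` is a hypothetical helper; the four-column tab-separated
# input layout is an assumption about --novel-splicesite-outfile output.
filter_junctions() {
    local min_samples=$1 f
    shift
    for f in "$@"; do
        # Deduplicate within a sample so each file counts once per junction.
        sort -u "$f"
    done | sort | uniq -c | awk -v min="$min_samples" '
        $1 >= min {
            # Drop the count column added by uniq -c; restore tab separators.
            print $2 "\t" $3 "\t" $4 "\t" $5
        }'
}
```

For example, `filter_junctions 2 *_splices.txt > combined_splices.txt` would replace the `sort -u` step. The threshold of 2 is illustrative; with a threshold of 1 the result matches plain deduplication.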
# Add read group information to the SAM header
hisat2 -p 8 -x hisat2_index \
--rg-id sample1 \
--rg SM:sample1 \
--rg PL:ILLUMINA \
--rg LB:lib1 \
-1 r1.fq.gz -2 r2.fq.gz -S aligned.sam
# Output name-sorted BAM for htseq-count
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -n -@ 4 -o aligned.namesorted.bam -
# Or coordinate-sorted for featureCounts
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
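Because downstream counters differ in the sort order they expect, it can help to confirm what a BAM file records before passing it on. This sketch reads header text on stdin and prints the `SO` value from the `@HD` line. `sam_sort_order` is a hypothetical helper name.

```shell
# Sketch: report the sort order recorded in a SAM/BAM header.
# `sam_sort_order` is a hypothetical helper that reads header text on stdin.
sam_sort_order() {
    awk -F'\t' '
        $1 == "@HD" {
            # Scan the tab-separated tags of the @HD line for SO:<value>.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }'
}
```

Typical use would be `samtools view -H aligned.sorted.bam | sam_sort_order`, which should print `coordinate` for a default `samtools sort` and `queryname` for `samtools sort -n`.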
| Parameter | Default | Description |
|-----------|---------|-------------|
| -p | 1 | Number of threads |
| -x | - | Index basename |
| --rna-strandness | unstranded | FR/RF/F/R |
| --dta | off | Downstream transcriptome assembly |
| --dta-cufflinks | off | For Cufflinks |
| --min-intronlen | 20 | Minimum intron length |
| --max-intronlen | 500000 | Maximum intron length |
| -k | 5 | Max alignments to report |
# Use --dta for StringTie
hisat2 -p 8 -x hisat2_index \
--dta \
-1 r1.fq.gz -2 r2.fq.gz | \
samtools sort -@ 4 -o aligned.sorted.bam -
# HISAT2 prints summary to stderr
hisat2 -p 8 -x hisat2_index -1 r1.fq.gz -2 r2.fq.gz -S aligned.sam 2> summary.txt
Example:
50000000 reads; of these:
50000000 (100.00%) were paired; of these:
2500000 (5.00%) aligned concordantly 0 times
45000000 (90.00%) aligned concordantly exactly 1 time
2500000 (5.00%) aligned concordantly >1 times
95.00% overall alignment rate
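Since the summary is plain text, the overall rate can be pulled out for quick quality checks across samples. This is a minimal sketch that assumes the summary file ends with a line of the form shown above; `overall_rate` and `rate_at_least` are hypothetical helper names, and any threshold passed in is the caller's choice rather than a recommended cutoff.

```shell
# Sketch: extract the overall alignment rate from a saved HISAT2 summary.
# `overall_rate` and `rate_at_least` are hypothetical helpers.
overall_rate() {
    # Print the first field of the "overall alignment rate" line without "%".
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$1"
}

rate_at_least() {
    local rate
    rate=$(overall_rate "$1")
    # Return 2 if the summary line was not found.
    [ -n "$rate" ] || return 2
    # Compare numerically in awk, since the rate is a decimal value.
    awk -v r="$rate" -v min="$2" 'BEGIN { exit !(r + 0 >= min + 0) }'
}
```

For instance, `rate_at_least summary.txt 80 || echo "low alignment rate"` flags a sample whose overall rate falls below the chosen value.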
| Aligner | Human Genome Memory |
|---------|---------------------|
| STAR | ~30GB |
| HISAT2 | ~8GB |