Version Compatibility

Reference examples tested with: STAR 2.7.11+, samtools 1.19+

Before using code patterns, verify installed versions match. If versions differ:

CLI: <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.

STAR RNA-seq Alignment -- The Junction Database, the 255 MAPQ, and the Strand Column Decide the Result

"Align my RNA-seq reads" -> Map reads across exon-exon junctions to the genome, building or reusing a splice-aware index from a GTF -- because an RNA read spans introns the genome does not contain, so the junction database (and its sjdbOverhang), the two-pass choice, the 255-unique MAPQ, and the strand column STAR reports are what actually determine the downstream counts and variant calls.

CLI: STAR --runMode alignReads --genomeDir idx/ --readFilesIn R1.fq.gz R2.fq.gz --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate

Scope: RNA splice-aware mapping with STAR -- index generation, two-pass, GeneCounts, chimeric/fusion output, and the MAPQ/strandedness traps. Contig-naming reconciliation and the QC gate -> alignment-files. Low-memory RNA alignment -> hisat2-alignment. Counting reads over genes/transcripts -> rna-quantification. Single-cell droplet/plate counting (STARsolo) -> single-cell. DE on known transcripts without a BAM -> rna-quantification/alignment-free-quant. OUT OF SCOPE: DNA (bwa-alignment/bowtie2-alignment), long reads (long-read-sequencing/long-read-alignment).

The Single Most Important Modern Insight

The splice-junction database is part of the index/run config, and a wrong sjdbOverhang or per-sample two-pass silently changes the answer. STAR builds short artificial junction-flank sequences from the GTF so reads with a short overhang on one exon can still be placed; --sjdbOverhang sets the flank length and should equal max(readlength) - 1 (default 100 is fine near 100 bp reads but degrades junction sensitivity for very short reads). Two-pass (--twopassMode Basic) discovers novel junctions and re-aligns, raising novel-junction sensitivity -- but it is PER-SAMPLE: each sample is re-aligned against its own augmented index, so a junction found only in the deep/disease sample is rescued asymmetrically, a batch effect that confounds junction/splicing/sQTL comparisons. The cohort-correct recipe is to pool every sample's pass-1 SJ.out.tab, filter, and feed one common --sjdbFileChrStartEnd to a uniform second pass. Per-sample two-pass is fine for plain gene-level DE; it bites splicing analyses.
STAR's MAPQ is 255-for-unique and a multiplicity code, so it breaks GATK and a copied MAPQ filter deletes every multimapper. Unique reads get MAPQ 255 -- the SAM "mapping quality unavailable" value -- which GATK treats as missing and drops, the classic silently-empty RNA VCF; fix it at align time with --outSAMmapqUnique 60. Multimappers get 3 / 1 / 0 for 2 / 3-4 / >=5 loci (pure locus count, no score information), so a generic samtools view -q 10 or featureCounts -Q 10 copied from a DNA pipeline DELETES every multimapper while keeping every unique -- a directional "discard paralog/gene-family/rRNA/pseudogene reads" filter that biases against recently-duplicated gene families. A hard MAPQ filter is almost never what an RNA analysis wants.
STAR reports library strandedness for free in GeneCounts, and getting strand wrong halves the counts or worse. --quantMode GeneCounts emits a 4-column ReadsPerGene.out.tab: gene, unstranded, forward (read 1 on the gene's strand, like htseq-count -s yes), reverse (like htseq-count -s reverse). Comparing the totals of columns 3 and 4 reveals the protocol -- roughly equal is unstranded (use col 2), col 3 dominant is forward, col 4 dominant is reverse (the common dUTP/TruSeq case, use col 4). Never assume the strand: a stranded column (or a stranded -s in a counter) applied to an unstranded library roughly halves the counts, and the opposite-strand column on a stranded library sends most sense reads to "no feature"; either way DE is distorted.

How STAR Places a Splice (the mechanism in brief)

STAR seeds with Maximal Mappable Prefix search on an uncompressed suffix array: it finds the longest exact prefix, and when that prefix ends (at a junction, mismatch, or read end) it restarts from the next base -- so a junction-spanning read naturally decomposes into seeds on two exons that STAR then stitches across the intron with an N (skipped-region) CIGAR. The stitch is scored with splice priors that penalize non-canonical motifs (--scoreGapNoncan -8), long introns (--scoreGenomicLengthLog2scale -0.25), and reward annotated junctions (--sjdbScore 2), plus a hard anchor floor (--alignSJoverhangMin): a novel junction on a tiny anchor carries almost no information, so it must clear both the soft score prior and the hard overhang minimum. This is why annotation (the sjdb) and two-pass matter for short-overhang and novel junctions.
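
To make the N (skipped-region) CIGAR operation concrete, the sketch below walks each alignment's CIGAR and prints the intron span implied by every N. It is an illustration only: cigar_junctions is a hypothetical helper name, and it reads plain SAM text on stdin (for example from samtools view), not STAR's internal structures.

```shell
# Sketch: list intron (N) spans implied by each alignment's CIGAR.
# Input: SAM records on stdin (header lines are skipped).
# Output: chrom, first intron base, last intron base (1-based, inclusive).
# cigar_junctions is a hypothetical helper name, not a STAR or samtools command.
cigar_junctions() {
    awk -F'\t' '
        /^@/ { next }
        {
            pos = $4 + 0
            cigar = $6
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op  = substr(cigar, RLENGTH, 1)
                if (op == "N") printf "%s\t%d\t%d\n", $3, pos, pos + len - 1
                # M, D, N, = and X consume reference bases; I, S, H and P do not.
                if (op ~ /[MDN=X]/) pos += len
                cigar = substr(cigar, RLENGTH + 1)
            }
        }'
}
# Usage: samtools view sample_Aligned.sortedByCoord.out.bam | cigar_junctions | sort -u
```

A read starting at position 100 with CIGAR 50M200N50M covers 100-149 on the first exon, so the intron is reported as 150-349, the same first-base/last-base convention SJ.out.tab uses.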

Tool Taxonomy

| Mode / output | Citation | Mechanism / role | When |
|---------------|----------|------------------|------|
| STAR --runMode alignReads | Dobin 2013 Bioinformatics 29:15 | MMP seed + stitch; spliced genomic BAM | the default RNA-to-genome alignment |
| --twopassMode Basic | Dobin 2013 (STAR manual) | discover novel junctions, re-align | novel-isoform / variant / splicing work (per-sample); pool for cohorts |
| --quantMode GeneCounts | STAR manual | per-gene counts + the strand-detection 3 columns | quick counts and strandedness inference |
| --quantMode TranscriptomeSAM | STAR manual | transcriptome-coord BAM for RSEM / Salmon-aln | isoform-level EM quantification -> rna-quantification |
| --chimSegmentMin -> Chimeric.out.junction | STAR manual | split-across-loci reads for fusion calling | STAR-Fusion / Arriba fusion detection |
| STARsolo (--soloType) | STAR manual | cell-barcode + UMI single-cell counting | scRNA-seq (route OUT) -> single-cell |
| HISAT2 | Kim 2019 Nat Biotechnol 37:907 | graph FM-index, ~1/4 the RAM | memory-constrained RNA (route OUT) -> hisat2-alignment |
| Salmon / kallisto | Patro 2017 Nat Methods 14:417 | alignment-free transcript quantification | DE on known transcripts only (route OUT) -> rna-quantification/alignment-free-quant |

Decision Tree by Scenario

| Scenario | Recommended | Why |
|----------|-------------|-----|
| RNA-seq, ample RAM (>=32 GB), need a genomic BAM | STAR | fastest splice-aware aligner; native counts, fusions, 2-pass |
| RNA variant calling | STAR 2-pass + --outSAMmapqUnique 60 then GATK SplitNCigarReads | 2-pass splices novel junctions; 60 avoids the 255 drop |
| Novel-isoform / splicing / sQTL across a cohort | cohort 2-pass (pool SJ.out.tab -> common --sjdbFileChrStartEnd) | per-sample 2-pass is a junction batch effect |
| Fusion detection | --chimSegmentMin 12 --chimOutType ... -> STAR-Fusion / Arriba | chimeric junctions are the fusion signal |
| Single-cell RNA | STARsolo (route OUT) | barcode+UMI counting -> single-cell |
| Memory-constrained (<32 GB) | route OUT to hisat2-alignment | STAR needs ~30 GB for human |
| DE on known transcripts only | route OUT to rna-quantification/alignment-free-quant | Salmon/kallisto are faster and model multimapping better |
| Small genome (bacterial/viral/plasmid) | STAR with reduced --genomeSAindexNbases | the default 14 silently builds a bad index / segfaults |

Default when uncertain: STAR with --outSAMtype BAM SortedByCoordinate, --quantMode GeneCounts (to also read strandedness), and --twopassMode Basic for per-sample novel-junction work; set --outSAMmapqUnique 60 for any STAR -> GATK path.

Generate the Genome Index

# sjdbOverhang = max(readlength) - 1 (149 for 2x150 reads). Default 100 degrades junctions for short reads.
STAR --runMode genomeGenerate --runThreadN 8 \
    --genomeDir star_index/ \
    --genomeFastaFiles genome.fa \
    --sjdbGTFfile annotation.gtf \
    --sjdbOverhang 149
# Small genome: set --genomeSAindexNbases to min(14, log2(GenomeLength)/2 - 1) (about 10 for a 5 Mb genome), or STAR may build a bad index or segfault.
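
The sjdbOverhang = max(readlength) - 1 rule can be checked against the actual FASTQ rather than the nominal run length, which matters after trimming. A minimal sketch, assuming standard 4-line FASTQ records on stdin; suggest_sjdb_overhang is a hypothetical helper name.

```shell
# Sketch: print max(read length) - 1 for the FASTQ records on stdin.
# Assumes 4-line FASTQ records (sequence on every 2nd line of 4).
# suggest_sjdb_overhang is a hypothetical helper name, not part of STAR.
suggest_sjdb_overhang() {
    awk 'NR % 4 == 2 { n = length($0); if (n > max) max = n }
         END { if (max > 0) print max - 1 }'
}
# Usage (sampling the first 100,000 records of each mate is usually enough):
#   zcat reads_1.fq.gz | head -n 400000 | suggest_sjdb_overhang
#   zcat reads_2.fq.gz | head -n 400000 | suggest_sjdb_overhang
```

Use the larger of the two mate values as --sjdbOverhang at index build time.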

Basic Alignment

STAR --runThreadN 8 \
    --genomeDir star_index/ \
    --readFilesIn reads_1.fq.gz reads_2.fq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate
samtools index sample_Aligned.sortedByCoord.out.bam
# STAR already coordinate-sorts -- a subsequent `samtools sort` is redundant. Single-end: one file in --readFilesIn.
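
After a run it is worth reading the mapping summary before trusting the BAM. The sketch below pulls one value out of sample_Log.final.out; it assumes the usual "label | value" layout of that file, and star_log_value is a hypothetical helper name.

```shell
# Sketch: print the value for one label in a STAR Log.final.out file.
# Assumes lines shaped like "   Uniquely mapped reads % |<TAB>90.00%".
# star_log_value is a hypothetical helper name, not part of STAR.
star_log_value() {
    # $1 = path to Log.final.out, $2 = exact label text left of the "|"
    awk -F'|' -v key="$2" '
        {
            label = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", label)   # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the tab before the value
            if (label == key) print value
        }' "$1"
}
# Usage:
#   star_log_value sample_Log.final.out "Uniquely mapped reads %"
#   star_log_value sample_Log.final.out "% of reads mapped to multiple loci"
```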

Two-Pass + Gene Counts + Strandedness

STAR --runThreadN 8 --genomeDir star_index/ \
    --readFilesIn r1.fq.gz r2.fq.gz --readFilesCommand zcat \
    --outFileNamePrefix sample_ \
    --outSAMtype BAM SortedByCoordinate \
    --twopassMode Basic \
    --quantMode GeneCounts \
    --outSAMattrRGline ID:sample1 SM:sample1 PL:ILLUMINA LB:lib1 \
    --outSAMmapqUnique 60        # so a downstream GATK RNA-variant step does not drop the 255 uniques
# STAR read groups use --outSAMattrRGline (SPACE-separated tags), NOT bwa's tab-delimited -R '@RG\t...'.
# GATK requires read groups; comma-with-spaces separates groups for multiple --readFilesIn files.

# Detect strandedness from ReadsPerGene.out.tab (skip the 4 N_* summary rows, sum cols 3 vs 4):
awk 'NR>4 {f+=$3; r+=$4} END {printf "fwd(col3)=%d  rev(col4)=%d -> use col %s\n", f, r, (f>2*r?"3 fwd": r>2*f?"4 rev":"2 unstranded")}' sample_ReadsPerGene.out.tab
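
Once the strand column is known, the per-gene counts can be pulled out of ReadsPerGene.out.tab directly. A minimal sketch under the 4-column layout described above; star_gene_counts is a hypothetical helper name.

```shell
# Sketch: extract gene ID plus one count column from ReadsPerGene.out.tab,
# skipping the 4 N_* summary rows at the top of the file.
# star_gene_counts is a hypothetical helper name, not part of STAR.
star_gene_counts() {
    # $1 = ReadsPerGene.out.tab, $2 = column: 2 unstranded, 3 forward, 4 reverse
    case "$2" in
        2|3|4) ;;
        *) echo "column must be 2, 3 or 4" >&2; return 1 ;;
    esac
    awk -F'\t' -v col="$2" 'NR > 4 { print $1 "\t" $col }' "$1"
}
# Usage (reverse-stranded dUTP/TruSeq library):
#   star_gene_counts sample_ReadsPerGene.out.tab 4 > sample.counts.tsv
```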

ENCODE Long-RNA-seq Parameter Set

STAR --runThreadN 8 --genomeDir star_index/ --readFilesIn r1.fq.gz r2.fq.gz --readFilesCommand zcat \
    --outFileNamePrefix sample_ --outSAMtype BAM SortedByCoordinate \
    --outFilterType BySJout \
    --outFilterMultimapNmax 20 \
    --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 \
    --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 \
    --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 \
    --sjdbScore 1 --outSAMattributes NH HI AS NM MD
# BySJout keeps only reads whose junctions passed the dataset-wide collapse; multimapNmax 20 retains real
# multi-locus genes; the 0.04 mismatch ratio scales with read length; intron caps at ~1 Mb cover human genes.
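
The "scales with read length" point is plain arithmetic: the mismatch cap is the ratio times the read length, where for paired-end data the read length is the sum of both mates. A small sketch using integer math; mismatch_budget is a hypothetical helper name and the ratio is passed in percent to avoid floating point.

```shell
# Sketch: maximum mismatches allowed by --outFilterMismatchNoverReadLmax.
# mismatch_budget is a hypothetical helper name, not part of STAR.
mismatch_budget() {
    # $1 = length of one mate, $2 = number of mates (1 or 2),
    # $3 = ratio in percent (default 4, i.e. 0.04)
    echo $(( ${3:-4} * $1 * $2 / 100 ))
}
# Usage:
#   mismatch_budget 100 2    # 2x100 paired-end  -> 8
#   mismatch_budget 50 1     # 50 bp single-end  -> 2
```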

Fusion Detection

# Chimeric junctions for STAR-Fusion (a minimal subset; the STAR-Fusion wiki lists the full recommended parameter set).
STAR --runThreadN 8 --genomeDir star_index/ --readFilesIn r1.fq.gz r2.fq.gz --readFilesCommand zcat \
    --outFileNamePrefix sample_ --outSAMtype BAM SortedByCoordinate \
    --chimSegmentMin 12 --chimJunctionOverhangMin 12 --chimOutJunctionFormat 1 \
    --chimOutType Junctions
# Arriba instead reads chimeric alignments from the BAM: --chimSegmentMin 10 --chimOutType WithinBAM SoftClip.
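
Before handing Chimeric.out.junction to a fusion caller, a quick tally of reads per breakpoint pair shows whether any chimeric signal was captured at all. This is a rough sketch, not a fusion caller: it assumes the first six columns are donor chromosome, breakpoint, strand and acceptor chromosome, breakpoint, strand, and that header and comment lines start with chr_donorA or #. chimeric_breakpoint_counts is a hypothetical helper name.

```shell
# Sketch: count chimeric reads per donor/acceptor breakpoint pair.
# Assumed columns 1-6: chr_donorA, brkpt_donorA, strand_donorA,
#                      chr_acceptorB, brkpt_acceptorB, strand_acceptorB.
# chimeric_breakpoint_counts is a hypothetical helper name, not part of STAR.
chimeric_breakpoint_counts() {
    # $1 = path to Chimeric.out.junction
    awk -F'\t' '
        /^#/ || $1 == "chr_donorA" { next }          # skip comments and header
        { n[$1 ":" $2 ":" $3 "\t" $4 ":" $5 ":" $6]++ }
        END { for (k in n) print n[k] "\t" k }' "$1" | sort -k1,1nr -k2,2
}
# Usage: chimeric_breakpoint_counts sample_Chimeric.out.junction | head
```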

Key Parameters (STAR defaults unless noted)

| Parameter | Default | Description |
|-----------|---------|-------------|
| --sjdbOverhang | 100 | junction-flank length at index build; set to readlength-1 |
| --twopassMode | None | Basic for per-sample novel-junction discovery |
| --outSAMmapqUnique | 255 | MAPQ for unique reads; set 60 for GATK |
| --outFilterMultimapNmax | 10 | max loci to report (ENCODE uses 20 for RNA) |
| --alignIntronMin / --alignIntronMax | 21 / 0 (auto) | gap < min is a deletion; cap Max ~1 Mb for human |
| --alignSJoverhangMin / --alignSJDBoverhangMin | 5 / 3 | novel / annotated junction anchor floor (ENCODE 8 / 1) |
| --quantMode | -- | GeneCounts and/or TranscriptomeSAM |
| --chimSegmentMin | 0 (off) | turn on chimeric/fusion detection |
| --genomeSAindexNbases | 14 | reduce to min(14, log2(L)/2-1) for small genomes |
| --genomeLoad / --limitBAMsortRAM | NoSharedMemory / -- | shared-memory reuse; explicit sort RAM |

Per-Method Failure Modes

STAR 255 MAPQ into GATK

Trigger: STAR BAM (uniques at 255) fed to GATK RNA variant calling. Mechanism: GATK reads 255 as "MAPQ unavailable" and drops the read. Symptom: a silently empty or near-empty RNA VCF. Fix: --outSAMmapqUnique 60 at align time (then SplitNCigarReads in the GATK RNA workflow).
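
A MAPQ histogram makes this failure visible before GATK runs: a STAR BAM aligned with defaults shows its unique reads at 255, and one aligned with --outSAMmapqUnique 60 shows them at 60. A minimal sketch reading SAM text on stdin; mapq_histogram is a hypothetical helper name.

```shell
# Sketch: tabulate MAPQ values (SAM column 5) across alignment records.
# Counts records, not reads, so each multimapper locus is counted once.
# mapq_histogram is a hypothetical helper name, not a samtools command.
mapq_histogram() {
    awk -F'\t' '!/^@/ { c[$5]++ } END { for (q in c) print q "\t" c[q] }' |
        sort -k1,1n
}
# Usage: samtools view sample_Aligned.sortedByCoord.out.bam | mapq_histogram
```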

MAPQ filter deletes every multimapper

Trigger: a -q 10 / -Q 10 MAPQ filter (copied from DNA) on STAR output. Mechanism: STAR multimappers are MAPQ <= 3, uniques 255, so the filter keeps only uniques. Symptom: systematic under-counting of paralog/gene-family/rRNA/pseudogene loci; DE driven by multimapper fraction. Fix: do not MAPQ-filter RNA for counting; handle multimappers in the counter (NH-aware) or via EM -> rna-quantification/featurecounts-counting.
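
The one-sidedness of the filter can be measured rather than assumed. The sketch below splits alignment records into unique and multimapped by the NH tag and reports how many of each a given MAPQ threshold would drop. It assumes NH is present in the BAM (it is among STAR's standard SAM attributes); mapq_filter_impact is a hypothetical helper name.

```shell
# Sketch: simulate a MAPQ filter and report what it removes per class.
# Output columns: class, records seen, records a "-q THRESHOLD" filter drops.
# Records without an NH tag are treated as unique (an assumption of this sketch).
# mapq_filter_impact is a hypothetical helper name.
mapq_filter_impact() {
    # $1 = MAPQ threshold to simulate (default 10); SAM records on stdin
    awk -F'\t' -v q="${1:-10}" '
        /^@/ { next }
        {
            nh = 1
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NH:i:/) nh = substr($i, 6) + 0
            cls = (nh > 1) ? "multi" : "unique"
            total[cls]++
            if ($5 + 0 < q + 0) dropped[cls]++
        }
        END {
            printf "unique\t%d\t%d\n", total["unique"], dropped["unique"]
            printf "multi\t%d\t%d\n", total["multi"], dropped["multi"]
        }'
}
# Usage: samtools view sample_Aligned.sortedByCoord.out.bam | mapq_filter_impact 10
```

On STAR output the unique row should show nothing dropped and the multi row should show everything dropped, which is the directional bias described above.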

Per-sample two-pass as a batch effect

Trigger: --twopassMode Basic per sample for a splicing/junction comparison. Mechanism: each sample is aligned against its own novel-junction-augmented index. Symptom: junction recovery correlates with depth/condition, confounding sQTL/differential splicing. Fix: pool pass-1 SJ.out.tab across the cohort, filter, feed one --sjdbFileChrStartEnd to a uniform second pass.
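
The pooling step of the fix can be sketched as a filter over every sample's pass-1 SJ.out.tab. The thresholds (minimum unique reads per sample, minimum number of samples) are illustrative choices, not values from STAR or this document; the column meanings assumed are 4 = strand (0/1/2), 6 = annotated flag, 7 = uniquely mapped read count. pool_novel_junctions is a hypothetical helper name.

```shell
# Sketch: pool novel junctions across pass-1 SJ.out.tab files for a uniform pass 2.
# Keeps unannotated junctions (col 6 == 0) with at least MIN_UNIQUE uniquely
# mapped reads (col 7) in at least MIN_SAMPLES files.
# Output: chr, start, end, strand (+, - or .), one junction per line.
# pool_novel_junctions is a hypothetical helper name, not part of STAR.
pool_novel_junctions() {
    pnj_min_unique=$1
    pnj_min_samples=$2
    shift 2
    awk -F'\t' -v u="$pnj_min_unique" -v s="$pnj_min_samples" '
        $6 == 0 && $7 + 0 >= u + 0 { seen[$1 "\t" $2 "\t" $3 "\t" $4]++ }
        END {
            for (k in seen) if (seen[k] >= s + 0) {
                split(k, f, "\t")
                strand = (f[4] == 1) ? "+" : (f[4] == 2) ? "-" : "."
                print f[1] "\t" f[2] "\t" f[3] "\t" strand
            }
        }' "$@" | sort -k1,1 -k2,2n -k3,3n
}
# Usage (thresholds 3 and 2 are placeholders to tune per cohort):
#   pool_novel_junctions 3 2 pass1/*_SJ.out.tab > cohort_SJ.tab
#   STAR ... --sjdbFileChrStartEnd cohort_SJ.tab    # same file for every sample, no --twopassMode
```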

sjdbOverhang mismatched to read length

Trigger: index built with default 100 for 36-50 bp reads, or a mismatch between index build and align values. Mechanism: junction flanks far longer than the reads degrade short-overhang sensitivity; a mismatch errors at align time. Symptom: reduced novel-junction-spanning sensitivity, or "present sjdbOverhang not equal to genome generation step." Fix: rebuild with sjdbOverhang = max(readlength)-1.

Small genome with default SAindexNbases

Trigger: indexing a bacterial/viral/plasmid genome with --genomeSAindexNbases 14. Mechanism: the SA pre-index string is too long for the genome. Symptom: STAR silently builds a bad index or segfaults at align time. Fix: set --genomeSAindexNbases min(14, log2(GenomeLength)/2 - 1) (1 Mb -> 9, 100 kb -> 7).
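
The formula is easy to evaluate from the FASTA itself. In the sketch below, rounding to the nearest integer is an assumption chosen because it reproduces the worked values above (1 Mb -> 9, 100 kb -> 7); both function names are hypothetical helpers.

```shell
# Sketch: evaluate min(14, log2(GenomeLength)/2 - 1) for a genome.
# Rounding to nearest is an assumption of this sketch (it matches 1 Mb -> 9, 100 kb -> 7).
# fasta_length and suggest_sa_index_nbases are hypothetical helper names.
fasta_length() {
    # $1 = FASTA file; prints the total number of sequence characters
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}
suggest_sa_index_nbases() {
    # $1 = genome length in bases
    awk -v glen="$1" 'BEGIN {
        n = int(log(glen) / log(2) / 2 - 1 + 0.5)   # log2 via natural logs, rounded
        if (n > 14) n = 14
        print n
    }'
}
# Usage:
#   STAR --runMode genomeGenerate ... \
#       --genomeSAindexNbases "$(suggest_sa_index_nbases "$(fasta_length genome.fa)")"
```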

Index built with a different STAR version

Trigger: an index built months ago loaded by a bumped STAR module. Mechanism: STAR refuses an index whose versionGenome differs. Symptom: "Genome version is INCOMPATIBLE with running STAR version," or a subtly different version that loads and behaves differently across a cohort. Fix: rebuild the index with the exact aligning STAR version; pin the version for the whole cohort.
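
An existing index can be inspected before aligning against it. The sketch below assumes the index directory contains a genomeParameters.txt with whitespace-separated "name value" lines such as versionGenome and sjdbOverhang; index_param is a hypothetical helper name. Note that versionGenome records the index format version, which may not equal the STAR binary version that built it.

```shell
# Sketch: read one recorded parameter from a STAR index directory.
# Assumes <index>/genomeParameters.txt holds "name<whitespace>value" lines.
# index_param is a hypothetical helper name, not part of STAR.
index_param() {
    # $1 = index directory, $2 = parameter name (e.g. versionGenome, sjdbOverhang)
    awk -v key="$2" '$1 == key { print $2 }' "$1/genomeParameters.txt"
}
# Usage:
#   index_param star_index versionGenome
#   index_param star_index sjdbOverhang
#   STAR --version
```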

Quantitative Thresholds

| Threshold | Source | Rationale |
|-----------|--------|-----------|
| sjdbOverhang = max(readlength) - 1 | STAR manual | sets the junction-flank length for short-overhang reads |
| --outSAMmapqUnique 60 for GATK | STAR manual / GATK RNA best practice | 255 is "unavailable" and is dropped by GATK |
| --outFilterMultimapNmax 20, --outFilterMismatchNoverReadLmax 0.04 | ENCODE long-RNA-seq pipeline | retains real multi-locus genes; mismatch budget scales with read length |
| --genomeSAindexNbases <= min(14, log2(L)/2 - 1) | STAR manual | the default 14 corrupts/segfaults small-genome indexes |
| STAR human-genome index RAM ~30 GB | STAR docs (approximate) | the reason to route memory-constrained jobs to HISAT2 |
| --chimSegmentMin 12 (STAR-Fusion) | STAR-Fusion wiki | minimum chimeric-segment length for fusion calling |

Common Errors

| Error / symptom | Cause | Solution |
|-----------------|-------|----------|
| Empty / tiny RNA VCF after GATK | STAR 255 MAPQ dropped as "unavailable" | --outSAMmapqUnique 60 (and SplitNCigarReads) |
| GATK "no read group" on a STAR BAM | STAR omits @RG unless asked | add --outSAMattrRGline ID:.. SM:.. PL:ILLUMINA LB:.. (space-separated, not bwa's -R) |
| htseq-count silently miscounts STAR output | htseq wants name-sorted (or -r pos) input; STAR coordinate-sorts | emit --outSAMtype BAM Unsorted for htseq, or samtools sort -n; featureCounts accepts either |
| Counts ~halved, antisense artifacts | wrong strandedness column / -s | infer from GeneCounts cols 3 vs 4 (or RSeQC); TruSeq/dUTP = reverse |
| Paralog/gene-family genes under-counted | a -q 10 MAPQ filter deleted multimappers | drop the MAPQ filter; handle NH>1 in the counter -> rna-quantification |
| "not enough memory for BAM sorting" | --limitBAMsortRAM too small | set it explicitly (e.g. 10000000000) |
| Segfault / bad index on a small genome | --genomeSAindexNbases 14 | reduce per min(14, log2(L)/2 - 1) |
| "Genome version INCOMPATIBLE" | index built with another STAR version | rebuild with the running version; pin it |
| 0 counts despite high mapping rate | genome/GTF contig-naming mismatch | reconcile chr1 vs 1, chrM vs MT (same source/release) -> alignment-files |
| STAR reads .gz FASTQ as garbage / fails | STAR does not auto-detect gzip | add --readFilesCommand zcat (bwa/bowtie2/HISAT2 auto-detect gzip) |
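
The contig-naming mismatch (chr1 vs 1, chrM vs MT) is cheap to catch before building the index. A minimal sketch that lists GTF sequence names absent from the FASTA headers; gtf_contigs_missing_from_fasta is a hypothetical helper name, and the inputs are assumed to be uncompressed.

```shell
# Sketch: print GTF contig names (column 1) that have no matching FASTA header.
# FASTA names are taken as the header text up to the first whitespace.
# gtf_contigs_missing_from_fasta is a hypothetical helper name.
gtf_contigs_missing_from_fasta() {
    # $1 = genome FASTA, $2 = annotation GTF (both uncompressed)
    awk 'NR == FNR { if (/^>/) fa[substr($1, 2)] = 1; next }   # first file: collect names
         /^#/ { next }                                         # GTF comment lines
         !($1 in fa) && !seen[$1]++ { print $1 }' "$1" "$2"
}
# Usage: gtf_contigs_missing_from_fasta genome.fa annotation.gtf
```

Empty output means every annotated contig exists in the genome; any name printed will contribute zero counts for its genes.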

References

Dobin A, Davis CA, Schlesinger F, et al. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15-21.
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37:907-915.
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. 2017. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417-419.
Burset M, Seledtsov IA, Solovyev VV. 2000. Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Res 28:4364-4375.

Related Skills

hisat2-alignment - Low-memory splice-aware alternative to STAR
bwa-alignment - DNA short-read mapping (when reads do not cross junctions)
read-qc/rnaseq-qc - RNA destination metrics: rRNA, gene-body coverage, strandedness
read-qc/fastp-workflow - Trim adapters/poly-A before alignment
alignment-files/bam-statistics - flagstat/idxstats QC gate; what a high mapping rate hides; contig naming
rna-quantification/featurecounts-counting - Count aligned reads over genes (NH-aware multimapper handling)
rna-quantification/alignment-free-quant - Salmon/kallisto when only known-transcript DE is needed
differential-expression/deseq2-basics - Downstream DE from the count matrix
single-cell/data-io - STARsolo single-cell counts into a single-cell workflow
