read-qc/quality-reports/SKILL.md
Generate and interpret quality reports from FASTQ files using FastQC and MultiQC. Assess per-base quality, adapter content, GC bias, duplication levels, and overrepresented sequences. Use when performing initial QC on raw sequencing data or validating preprocessing results.
npx skillsauth add GPTomics/bioSkills bio-read-qc-quality-reports
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reference examples tested with: pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- pip show <package> then help(module.function) to check signatures
- <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.
"Run quality control on FASTQ files" -> Generate per-base quality, adapter content, and duplication plots, then aggregate across samples.
fastqc *.fastq.gz then multiqc .
# Single file
fastqc sample.fastq.gz
# Multiple files
fastqc *.fastq.gz
# Specify output directory (it must already exist; FastQC does not create it)
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz
# Set threads
fastqc -t 4 *.fastq.gz
FastQC produces two files per input:
- sample_fastqc.html - Interactive HTML report
- sample_fastqc.zip - Data files and images

| Module | What It Shows | Warning Signs |
|--------|---------------|---------------|
| Per base sequence quality | Quality scores across read | Drop below Q20 at 3' end |
| Per sequence quality | Quality score distribution | Bimodal distribution |
| Per base sequence content | Nucleotide composition | Imbalance at start (normal) |
| Per sequence GC content | GC distribution | Secondary peak (contamination) |
| Per base N content | Unknown bases | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Common sequences | Adapter contamination |
| Adapter content | Adapter sequences | Visible adapter curves |

# Unzip to access raw data
unzip sample_fastqc.zip
# View summary
cat sample_fastqc/summary.txt
# Get per-base quality
grep -A 50 ">>Per base sequence quality" sample_fastqc/fastqc_data.txt
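The same extraction can be done without guessing a line count for grep -A by reading up to the >>END_MODULE marker. The following is a minimal Python sketch, assuming the usual fastqc_data.txt layout: a `>>Module name<TAB>status` line, a `#`-prefixed column header, tab-separated rows, then `>>END_MODULE`. The function name `parse_fastqc_module` and the sample text are illustrative and not part of FastQC.

```python
def parse_fastqc_module(text, module_name):
    """Return (status, header, rows) for one module of a fastqc_data.txt dump.

    Assumes the layout described above; returns None if the module is absent.
    """
    status, header, rows = None, [], []
    inside = False
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            if inside:
                return status, header, rows
            continue
        if line.startswith(">>"):
            # Module header line: ">>Name<TAB>status"
            name, _, module_status = line[2:].partition("\t")
            if name == module_name:
                inside = True
                status = module_status
            continue
        if not inside:
            continue
        if line.startswith("#"):
            # The last '#' line before the data rows is the column header.
            header = line[1:].split("\t")
        elif line:
            rows.append(line.split("\t"))
    # Tolerate a file that ends without a closing >>END_MODULE.
    return (status, header, rows) if inside else None


# Illustrative excerpt with made-up values, not real FastQC output.
SAMPLE = (
    "##FastQC\t0.12.1\n"
    ">>Basic Statistics\tpass\n"
    "#Measure\tValue\n"
    "Total Sequences\t1000\n"
    ">>END_MODULE\n"
    ">>Per base sequence quality\twarn\n"
    "#Base\tMean\tMedian\n"
    "1\t32.5\t33.0\n"
    "2\t31.0\t32.0\n"
    ">>END_MODULE\n"
)

status, header, rows = parse_fastqc_module(SAMPLE, "Per base sequence quality")
print(status, header, len(rows))
```

For a real report, read the file first, for example `open("sample_fastqc/fastqc_data.txt").read()`, and pass the text in.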
# Aggregate all FastQC reports in current directory
multiqc .
# Specify input and output
multiqc qc_reports/ -o multiqc_output/
# Custom report name
multiqc . -n my_project_qc
# Force overwrite
multiqc . -f
# Use flat (static image) plots instead of interactive ones
multiqc --flat .
# Export plots as static image files in addition to the report
multiqc . --export
# Only specific modules
multiqc . -m fastqc
# Exclude patterns
multiqc . --ignore '*_trimmed*'
# Exclude samples whose names match a pattern
multiqc . --ignore-samples '*negative*'
- multiqc_report.html - Interactive HTML report
- multiqc_data/ - Directory with data tables
  - multiqc_fastqc.txt - FastQC metrics
  - multiqc_general_stats.txt - Summary statistics
  - multiqc_sources.txt - Source files used

import pandas as pd
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns)
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')
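Once the table is loaded, a common next step is to flag samples that exceed a threshold. The sketch below uses only the standard library so it runs without pandas. It assumes the first column is named `Sample`. The metric column name `percent_duplicates` is hypothetical, since real column names vary between MultiQC versions (hence printing the columns first). The helper name `flag_samples` and the demo values are also illustrative.

```python
import csv
import io


def flag_samples(tsv_text, column, threshold):
    """Return names of samples whose value in `column` exceeds `threshold`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        try:
            if float(row.get(column, "")) > threshold:
                flagged.append(row["Sample"])
        except (TypeError, ValueError):
            # Missing or non-numeric value: skip rather than guess.
            continue
    return flagged


# Made-up table standing in for multiqc_general_stats.txt.
DEMO = (
    "Sample\tpercent_duplicates\tpercent_gc\n"
    "A\t12.5\t48\n"
    "B\t61.0\t51\n"
    "C\t\t47\n"
)

high_dup = flag_samples(DEMO, "percent_duplicates", 50.0)
print(high_dup)
```

Samples with an empty cell (like C here) are skipped instead of being treated as zero, so a missing metric is never reported as a pass or a fail.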
# Create the output directory first (FastQC does not create it)
mkdir -p qc_reports
# All FASTQ files in parallel
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz
# Then aggregate
multiqc qc_reports/ -o multiqc_output/
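When this two-step batch is driven from a script, it can help to build the command lines first and inspect them before anything runs. The following is a rough sketch using only the flags shown above (-t, -o). The function names `build_qc_commands` and `run_qc`, the `dry_run` switch, and the example file names are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_qc_commands(fastq_files, report_dir, multiqc_dir, threads=8):
    """Build argv lists for FastQC on all files, then MultiQC on the reports."""
    fastqc_cmd = ["fastqc", "-t", str(threads), "-o", str(report_dir)]
    fastqc_cmd += [str(f) for f in fastq_files]
    multiqc_cmd = ["multiqc", str(report_dir), "-o", str(multiqc_dir)]
    return [fastqc_cmd, multiqc_cmd]


def run_qc(fastq_files, report_dir, multiqc_dir, threads=8, dry_run=True):
    """Return the commands; execute them in order only when dry_run is False."""
    commands = build_qc_commands(fastq_files, report_dir, multiqc_dir, threads)
    if dry_run:
        return commands
    # FastQC expects its output directory to exist already.
    Path(report_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


# Hypothetical input files; dry run only, nothing is executed.
planned = run_qc(
    ["raw_data/a.fastq.gz", "raw_data/b.fastq.gz"],
    "qc_reports",
    "multiqc_output",
)
for cmd in planned:
    print(" ".join(cmd))
```

Passing argument lists to `subprocess.run` instead of a shell string avoids quoting problems with unusual file names.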
# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed
# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz
# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz
# Compare with MultiQC
multiqc qc_reports/ -o qc_comparison/
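Besides the visual comparison in MultiQC, the per-module PASS/WARN/FAIL flags in each report's summary.txt can be diffed directly. This is a small sketch, assuming summary.txt lines are tab-separated as status, module name, file name. The function names and the example text are illustrative, and the statuses shown are made up.

```python
def parse_summary(text):
    """Map module name -> status (PASS/WARN/FAIL) from FastQC summary.txt text."""
    statuses = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            statuses[parts[1]] = parts[0]
    return statuses


def changed_modules(raw_text, trimmed_text):
    """Return {module: (raw_status, trimmed_status)} for modules that changed."""
    raw = parse_summary(raw_text)
    trimmed = parse_summary(trimmed_text)
    return {
        module: (raw[module], trimmed[module])
        for module in raw
        if module in trimmed and raw[module] != trimmed[module]
    }


# Made-up summaries for one sample before and after trimming.
RAW = (
    "PASS\tBasic Statistics\tsample.fastq.gz\n"
    "FAIL\tAdapter Content\tsample.fastq.gz\n"
    "WARN\tPer base sequence quality\tsample.fastq.gz\n"
)
TRIMMED = (
    "PASS\tBasic Statistics\tsample_trimmed.fastq.gz\n"
    "PASS\tAdapter Content\tsample_trimmed.fastq.gz\n"
    "PASS\tPer base sequence quality\tsample_trimmed.fastq.gz\n"
)

diff = changed_modules(RAW, TRIMMED)
for module, (before, after) in diff.items():
    print(f"{module}: {before} -> {after}")
```

Modules that move from FAIL or WARN to PASS indicate that trimming addressed the issue; a module that gets worse after trimming is worth a closer look in the HTML report.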
| Phred Score | Error Rate | Interpretation |
|-------------|------------|----------------|
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |

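The error rates in this table follow from the Phred definition Q = -10 * log10(P), where P is the probability that a base call is wrong. The short sketch below converts in both directions and decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding used by current Illumina output. The function names are illustrative.

```python
import math


def phred_to_error(q):
    """Probability of an incorrect base call for Phred score q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 by default)."""
    return [ord(char) - offset for char in quality_string]


for q in (40, 30, 20, 10):
    print(f"Q{q}: error rate {phred_to_error(q):g}")

scores = decode_quality("I?5+")
print(scores)  # [40, 30, 20, 10]
```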
| Issue | Likely Cause | Action |
|-------|--------------|--------|
| Low quality at 3' end | Normal degradation | Trim 3' end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented seqs | Adapters, primers | Check sequences |

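Create ~/.fastqc/Configuration/adapter_list.txt with one adapter per line (name and sequence separated by a tab), or pass a custom file explicitly with fastqc --adapters:
Custom_Adapter_Name	ACGTACGTACGT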
Create ~/.fastqc/Configuration/limits.txt to customize thresholds:
# Warn if mean quality below 25
quality_sequence warn 25
quality_sequence error 20