
# Phylogenetics Skill

## When to Use

Use for phylogenetic tree reconstruction from sequence alignments.

## Standard Workflow

1. Align sequences with MUSCLE, MAFFT, or ClustalW
2. Build trees with FastTree, RAxML, or IQ-TREE
3. Visualize and annotate trees

## Key Decisions

- Choose an appropriate substitution model
- Use bootstrapping for branch support
- FastTree for speed, RAxML/IQ-TREE for accuracy

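The bootstrapping decision above rests on resampling alignment columns with replacement and rebuilding the tree on each replicate. The tree builders handle this internally; the sketch below only illustrates the resampling step. The function name `bootstrap_alignment` and the toy alignment are hypothetical and chosen for illustration.

```python
import random


def bootstrap_alignment(alignment, seed=None):
    """Return one bootstrap replicate: columns resampled with replacement.

    `alignment` maps sequence name -> aligned sequence (all the same length).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    rng = random.Random(seed)
    # Draw n_cols column indices with replacement; every sequence uses the
    # same draw, so each resampled column stays intact across taxa.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment, for illustration only.
toy_alignment = {"taxon_a": "ACGTAC", "taxon_b": "ACGTTC", "taxon_c": "AAGTTC"}
replicate = bootstrap_alignment(toy_alignment, seed=1)
```
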
Critically review AI-agent-conducted scientific analyses for correctness, rigor, and completeness. Use this skill whenever an analysis session has completed and needs validation, when a user asks to "review," "validate," "check," or "audit" a computational analysis, or when an agent pipeline produces scientific results that require quality control before reporting. Also trigger when the user references an execution trace, notebook, or conversation history from a prior analysis session. This skill should run as the final step of any autonomous scientific analysis pipeline.
# Trimmomatic - Read Quality Trimming

## When to Use

Use Trimmomatic to trim adapter sequences and low-quality bases from Illumina sequencing reads.

## Standard Workflow

1. Install: `conda install -c bioconda trimmomatic`
2. Run: `trimmomatic PE <input_R1.fastq.gz> <input_R2.fastq.gz> <output_R1_paired.fastq.gz> <output_R1_unpaired.fastq.gz> <output_R2_paired.fastq.gz> <output_R2_unpaired.fastq.gz> ILLUMINACLIP:<adapters.fa>:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36`

## Key Pa

# Bracken - Bayesian Reestimation of Abundance with KrakEN

## When to Use

Use Bracken to estimate species/genus/phylum-level abundances from Kraken2 classification results.

## Standard Workflow

1. Install: `conda install -c bioconda bracken`
2. Run after Kraken2: `bracken -d <db_path> -i <kraken_report.txt> -o <bracken_output.txt> -r <read_length> -l <level> -t <threshold>`
3. Output: abundance table with fraction_total_reads per taxon

## Key Parameters

- `-d`: Kraken2 database path (must con

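As a rough illustration of reading the abundance table from step 3, the sketch below filters taxa by `fraction_total_reads`. It assumes the tab-separated header Bracken usually writes (`name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`); the function name `read_bracken_table` and the example rows are made up.

```python
import csv
import io


def read_bracken_table(handle, min_fraction=0.0):
    """Yield (name, new_est_reads, fraction_total_reads) for each taxon row."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        fraction = float(row["fraction_total_reads"])
        if fraction >= min_fraction:
            yield row["name"], int(row["new_est_reads"]), fraction


# Made-up example table; column names are assumed to match Bracken's header.
bracken_header = ["name", "taxonomy_id", "taxonomy_lvl", "kraken_assigned_reads",
                  "added_reads", "new_est_reads", "fraction_total_reads"]
bracken_example = "\n".join([
    "\t".join(bracken_header),
    "\t".join(["Escherichia coli", "562", "S", "900", "100", "1000", "0.80000"]),
    "\t".join(["Bacillus subtilis", "1423", "S", "200", "50", "250", "0.20000"]),
]) + "\n"
abundant_taxa = list(read_bracken_table(io.StringIO(bracken_example), min_fraction=0.5))
```
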
# BWA-MEM2 Skill

## When to Use

Use for aligning paired-end Illumina reads to a reference genome. BWA-MEM2 is the successor to BWA-MEM with improved performance.

## Standard Workflow

1. Index reference: `bwa-mem2 index reference.fasta`
2. Align reads: `bwa-mem2 mem -t 4 reference.fasta R1.fastq.gz R2.fastq.gz | samtools sort -o aligned.bam`
3. Index BAM: `samtools index aligned.bam`

## Key Decisions

- Use `-t` flag to set number of threads
- Pipe directly to samtools sort for efficiency
- Add read grou

# Comparative Genomics Skill

## When to Use

Use for comparing gene content, synteny, and functional modules across genomes.

## Standard Workflow

1. Predict genes with Prodigal
2. Identify orthologous clusters with OrthoFinder or similar
3. Annotate clusters with functional databases (COG, KEGG, etc.)
4. Filter for conserved gene clusters present across all genomes
5. Identify co-evolving gene modules

## Key Decisions

- Use KEGG/COG annotations for functional characterization
- Filter for clus

# Kallisto - RNA-Seq Pseudoalignment Quantification

## When to Use

Use Kallisto as an alternative to Salmon for transcript-level quantification from RNA-Seq reads using pseudoalignment.

## Standard Workflow

### 1. Build Index

```bash
kallisto index -i kallisto_index transcriptome.fa
```

### 2. Quantify (paired-end)

```bash
kallisto quant -i kallisto_index -o kallisto_output reads_1.fq.gz reads_2.fq.gz
```

### 3. Output

Results in `kallisto_output/abundance.tsv`:

- `target_id`: transcript ID

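A minimal sketch of loading `abundance.tsv` in Python follows. Only `target_id` is named in the text above; the other column names used here (`length`, `eff_length`, `est_counts`, `tpm`) are an assumption based on kallisto's usual output, and `read_kallisto_abundance` plus the example rows are hypothetical.

```python
import csv
import io


def read_kallisto_abundance(handle):
    """Return {target_id: (est_counts, tpm)} from an abundance.tsv handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["target_id"]: (float(row["est_counts"]), float(row["tpm"]))
        for row in reader
    }


# Made-up example rows; column names beyond target_id are assumed.
kallisto_example = "\n".join([
    "\t".join(["target_id", "length", "eff_length", "est_counts", "tpm"]),
    "\t".join(["tx1", "1500", "1320.5", "120", "250000"]),
    "\t".join(["tx2", "900", "720.5", "0", "0"]),
]) + "\n"
abundance = read_kallisto_abundance(io.StringIO(kallisto_example))
# Keep only transcripts with non-zero estimated expression.
expressed = {tx for tx, (_counts, tpm) in abundance.items() if tpm > 0}
```
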
# Metagenomics Analysis Pipeline

## When to Use

Use for end-to-end metagenomic community profiling from raw reads to taxonomic abundance tables.

## Standard Workflow

1. Quality control: Trim adapters and low-quality bases with Trimmomatic
2. Taxonomic classification: Classify reads with Kraken2 against reference database
3. Abundance estimation: Estimate taxon abundances with Bracken
4. Parse results: Generate summary tables (CSV) with taxonomic assignments and relative abundances

## Pipeline

# OrthoFinder Skill

## When to Use

Use OrthoFinder to identify orthogroups (clusters of orthologous genes) across multiple genomes and to infer gene trees.

## Standard Workflow

1. Prepare protein FASTA files (one per genome) from gene predictions
2. Run OrthoFinder: `orthofinder -f <protein_fasta_dir>`
3. Parse results from Orthogroups/Orthogroups.tsv
4. Filter orthogroups present in all species (core orthogroups)

## Key Decisions

- Use protein seque

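Steps 3 and 4 of the workflow above can be sketched in Python. This assumes the usual `Orthogroups.tsv` layout: a header row with `Orthogroup` followed by one column per species, and cells holding comma-separated gene IDs (empty when a species has no member). The function name `core_orthogroups` and the example table are hypothetical.

```python
import csv
import io


def core_orthogroups(handle):
    """Return {orthogroup_id: {species: [gene, ...]}} for orthogroups
    that have at least one gene in every species column."""
    reader = csv.reader(handle, delimiter="\t")
    species = next(reader)[1:]
    core = {}
    for row in reader:
        if not row:
            continue
        # Pad short rows so missing trailing species columns count as empty.
        cells = row[1:] + [""] * (len(species) - len(row[1:]))
        genes = {
            sp: [g.strip() for g in cell.split(",") if g.strip()]
            for sp, cell in zip(species, cells)
        }
        if all(genes[sp] for sp in species):
            core[row[0]] = genes
    return core


# Made-up example: OG0000001 has no gene in species_b, so it is not core.
orthogroups_example = (
    "Orthogroup\tspecies_a\tspecies_b\n"
    "OG0000000\tgA1, gA2\tgB1\n"
    "OG0000001\tgA3\t\n"
)
core = core_orthogroups(io.StringIO(orthogroups_example))
```
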
# Prodigal Skill

## When to Use

Use Prodigal for ab initio gene prediction in prokaryotic genomes.

## Standard Workflow

1. Run Prodigal on each genome FASTA: `prodigal -i genome.fna -a proteins.faa -o genes.gff -f gff`
2. Extract protein sequences from -a output
3. Extract CDS coordinates from GFF output

## Key Decisions

- Use `-p single` for single genome mode (default)
- Use `-p meta` for metagenomics mode
- Output both protein (-a) and nucleotide (-d) sequences

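Step 3 of the workflow above (extracting CDS coordinates from the GFF output) might look like the following in Python. It assumes standard nine-column, tab-separated GFF records; `cds_coordinates` and the example lines are hypothetical.

```python
def cds_coordinates(gff_lines):
    """Yield (seqid, start, end, strand) for each CDS feature.

    GFF coordinates are 1-based and inclusive.
    """
    for line in gff_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        yield fields[0], int(fields[3]), int(fields[4]), fields[6]


# Made-up GFF lines for illustration.
gff_example = [
    "##gff-version  3\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t3\t98\t5.3\t+\t0\tID=1_1;partial=10\n",
    "contig_1\tProdigal_v2.6.3\tCDS\t337\t2799\t338.7\t-\t0\tID=1_2;partial=00\n",
]
cds = list(cds_coordinates(gff_example))
```
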
# Salmon - RNA-Seq Transcript Quantification

## When to Use

Use Salmon for fast, accurate transcript-level quantification from RNA-Seq reads. Salmon uses quasi-mapping to align reads directly to a transcriptome reference without requiring genome alignment.

## Standard Workflow

### 1. Build Index

```bash
salmon index -t transcriptome.fa -i salmon_index
```

### 2. Quantify (paired-end)

```bash
salmon quant -i salmon_index -l A \
  -1 reads_1.fq.gz -2 reads_2.fq.gz \
  -o salmon_output \
  --va
```

# Samtools Skill

## When to Use

Use for BAM/SAM file manipulation, sorting, indexing, and basic variant calling support.

## Standard Workflow

1. Convert SAM to BAM: `samtools view -bS input.sam -o output.bam`
2. Sort BAM: `samtools sort -o sorted.bam input.bam`
3. Index BAM: `samtools index sorted.bam`
4. View stats: `samtools flagstat sorted.bam`
5. Index FASTA: `samtools faidx reference.fasta`

## Key Decisions

- Always sort and index BAMs before variant calling
- Use samtools flagstat to ch

# SnpEff Annotation Skill

## When to Use

Use when annotating variants with functional impact predictions (gene, effect, impact).

## Standard Workflow

1. Build SnpEff database from reference genome + GFF: configure snpEff.config, then `snpEff build -gff3 -v genome_name`
2. Annotate VCF: `snpEff ann genome_name variants.vcf > annotated.vcf`
3. Parse annotations from ANN field in VCF

## Key Decisions

- For de novo assembled genomes, build a custom SnpEff database using Prokka annotations
- Impac

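Step 3 above (parsing the ANN field) can be sketched as follows. This assumes the standard ANN layout, in which annotations are comma-separated and each annotation's subfields are pipe-separated, starting with allele, effect, impact, gene name, gene ID, feature type, and feature ID. The name `parse_ann`, the key names in `ANN_KEYS`, and the example INFO string are hypothetical.

```python
# Names for the first seven ANN subfields; the key names are illustrative.
ANN_KEYS = ("allele", "annotation", "impact", "gene_name",
            "gene_id", "feature_type", "feature_id")


def parse_ann(info):
    """Return a list of dicts, one per annotation in the ANN entry of a
    VCF INFO string. Returns an empty list when no ANN entry is present."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            annotations = entry[len("ANN="):].split(",")
            # zip() stops at the shorter input, so only the first seven
            # subfields of each annotation are kept.
            return [dict(zip(ANN_KEYS, ann.split("|"))) for ann in annotations]
    return []


# Made-up INFO column with two annotations for one variant.
info_example = (
    "DP=30;ANN=A|missense_variant|MODERATE|geneA|GENE_1|transcript|T1|"
    "protein_coding|1/1|c.10G>A|p.Ala4Thr|10/300|10/300|4/99||,"
    "A|upstream_gene_variant|MODIFIER|geneB|GENE_2|transcript|T2|"
    "protein_coding||c.-50G>A|||||50|"
)
annotations = parse_ann(info_example)
```
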
# SPAdes Assembly Skill

## When to Use

Use for de novo genome assembly when no reference genome is available.

## Standard Workflow

1. Run SPAdes: `spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o assembly_output --careful`
2. Check assembly stats: look at scaffolds.fasta or contigs.fasta
3. Use assembled genome as reference for read mapping

## Key Decisions

- Use `--careful` flag for bacterial genomes to reduce misassemblies
- For small bacterial genomes, default k-mer sizes work well
- Output scaf

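One way to check assembly stats (step 2 above) is to compute contig lengths and N50 from the FASTA output. The sketch below is a plain-Python illustration rather than part of SPAdes; `contig_lengths` and `n50` are hypothetical helper names.

```python
def contig_lengths(fasta_lines):
    """Return the sequence length of each FASTA record, in file order."""
    lengths, current = [], None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the length L of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Made-up FASTA content for illustration.
fasta_example = [">NODE_1\n", "ACGTACGT\n", "ACGT\n", ">NODE_2\n", "ACG\n"]
lengths = contig_lengths(fasta_example)
assembly_n50 = n50(lengths)
```

In practice the lines would come from `scaffolds.fasta` or `contigs.fasta`, for example via `open(path)`.
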
# Variant Calling Skill

## When to Use

Use when calling SNPs and indels from aligned BAM files against a reference.

## Standard Workflow

1. Mark duplicates (optional): `samtools markdup`
2. Call variants with freebayes: `freebayes -f reference.fasta -p 1 sample.bam > variants.vcf` OR with bcftools: `bcftools mpileup -f ref.fa sample.bam | bcftools call -mv -Oz -o variants.vcf.gz`
3. Filter variants: `bcftools filter -s LowQual -e 'QUAL<20' variants.vcf`

## Key Decisions

- For haploid organ

# BWA Mapping Skill

## When to Use

Use when mapping paired-end or single-end sequencing reads to a reference genome.

## Standard Workflow

1. Index reference genome: `bwa index reference.fasta`
2. Map reads: `bwa mem -t <threads> reference.fasta R1.fastq.gz R2.fastq.gz > aligned.sam`
3. Convert to sorted BAM: `samtools sort -o aligned.sorted.bam aligned.sam`
4. Index BAM: `samtools index aligned.sorted.bam`

## Key Decisions

- Use BWA-MEM for reads >70bp (standard for Illumina HiSeq)
- Add read

Differential gene expression analysis with PyDESeq2. Design matrices, contrasts, multiple testing correction, volcano plots.
# Kraken2 - Taxonomic Classification

## When to Use

Use Kraken2 for taxonomic classification of metagenomic sequencing reads against a reference database.

## Standard Workflow

1. Install: `conda install -c bioconda kraken2` or `brew install kraken2`
2. Run classification: `kraken2 --db <db_path> --paired <R1.fastq.gz> <R2.fastq.gz> --output <output.txt> --report <report.txt> --threads <N>`
3. Key output files:
   - output.txt: per-read classification
   - report.txt: summary report with read c

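The report file from step 3 can be read with a few lines of Python. This sketch assumes the standard six-column, tab-separated report (percentage, reads in clade, reads assigned directly, rank code, NCBI taxid, indented name); reports written with extra options may have more columns and would be skipped here. `read_kraken_report` and the example rows are hypothetical.

```python
def read_kraken_report(lines, rank="S"):
    """Yield (name, taxid, clade_reads, percent) for rows with the given
    rank code (for example "S" for species, "G" for genus)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # not a standard six-column report row
        percent, clade_reads, _direct_reads, rank_code, taxid, name = fields
        if rank_code == rank:
            # Names are indented to show taxonomy depth; strip the padding.
            yield name.strip(), int(taxid), int(clade_reads), float(percent)


# Made-up report rows for illustration.
report_example = [
    " 10.00\t100\t100\tU\t0\tunclassified\n",
    " 90.00\t900\t5\tG\t561\t          Escherichia\n",
    " 80.00\t800\t800\tS\t562\t            Escherichia coli\n",
]
species_rows = list(read_kraken_report(report_example, rank="S"))
```
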
Single-cell RNA-seq analysis with Scanpy. QC, normalization, clustering, visualization, marker gene identification.
# SnpEff Skill

## When to Use

Use for annotating variants with gene names, functional effects, and impact predictions.

## Standard Workflow

1. Build custom database from GFF/GBK + genome:
   - Create snpEff config entry
   - Place genome and genes files in data directory
   - Run: `snpEff build -gff3 custom_db`
2. Annotate VCF: `snpEff ann custom_db variants.vcf > annotated.vcf`
3. Parse annotations from ANN field in VCF

## Key Decisions

- For custom/assembled genomes: build a local SnpEff da

End-to-end single-cell RNA-seq analysis including data loading, QC, integration, clustering, cell type annotation, and differential expression.