.autolab/acquired_skills/differential-expression/SKILL.md
Differential gene expression analysis with PyDESeq2. Design matrices, contrasts, multiple testing correction, volcano plots.
npx skillsauth add albert-ying/autonomous-lab differential-expression
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
import pandas as pd
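# counts_df: samples x genes DataFrame of raw integer counts
#   (PyDESeq2 expects samples as rows; transpose a genes x samples table with .T)
# metadata_df: samples x covariates DataFrame with the same index as counts_df
dds = DeseqDataSet(
    counts=counts_df,
    metadata=metadata_df,
    design="~batch + condition",  # batch modeled as a covariate in the design
)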
dds.deseq2()
# Extract results for a contrast
stat_res = DeseqStats(dds, contrast=["condition", "treated", "control"])
stat_res.summary()
results_df = stat_res.results_df
# Filter significant genes
sig = results_df[
    (results_df["padj"] < 0.05) & (results_df["log2FoldChange"].abs() > 1)
]
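The `padj` column used in the filter above holds p-values adjusted for multiple testing. As a rough illustration of what a Benjamini-Hochberg adjustment does, here is a minimal NumPy sketch. `bh_adjust` is a hypothetical helper written for this example, not part of PyDESeq2, and it does not reproduce DESeq2-style independent filtering, so its output may differ from `padj` for low-count genes.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values; NaN entries are passed through."""
    p = np.asarray(pvalues, dtype=float)
    adjusted = np.full(p.shape, np.nan)
    valid = ~np.isnan(p)
    pv = p[valid]
    m = pv.size
    if m == 0:
        return adjusted
    # Rank the p-values and scale each by m / rank
    order = np.argsort(pv)
    scaled = pv[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(scaled, 0.0, 1.0)
    adjusted[valid] = out
    return adjusted


example = bh_adjust([0.01, 0.04, 0.03, 0.5])
print(example)
```

Genes whose adjusted value falls below 0.05 are the ones the `padj < 0.05` filter keeps, before the fold-change cutoff is applied.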
import matplotlib.pyplot as plt
import numpy as np
df = results_df.copy()
df["-log10p"] = -np.log10(df["padj"].clip(lower=1e-300))
fig, ax = plt.subplots(figsize=(8, 6))
colors = np.where(
    (df["padj"] < 0.05) & (df["log2FoldChange"] > 1), "red",
    np.where((df["padj"] < 0.05) & (df["log2FoldChange"] < -1), "blue", "gray")
)
ax.scatter(df["log2FoldChange"], df["-log10p"], c=colors, s=8, alpha=0.5)
ax.axhline(-np.log10(0.05), ls="--", color="gray", lw=0.8)
ax.axvline(-1, ls="--", color="gray", lw=0.8)
ax.axvline(1, ls="--", color="gray", lw=0.8)
ax.set_xlabel("log2 Fold Change")
ax.set_ylabel("-log10 adjusted p-value")
Paired design: ~patient + treatment.
Interaction design: ~genotype + treatment + genotype:treatment.
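To make the interaction formula `~genotype + treatment + genotype:treatment` more concrete, the sketch below builds the corresponding design matrix by hand with pandas for a toy 2x2 experiment. The sample names, factor levels (`wt`/`mut`, `control`/`treated`), and column labels are assumptions made up for illustration; PyDESeq2 builds its own design matrix from the formula and may name the columns differently.

```python
import pandas as pd

# Hypothetical metadata: one sample per genotype/treatment combination
metadata = pd.DataFrame(
    {
        "genotype": ["wt", "wt", "mut", "mut"],
        "treatment": ["control", "treated", "control", "treated"],
    },
    index=["s1", "s2", "s3", "s4"],
)

design = pd.DataFrame(index=metadata.index)
design["Intercept"] = 1
# Main effects: indicator columns, with reference levels "wt" and "control" dropped
design["genotype_mut"] = (metadata["genotype"] == "mut").astype(int)
design["treatment_treated"] = (metadata["treatment"] == "treated").astype(int)
# Interaction term: product of the two main-effect indicators
design["genotype_mut:treatment_treated"] = (
    design["genotype_mut"] * design["treatment_treated"]
)

print(design)
```

The interaction column is nonzero only for treated mutant samples, so its coefficient measures how much the treatment effect in the mutant differs from the treatment effect in the wild type.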