skills/systems-biology-multiomics/cobrapy-metabolic-modeling/SKILL.md
Constraint-based (COBRA) analysis of genome-scale metabolic models: FBA, FVA, knockouts, flux sampling, production envelopes, gapfilling, media optimization. Use for strain design, essential gene ID, flux analysis. For kinetic modeling use tellurium; for visualization use Escher.
npx skillsauth add jaechang-hits/sciagent-skills cobrapy-metabolic-modelingInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
COBRApy is a Python package for constraint-based reconstruction and analysis (COBRA) of genome-scale metabolic models. It provides flux balance analysis (FBA), flux variability analysis (FVA), gene and reaction knockout screens, flux sampling, production envelopes, gapfilling, and media optimization on SBML-format metabolic networks.
cobra (includes GLPK solver), numpy, pandaspip install cobra
from cobra.io import load_model
model = load_model("textbook") # E. coli core model
print(f"Model: {model.id} — {len(model.reactions)} rxns, {len(model.metabolites)} mets, {len(model.genes)} genes")
solution = model.optimize()
print(f"Growth rate: {solution.objective_value:.4f} /h")
print(f"Status: {solution.status}")
# Model: e_coli_core — 95 rxns, 72 mets, 137 genes
# Growth rate: 0.8739 /h
# Status: optimal
Load bundled models and read/write standard formats.
from cobra.io import load_model, read_sbml_model, write_sbml_model, load_json_model, save_json_model
# Bundled: "textbook" (95 rxns), "ecoli" (2583 rxns), "salmonella"
model = load_model("textbook")
# model = read_sbml_model("my_model.xml") # from SBML file
# model = load_json_model("my_model.json") # from JSON file
write_sbml_model(model, "output_model.xml")
save_json_model(model, "output_model.json")
print(f"Saved model: {model.id}")
Access reactions, metabolites, and genes via DictList containers.
from cobra.io import load_model
model = load_model("textbook")
# Inspect a reaction
rxn = model.reactions.get_by_id("PFK")
print(f"Reaction: {rxn.id} — {rxn.name}")
print(f"Equation: {rxn.reaction}")
print(f"Bounds: {rxn.bounds}, GPR: {rxn.gene_reaction_rule}")
# Inspect a metabolite
met = model.metabolites.get_by_id("atp_c")
print(f"Metabolite: {met.id}, Formula: {met.formula}, Compartment: {met.compartment}")
# Query and list exchange reactions
atp_rxns = model.reactions.query("atp", attribute="name")
print(f"ATP-related reactions: {len(atp_rxns)}, Exchange reactions: {len(model.exchanges)}")
Predict optimal flux distributions by maximizing an objective.
from cobra.io import load_model
from cobra.flux_analysis import pfba
model = load_model("textbook")
# Standard FBA
solution = model.optimize()
print(f"Growth: {solution.objective_value:.4f} /h, Active fluxes: {(solution.fluxes.abs() > 1e-6).sum()}")
# Parsimonious FBA — same growth, minimal total flux
pfba_sol = pfba(model)
print(f"pFBA total flux: {pfba_sol.fluxes.abs().sum():.1f} vs standard: {solution.fluxes.abs().sum():.1f}")
# Change objective; slim_optimize for speed
from cobra.io import load_model
model = load_model("textbook")
with model:
model.objective = "ATPM"
print(f"Max ATPM flux: {model.optimize().objective_value:.2f}")
print(f"Growth (slim): {model.slim_optimize():.4f}") # no flux vector, faster
Determine feasible flux ranges at or near optimality.
from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis
model = load_model("textbook")
fva = flux_variability_analysis(model, fraction_of_optimum=1.0)
fva_90 = flux_variability_analysis(model, fraction_of_optimum=0.9)
fva["range"] = fva["maximum"] - fva["minimum"]
fva_90["range"] = fva_90["maximum"] - fva_90["minimum"]
print(f"Mean range at 100%: {fva['range'].mean():.2f}, at 90%: {fva_90['range'].mean():.2f}")
# Loopless FVA on specific reactions
from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis
model = load_model("textbook")
fva_ll = flux_variability_analysis(
model, loopless=True, reaction_list=["PFK", "PGI", "FBA", "TPI", "GAPD"],
)
print(fva_ll)
Screen for essential genes/reactions via knockout simulations.
from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion, double_gene_deletion
model = load_model("textbook")
wt_growth = model.slim_optimize()
# Single gene deletions
gene_results = single_gene_deletion(model)
gene_results["growth_fraction"] = gene_results["growth"] / wt_growth
essential = gene_results[gene_results["growth_fraction"] < 0.01]
print(f"Essential genes: {len(essential)} / {len(model.genes)}")
# Double deletions (synthetic lethality) — use multiprocessing
double_results = double_gene_deletion(model, processes=4)
print(f"Double deletion results: {double_results.shape}")
Modify nutrient availability and compute minimal media.
from cobra.io import load_model
from cobra.medium import minimal_medium
model = load_model("textbook")
# View current medium
for rxn_id, flux in sorted(model.medium.items()):
print(f" {rxn_id}: {flux}")
# Anaerobic switch via context manager
with model:
medium = model.medium
medium["EX_o2_e"] = 0.0
model.medium = medium
print(f"Anaerobic growth: {model.slim_optimize():.4f} /h")
# Minimal medium
min_med = minimal_medium(model, minimize_components=True, open_exchanges=True)
print(f"Minimal medium: {len(min_med)} components")
Sample feasible flux distributions for variability analysis.
from cobra.io import load_model
from cobra.sampling import sample
model = load_model("textbook")
samples = sample(model, n=500, method="optgp")
print(f"Samples shape: {samples.shape}") # (500, n_reactions)
print(f"PFK flux: mean={samples['PFK'].mean():.2f}, std={samples['PFK'].std():.2f}")
Compute phenotype phase planes and fill model gaps.
from cobra.io import load_model
from cobra.flux_analysis import production_envelope
model = load_model("textbook")
envelope = production_envelope(
model,
reactions=model.reactions.get_by_id("EX_ac_e"),
carbon_sources=model.reactions.get_by_id("EX_glc__D_e"),
)
print(f"Envelope: {len(envelope)} points")
print(envelope[["flux_minimum", "flux_maximum", "carbon_yield_minimum", "carbon_yield_maximum"]].head())
# Gapfilling: restore growth after reaction removal
from cobra.io import load_model
from cobra.flux_analysis.gapfilling import gapfill
model = load_model("textbook")
universal = load_model("textbook") # In practice, use a universal reaction DB
with model:
model.remove_reactions([model.reactions.get_by_id("PFK")])
print(f"Growth after removing PFK: {model.slim_optimize():.4f}")
for rxn in gapfill(model, universal)[0]:
print(f" Gapfill suggests: {rxn.id}")
Reactions, metabolites, and genes are stored in DictList — ordered, indexable, and accessible by ID.
rxn = model.reactions[0] # by index
rxn = model.reactions.get_by_id("PFK") # by ID
matches = model.reactions.query("phospho") # keyword search
EX_ prefix reactions represent system boundary. Positive flux = secretion; negative = uptake. Managed via model.medium dict.
Boolean expressions linking genes to reactions: (b0726 and b0727) or b1234. Gene knockout propagates through GPR logic.
with model: snapshots state and reverts all changes on exit (bounds, objective, medium, knockouts). Nesting supported.
Goal: Identify essential, growth-reducing, and neutral genes.
from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion
model = load_model("textbook")
wt_growth = model.slim_optimize()
results = single_gene_deletion(model)
results["growth_fraction"] = results["growth"] / wt_growth
essential = results[results["growth_fraction"] < 0.01]
reduced = results[(results["growth_fraction"] >= 0.01) & (results["growth_fraction"] < 0.9)]
neutral = results[results["growth_fraction"] >= 0.9]
print(f"Essential: {len(essential)}, Reduced: {len(reduced)}, Neutral: {len(neutral)}")
for idx in essential.index:
print(f" Essential gene: {list(idx)[0]}")
Goal: Find minimal medium at different growth targets; compare aerobic vs anaerobic.
from cobra.io import load_model
from cobra.medium import minimal_medium
import pandas as pd
model = load_model("textbook")
results = []
for frac in [0.1, 0.5, 0.8, 1.0]:
with model:
target = model.slim_optimize() * frac
model.reactions.get_by_id("Biomass_Ecoli_core").lower_bound = target
try:
mm = minimal_medium(model, minimize_components=True, open_exchanges=True)
results.append({"growth_frac": frac, "n_components": len(mm)})
except Exception:
results.append({"growth_frac": frac, "n_components": None})
print(pd.DataFrame(results).to_string(index=False))
for label, o2 in [("Aerobic", 1000.0), ("Anaerobic", 0.0)]:
with model:
medium = model.medium
medium["EX_o2_e"] = o2
model.medium = medium
print(f"{label} growth: {model.slim_optimize():.4f} /h")
Goal: Design a strain with maximized target metabolite production. Combines modules 3, 5, and 8.
| Parameter | Module/Function | Default | Range / Options | Effect |
|-----------|----------------|---------|-----------------|--------|
| fraction_of_optimum | flux_variability_analysis | 1.0 | 0.0-1.0 | Fraction of max objective to maintain; lower = wider flux ranges |
| loopless | flux_variability_analysis | False | True, False | Eliminate thermodynamically infeasible loops; slower |
| method | sample | "optgp" | "optgp", "achr" | Sampling algorithm; optgp supports parallelism |
| n | sample | required | 100-10000 | Number of flux samples to draw |
| processes | sample, double_gene_deletion | 1 | 1-N_cores | Parallel worker processes |
| minimize_components | minimal_medium | False | True, False | True = fewest nutrients (MILP); False = minimize total flux |
| open_exchanges | minimal_medium | False | True, False | Allow all exchanges as nutrient candidates |
| carbon_sources | production_envelope | None | Reaction object | Compute carbon yield alongside flux envelope |
| thinning | sample | 100 | 1-1000 | Steps between kept samples; higher = less correlated |
Use context managers for temporary changes: with model: reverts all modifications on exit.
with model:
model.reactions.PFK.knock_out()
print(model.slim_optimize()) # modified
# model is restored here
Validate with slim_optimize() before analysis: Quick feasibility check before expensive operations (FVA, sampling).
Check solution.status after optimization: Always verify "optimal" before interpreting fluxes.
Use loopless FVA when thermodynamic feasibility matters: Standard FVA can include infeasible internal cycles that inflate flux ranges.
Parallelize expensive operations: Sampling and double deletions support processes parameter.
Prefer SBML for model exchange: Community standard supported by all COBRA tools.
Use slim_optimize() in loops: Skips full flux vector construction, significantly faster for screening.
Validate flux samples: Use sampler.validate(samples) to check stoichiometric and bound constraints.
from cobra.io import load_model
model = load_model("textbook")
print("Glucose_uptake | Growth_rate")
for glc_uptake in [1, 2, 5, 10, 15, 20]:
with model:
model.reactions.get_by_id("EX_glc__D_e").lower_bound = -glc_uptake
growth = model.slim_optimize()
print(f" {glc_uptake:>13} | {growth:.4f}")
from cobra.io import load_model
import pandas as pd
model = load_model("textbook")
conditions = [
{"name": "Rich aerobic", "EX_o2_e": 1000, "EX_glc__D_e": 10},
{"name": "Anaerobic", "EX_o2_e": 0, "EX_glc__D_e": 10},
{"name": "Low glucose", "EX_o2_e": 1000, "EX_glc__D_e": 1},
]
results = []
for c in conditions:
with model:
medium = model.medium
medium["EX_o2_e"], medium["EX_glc__D_e"] = c["EX_o2_e"], c["EX_glc__D_e"]
model.medium = medium
results.append({"condition": c["name"], "growth": round(model.slim_optimize(), 4)})
print(pd.DataFrame(results).to_string(index=False))
from cobra.io import load_model
from cobra.flux_analysis import find_blocked_reactions
model = load_model("textbook")
# Feasibility, mass balance, dead-ends, blocked reactions
print(f"Growth feasible: {model.slim_optimize() > 0}")
print(f"Missing formula: {sum(1 for m in model.metabolites if m.formula is None)}")
print(f"Dead-end metabolites: {sum(1 for m in model.metabolites if len(m.reactions) == 1)}")
print(f"Blocked reactions: {len(find_blocked_reactions(model))} / {len(model.reactions)}")
| Problem | Cause | Solution |
|---------|-------|----------|
| solution.status == "infeasible" | Constraints cannot be simultaneously satisfied | Check medium has required nutrients; verify reaction bounds; use model.medium to restore defaults |
| solution.status == "unbounded" | No upper bound on fluxes | Set finite upper bounds on exchange reactions |
| Very slow optimization | Large model + default GLPK solver | Install CPLEX or Gurobi: model.solver = "cplex" |
| ValueError setting bounds | lower_bound > upper_bound temporarily | Set as tuple: rxn.bounds = (new_lb, new_ub) |
| Gene deletion returns NaN | Knockout makes model infeasible | Expected for essential genes; classify as essential |
| IOError reading SBML | Invalid SBML or missing namespace | Validate at sbml.org; try cobra.io.sbml.validate_sbml_model(path) |
| Flux samples fail validation | Numerical solver tolerance | Increase thinning parameter; try method="achr" |
1 reference file:
references/api_workflows.md — Consolidates API quick reference and advanced workflows. Covers: detailed function signatures, solver configuration (GLPK/CPLEX/Gurobi), advanced analysis (find_blocked_reactions, find_essential_genes/find_essential_reactions), model manipulation (adding reactions/metabolites/genes), flux sample validation, and 5 workflow examples (knockout with visualization, media design, flux space exploration, production strain design, model validation). Relocated inline: basic FBA/FVA/deletion/sampling (Core API modules 3-7). Omitted: geometric FBA internals, MIP gap configuration — consult COBRApy docs.tools
Fast short-read DNA aligner for WGS/WES/ChIP-seq. 2× faster BWA-MEM successor; outputs SAM/BAM with read group headers for GATK. Primary plus supplementary records for chimeric reads. Use STAR for RNA-seq splice-aware alignment; Bowtie2 is a comparable alternative.
tools
smina molecular docking CLI. AutoDock Vina fork with customizable scoring functions, native SDF/MOL2/PDB ligand input, autoboxing, local energy minimization, and per-atom score breakdowns. Pipeline: receptor PDBQT prep -> ligand prep (RDKit/OpenBabel) -> dock via autobox or explicit grid -> rescore/minimize with custom scoring -> rank poses by affinity. Choose smina over Vina when you need custom scoring terms (--custom_scoring), local optimization of an existing pose (--local_only), per-atom contributions (--atom_term_data), or SDF/MOL2 ligands without manual PDBQT conversion. For unknown binding sites use diffdock-blind-docking; for the Python-bindings/Vinardo workflow use autodock-vina-docking.
development
mdtraj molecular dynamics trajectory analysis (Python). Reads DCD/XTC/TRR/NetCDF/H5/PDB topologies and trajectories; computes RMSD vs time, radius of gyration, per-residue RMSF, residue-residue contact frequency maps, phi/psi torsions for Ramachandran plots (general + Gly/Pro), and 8-state DSSP secondary structure. Modules: trajectory I/O, geometry (distances/angles/dihedrals), structural analysis (RMSD/Rg/RMSF/SASA), contacts, hydrogen bonds, secondary structure (DSSP), NMR observables. For broader atom-selection grammar use mdanalysis-trajectory; for running MD simulations use OpenMM/GROMACS.
development
Programmatic PubMed access via NCBI E-utilities REST API. Covers Boolean/MeSH queries, field-tagged search, endpoints (ESearch, EFetch, ESummary, EPost, ELink), history server for batches, citation matching, systematic review strategies. Use for biomedical literature search or automated pipelines.