skills/23-Learning-Bayesian-Statistics-baygent-skills/bayesian-workflow/SKILL.md
Opinionated Bayesian modeling workflow with PyMC and ArviZ. Contains critical guardrails (nutpie sampler, prior/posterior predictive checks, LOO-PIT calibration, prior sensitivity checks, 94% HDI, non-centered parameterizations, reproducible seeds) that agents won't apply unprompted — always consult before writing Bayesian model code. Trigger on: building probabilistic/Bayesian models, prior elicitation, MCMC inference, convergence diagnostics (divergences, R-hat, ESS), model comparison (LOO-CV, ELPD, stacking weights), hierarchical/multilevel models, count regressions, logistic regression with uncertainty, prior sensitivity analysis, reporting Bayesian results, or mentions of PyMC, ArviZ, InferenceData, credible intervals, posterior distributions, shrinkage, uncertainty quantification. Also trigger for model comparison, diagnosing sampling problems, choosing priors, or presenting stats to non-technical audiences.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research bayesian-workflowInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Every Bayesian analysis follows this sequence. Do not skip steps -- especially model criticism.
pm.sample_prior_predictive(). Verify priors produce plausible data ranges before fittingpm.sample(nuts_sampler="nutpie"). Always use nutpie for speed (the nutpie python package provides cutting-edge sampling). Don't hardcode the number of chains — let the sampler pick the best default for the platform.arviz_stats.diagnose(idata) as the first check (requires arviz-stats >= 1.0.0). It covers R-hat, ESS, divergences, tree depth, and E-BFMI in one call. See references/diagnostics.mdpsense_summary(idata) to verify conclusions are robust to prior choices. Visualize with plot_psense_dist(idata) from arviz_plots. Requires log_likelihood and log_prior in the InferenceData — compute them after sampling if needed. See references/sensitivity.mdPrefer conda-forge / mamba-forge to install PyMC and its dependencies — pip can cause issues with compiled backends (nutpie, JAX). Example:
mamba install -c conda-forge pymc nutpie arviz arviz-stats preliz
import pymc as pm
import arviz as az
import numpy as np
RANDOM_SEED = sum(map(ord, "churn-logistic-v1"))
rng = np.random.default_rng(RANDOM_SEED)
# always use dimensions and coordinates in PyMC models
with pm.Model(coords=coords) as model:
# use Data containers when working on a PyMC model
data = pm.Data("data", df["y"].to_numpy(), dims="obs")
# --- Priors ---
# Always document WHY each prior was chosen
mu = pm.Normal("mu", mu=0, sigma=10) # Weakly informative: allows wide range
# --- Data model ---
pm.Normal("obs", mu=mu, sigma=1, observed=data, dims="obs")
# --- Prior predictive check ---
prior_pred = pm.sample_prior_predictive(random_seed=rng)
# --- Inference ---
idata = pm.sample(nuts_sampler="nutpie", random_seed=rng)
idata.extend(prior_pred)
# --- Posterior predictive check ---
idata.extend(pm.sample_posterior_predictive(idata, random_seed=rng))
# --- Compute log-likelihood and log-prior for sensitivity checks & LOO ---
pm.compute_log_likelihood(idata, model=model)
pm.compute_log_prior(idata, model=model)
# --- Save immediately after sampling ---
# Late crashes can destroy valid results. Save to disk before any post-processing.
idata.to_netcdf("model_output.nc")
plot_ppc_pit for this — it handles all data types (continuous, binary, count) correctly. See references/model-criticism.md.arviz_stats.diagnose(idata) as the first diagnostic on every model (arviz-stats >= 1.0.0). It checks R-hat, ESS, divergences, tree depth saturation, and E-BFMI in one call. Follow up with az.plot_trace(idata, kind="rank_vlines") for visual inspection.pm.sample() without specifying chains.42. Instead, derive a seed from the analysis name: RANDOM_SEED = sum(map(ord, "my-analysis-name")). Pass it to pm.sample(random_seed=rng), pm.sample_prior_predictive(random_seed=rng), and numpy via rng = np.random.default_rng(RANDOM_SEED).idata.to_netcdf("model_output.nc"). Late crashes or kernel restarts can destroy valid MCMC results — save before any post-processing.InferenceData and DataTree objects are backed by xarray — use xarray's labeled indexing (.sel(), .mean(dim=...), etc.) instead of converting to numpy arrays. This preserves dimension labels, avoids shape bugs, and makes code more readable. Fall back to numpy only when xarray can't do what you need.analysis_notes.md or similar) that interprets the results — what the diagnostics mean, what the posteriors tell us, what the calibration plots show. Code without interpretation is incomplete.P(Y=k|x) = (1/S) Σ P(Y=k|x,θₛ). This is the mean, not the median. The median does not correspond to the posterior predictive distribution, can violate probability coherence (probabilities may not sum to 1), and biases calibration due to Jensen's inequality. In code: use np.mean(probs, axis=sample_axis), never np.median(...).pm.set_data() + pm.sample_posterior_predictive() for out-of-sample predictions. Don't manually extract posterior samples and recompute predictions — let PyMC propagate uncertainty properly. Define predictors as pm.Data(...) during model building, then swap in new data:# After fitting the model:
with model:
pm.set_data({"X": X_new, "group_idx": group_idx_new})
oos_preds = pm.sample_posterior_predictive(idata, predictions=True, random_seed=rng)
az.plot_pair() to check for strong posterior correlations between components. If correlation is near ±1, the components are not separately identifiable — either merge them or restructure the data.| Problem | Data model | Typical priors | Reference |
|---|---|---|---|
| Continuous outcome | Normal / StudentT | Normal, Gamma avoiding 0 for positive-constrained parameters | references/priors.md |
| Binary outcome | Bernoulli or Binomial if aggregated, with logit inverse-link | Normal(0, 1.5) on coeffs | references/priors.md |
| Count data | Poisson / NegBinomial | Gamma on rate, avoiding 0 | references/priors.md |
| Count data with excess zeros | ZeroInflatedPoisson / ZeroInflatedNegBinomial | Gamma on rate; Beta or Normal+logit on zero-inflation prob | references/priors.md |
| Positive count data (no zeros) | Hurdle Poisson / Hurdle NegBinomial | Separate zero-gate (Bernoulli) and count (Truncated) components | references/priors.md |
| Ordinal outcome | OrderedLogistic (cumulative link) | Normal on coeffs; Normal with ordered transform on cutpoints | references/priors.md |
| Censored data (survival, limits of detection) | pm.Censored(dist, lower, upper) | Same as uncensored, applied to underlying distribution | references/priors.md |
| Truncated data | pm.Truncated(dist, lower, upper) | Same as underlying distribution | references/priors.md |
| High-dimensional / sparse regression | Normal / StudentT with sparsity prior on coefficients | Regularized Horseshoe or R2-D2 on coeffs | references/priors.md |
| Hierarchical / multilevel | Varies | See partial pooling pattern | references/hierarchical.md |
| Time series | state space models / Gaussian Processes | Problem-specific | references/priors.md |
Run diagnose_model.py after sampling to get a structured convergence + diagnostics report:
python scripts/diagnose_model.py --idata path/to/inference_data.nc
Run calibration_check.py to generate calibration plots:
python scripts/calibration_check.py --idata path/to/inference_data.nc
See scripts/ for all available utilities.
These are battle-tested lessons that save hours of debugging:
idata_kwargs for log_likelihood and log_prior. Always compute them explicitly after sampling: pm.compute_log_likelihood(idata, model=model) (needed for LOO-CV) and pm.compute_log_prior(idata, model=model) (needed for prior sensitivity checks). Don't assume they're stored automatically.az.plot_khat() requires the LOO object, not InferenceData. Pass the output of az.loo(idata, pointwise=True) to it.HalfCauchy, HalfFlat) cause funnels in hierarchical models. Use Gamma(2, ...) or Exponential — these avoid the near-zero region that creates sampling problems. If there's no group-level variation to detect, you don't need the hierarchy.if x > 0) don't work inside PyMC. Use pm.math.switch or pytensor.tensor.where instead.target_accept=0.95 or higher. If you see divergences with a horseshoe model, this is almost certainly the cause.np.median on posterior predictive probabilities is a silent bug. It does not produce the Bayesian predictive distribution and can yield probabilities that don't sum to 1 across categories. Always use np.mean over the posterior samples dimension.| Symptom | Likely cause | Fix |
|---|---|---|
| Divergences | Posterior geometry issue | Reparameterize (non-centered), increase target_accept to 0.95-0.99 |
| Low ESS | High autocorrelation | More tuning steps, reparameterize, reduce correlations |
| R-hat > 1.01 | Chains haven't mixed | More draws, better initialization, check for multimodality |
| Prior pred. looks wrong | Bad priors | Tighten or shift priors, use domain knowledge |
| Post. pred. misses data | Model misspecification | Add complexity (varying slopes, different data model, interaction terms) |
| log_likelihood missing | nutpie doesn't auto-store it | Call pm.compute_log_likelihood(idata, model=model) after sampling |
| Slow model | Large Deterministics or recompilation | Profile with model.profile(model.logp()), avoid large Deterministic arrays |
| Slow to initialize / poor warmup | Bad starting point | Try init="adapt_diag_grad" in pm.sample(), or run pmx.fit(method="pathfinder") first (import pymc_extras as pmx) and pass its estimates as initvals |
| Prior sensitivity flag | Prior-data conflict or strong prior | Check psense_summary(idata) — see references/sensitivity.md. Justify or revise the flagged prior |
development
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
development
Guide users through writing a systematic literature review (SLR) following the PRISMA 2020 framework. Use this skill whenever the user mentions 'systematic review', 'systematic literature review', 'SLR', 'PRISMA', 'PRISMA 2020', 'PRISMA flow diagram', 'PRISMA checklist', or asks for help writing, structuring, or auditing a literature review that follows reporting guidelines. Also trigger when the user asks about inclusion/exclusion criteria for a review, search strategies for databases like Scopus/WoS/PubMed, study selection processes, risk of bias assessment, or narrative synthesis for a review paper. This skill covers the full PRISMA 2020 checklist (27 items), produces a Word document manuscript in strict journal article format, generates an annotated PRISMA flow diagram, and enforces APA 7th Edition referencing throughout. It does NOT cover meta-analysis or statistical pooling. By Chuah Kee Man.
testing
Performs placebo-in-time sensitivity analysis with hierarchical null model and optional Bayesian assurance. Use when checking model robustness, verifying lack of pre-intervention effects, or estimating study power.
data-ai
Fit, summarize, plot, and interpret a chosen CausalPy experiment. Use after the causal method has been selected, including when configuring PyMC/sklearn models and scale-aware custom priors.