skills/bayesian-stats/SKILL.md
Convert frequentist statistical tests into their Bayesian equivalents. Provides mappings, code snippets, interpretation guides, and best practices.
When the user invokes /bayesian-stats, help them convert frequentist statistical tests into Bayesian equivalents.
If an argument is provided (e.g., /bayesian-stats t-test), look up that specific test. Otherwise, ask which frequentist test they want to convert.
| Frequentist Test | Bayesian Equivalent | Python Library |
|------------------|---------------------|----------------|
| Paired t-test | Bayesian paired t-test with JZS prior → BF₁₀ | `pingouin.ttest(x, y, paired=True)` (read the `BF10` column) |
| Independent t-test | Bayesian independent t-test with JZS prior → BF₁₀ | `pingouin.ttest(x, y, paired=False)` (read the `BF10` column) |
| Wilcoxon signed-rank | Bayesian paired t-test (robust alternative) or Bayesian sign test via PyMC | `pingouin` for approximate BF, `pymc` for full model |
| Fisher's exact / Chi-squared | Beta-Binomial model with Beta(1,1) priors on each group's success rate | Analytical, or `pymc`: `pm.Beta("p", 1, 1)` per group |
| Mixed-effects logistic regression | Bayesian mixed-effects model | `bambi`: `bmb.Model("y ~ condition + (1\|question)", data, family="bernoulli")` |
| ANOVA / F-test | Bayesian ANOVA | `bambi`: `bmb.Model("dv ~ group", data)` (pingouin does not provide a Bayesian ANOVA function) |
| Pearson correlation | Bayesian correlation | `pingouin.corr(x, y)` (read the `BF10` column) or `pingouin.bayesfactor_pearson(r, n)` |
| Bootstrap CI | Posterior credible interval from MCMC | `pymc` model → `arviz.summary()` for HDI |
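```python
import pingouin as pg

# Paired t-test; pingouin reports the JZS Bayes factor in the BF10 column
res = pg.ttest(x, y, paired=True, r=0.707)  # Cauchy prior scale r = √2/2
bf10 = float(res["BF10"].iloc[0])
print(f"BF₁₀ = {bf10:.3f}")
```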
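To make the JZS Bayes factor less of a black box, here is a minimal standard-library sketch of the integral it is usually defined by (Rouder et al., 2009) for a one-sample or paired t-test. The function name `jzs_bf10` and the midpoint-rule integration are assumptions made for illustration, not pingouin's implementation; in practice a library routine should be preferred.

```python
import math


def jzs_bf10(t: float, n: int, r: float = 0.707, steps: int = 20000) -> float:
    """Approximate JZS BF10 for a one-sample or paired t-test.

    t: observed t statistic; n: number of observations (or pairs);
    r: Cauchy prior scale on the standardized effect size.
    """
    df = n - 1
    # Likelihood of t under H0 (effect size 0), up to a constant shared with H1
    null_like = (1 + t**2 / df) ** (-(df + 1) / 2)

    def integrand(g: float) -> float:
        scale = 1 + n * g * r**2
        like = scale**-0.5 * (1 + t**2 / (scale * df)) ** (-(df + 1) / 2)
        # Inverse-chi-square(1) mixing density on g, which yields the Cauchy prior
        prior = (2 * math.pi) ** -0.5 * g**-1.5 * math.exp(-1 / (2 * g))
        return like * prior

    # Midpoint rule on x in (0, 1) after substituting g = x / (1 - x)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps
        g = x / (1 - x)
        total += integrand(g) / (1 - x) ** 2
    alt_like = total / steps
    return alt_like / null_like


# Hypothetical inputs: t = 3.5 from 20 paired observations
print(f"BF10 = {jzs_bf10(3.5, 20):.2f}")
```

With `t = 0` the result falls below 1, which is the Bayes factor leaning toward the null rather than simply failing to reject it.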
```python
import pymc as pm
import arviz as az

# k_a, k_b: observed success counts; n_a, n_b: number of trials per condition
with pm.Model():
    p_a = pm.Beta("p_a", 1, 1)  # Condition A success rate, uniform prior
    p_b = pm.Beta("p_b", 1, 1)  # Condition B success rate, uniform prior
    pm.Binomial("obs_a", n=n_a, p=p_a, observed=k_a)
    pm.Binomial("obs_b", n=n_b, p=p_b, observed=k_b)
    delta = pm.Deterministic("delta", p_b - p_a)  # difference in success rates
    trace = pm.sample(4000)

print(az.summary(trace, var_names=["delta"], hdi_prob=0.95))
az.plot_posterior(trace, var_names=["delta"], ref_val=0)
```
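The "analytical" route for the same Beta-Binomial comparison needs no MCMC, because a Beta(1, 1) prior combined with k successes in n trials gives a Beta(1 + k, 1 + n − k) posterior. The sketch below draws from the two posteriors using only the standard library. The function name `beta_binomial_delta` and the counts are made up for illustration, and the interval is an equal-tailed credible interval rather than the HDI that ArviZ reports.

```python
import random


def beta_binomial_delta(k_a, n_a, k_b, n_b, draws=20000, seed=0):
    """Summarize the posterior of delta = p_b - p_a under Beta(1, 1) priors."""
    rng = random.Random(seed)
    # Conjugate update: Beta(1, 1) prior + k successes in n trials
    # gives a Beta(1 + k, 1 + n - k) posterior for each success rate.
    deltas = sorted(
        rng.betavariate(1 + k_b, 1 + n_b - k_b)
        - rng.betavariate(1 + k_a, 1 + n_a - k_a)
        for _ in range(draws)
    )
    return {
        "mean": sum(deltas) / draws,
        # Equal-tailed 95% credible interval (not an HDI)
        "ci95": (deltas[int(0.025 * draws)], deltas[int(0.975 * draws) - 1]),
        # Posterior probability that condition B beats condition A
        "p_b_gt_a": sum(d > 0 for d in deltas) / draws,
    }


# Illustrative counts only: 12/40 successes in A, 25/40 in B
summary = beta_binomial_delta(k_a=12, n_a=40, k_b=25, n_b=40)
print(summary)
```

The posterior probability that `p_b > p_a` is often easier to communicate than an interval, and it falls out of the same draws at no extra cost.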
```python
import bambi as bmb
import arviz as az

# Fixed effects plus a random intercept per question; `data` is a pandas DataFrame
model = bmb.Model("correct ~ advice_source * question_category + (1|question_id)", data, family="bernoulli")
results = model.fit(draws=4000)
fixed_effects = ["advice_source", "question_category", "advice_source:question_category"]
print(az.summary(results, var_names=fixed_effects))
```
| BF₁₀ | Evidence |
|-------|----------|
| > 100 | Extreme evidence for H₁ |
| 30–100 | Very strong evidence for H₁ |
| 10–30 | Strong evidence for H₁ |
| 3–10 | Moderate evidence for H₁ |
| 1–3 | Anecdotal evidence for H₁ |
| 1/3–1 | Anecdotal evidence for H₀ |
| 1/10–1/3 | Moderate evidence for H₀ |
| 1/30–1/10 | Strong evidence for H₀ |
| < 1/30 | Very strong evidence for H₀ |
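When reporting results, the lookup in the table can be automated. This is a small sketch with a hypothetical helper name, `interpret_bf10`. The table does not say which band a boundary value such as exactly 3 belongs to, so assigning it to the lower band is an arbitrary assumption here.

```python
def interpret_bf10(bf10: float) -> str:
    """Map a Bayes factor BF10 to the evidence label from the table."""
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    # (lower bound, label) pairs, checked from strongest evidence for H1 downward
    bands = [
        (100, "Extreme evidence for H₁"),
        (30, "Very strong evidence for H₁"),
        (10, "Strong evidence for H₁"),
        (3, "Moderate evidence for H₁"),
        (1, "Anecdotal evidence for H₁"),
        (1 / 3, "Anecdotal evidence for H₀"),
        (1 / 10, "Moderate evidence for H₀"),
        (1 / 30, "Strong evidence for H₀"),
    ]
    for lower, label in bands:
        if bf10 > lower:  # assumption: boundary values fall into the lower band
            return label
    return "Very strong evidence for H₀"


for value in (150, 4.2, 0.2, 0.01):
    print(value, "->", interpret_bf10(value))
```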
Key advantage: BF₁₀ < 1/3 counts as evidence for the null, not merely a "failure to reject." A non-significant p-value cannot distinguish evidence of absence from an underpowered test.
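A worked case shows what evidence for the null looks like. The sketch below computes BF₁₀ for a binomial test of H₀: p = 0.5 against H₁: p ~ Beta(1, 1), using the fact that a uniform prior makes every success count from 0 to n equally likely a priori. The function name `binom_bf10` and the counts are illustrative assumptions.

```python
import math


def binom_bf10(k: int, n: int, p0: float = 0.5) -> float:
    """BF10 for H1: p ~ Beta(1, 1) against the point null H0: p = p0."""
    # Marginal likelihood under H1: integrating the binomial likelihood over a
    # uniform prior gives 1 / (n + 1) for every possible count k.
    marginal_h1 = 1 / (n + 1)
    # Likelihood under H0 is the ordinary binomial probability at p0
    marginal_h0 = math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
    return marginal_h1 / marginal_h0


print(binom_bf10(50, 100))  # ≈ 0.124, below 1/3: moderate evidence for H₀
print(binom_bf10(80, 100))  # far above 100: extreme evidence for H₁
```

Observing exactly 50 successes in 100 trials gives a two-sided p-value of 1, which by itself says nothing about whether the null is supported, whereas the Bayes factor quantifies that support.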
Install the dependencies:

```bash
uv add pingouin pymc bambi arviz
```