pyfixest: fast high-dimensional fixed effects estimation for Python. Covers OLS, Poisson, and IV regression with multi-way fixed effects; difference-in-differences estimators (TWFE, did2s, lpdid, Sun-Abraham); clustered standard errors; wild bootstrap; and publication output (etable regression tables, coefplot, iplot event study plots). Use when running fixed effects regressions, difference-in-differences designs, Poisson count models with FE, or producing publication-ready regression tables. For panel random/between effects, use linearmodels; for GLM/time series without FE, use statsmodels.
Comprehensive skill for fixed effects regression, instrumental variables, and difference-in-differences estimation with pyfixest. Use the decision trees below to find the right guidance, then load the detailed references.
pyfixest is a Python implementation of the R fixest package (Berge, Butts, & McDermott, 2026):
- Fixed effects after `|`, IV after second `|`, multiple estimation via `sw()`/`csw()`
- `etable()` for regression tables, `coefplot()` and `iplot()` for coefficient visualization

This skill targets pyfixest 0.40.0, the major release aligning with R fixest 0.13. Breaking changes from earlier versions:

- Default `vcov` is now `"iid"`: old code silently produces different SEs
- `ssc()` arguments renamed: `adj` → `k_adj`, `fixef_k` → `k_fixef`, `cluster_adj` → `G_adj`, `cluster_df` → `G_df`
- `fixef_rm` default changed from `"none"` to `"singleton"`: singletons are now dropped by default

Each topic in ./references/ contains focused documentation:
| File | Purpose | When to Read |
|------|---------|--------------|
| quickstart.md | Installation, first regression, formula syntax | Starting with pyfixest |
| fixed-effects.md | Multi-way FE, SE types, clustering, wild bootstrap | FE models and inference |
| instrumental-variables.md | IV syntax, first stage, weak instruments | IV/2SLS estimation |
| difference-in-differences.md | TWFE, did2s, lpdid, Sun-Abraham, event studies | DiD designs |
| tables-and-plots.md | etable, coefplot, iplot, dtable | Reporting results |
| advanced-inference.md | Wild bootstrap, randomization inference, MHT corrections, Gelbach | Advanced statistical inference |
| integration.md | Multiple estimation, Poisson, GLM, marginaleffects, online learning | Advanced features |
| gotchas.md | Common errors, v0.40 breaking changes, fixest vs pyfixest | Debugging issues |
- quickstart.md, then fixed-effects.md
- quickstart.md, then difference-in-differences.md
- quickstart.md, then instrumental-variables.md
- tables-and-plots.md
- quickstart.md, then gotchas.md

| Skill | Relationship |
|-------|-------------|
| data-scientist | Methodology guidance — load for "why and when" behind methods |
| statsmodels | Complement for non-FE models: GLM, time series, diagnostics |
| linearmodels | Random effects, GMM, system estimation when pyfixest's FE-only approach is insufficient |
| svy | Survey-weighted regression with complex survey designs. pyfixest's clustered SEs account for within-group correlation but do NOT handle full survey design features (stratification, unequal probability weights, FPC). If your data comes from a complex probability survey, use svy for design-based inference |
| polars | Data preparation before estimation (convert to pandas before passing to pyfixest) |
| plotnine | Custom visualization beyond pyfixest's built-in plots |
```
What kind of regression?
├─ OLS with fixed effects     → ./references/quickstart.md
├─ OLS without fixed effects  → ./references/quickstart.md
├─ IV / 2SLS                  → ./references/instrumental-variables.md
├─ Poisson (count data)       → ./references/integration.md
├─ Logit / Probit             → ./references/integration.md
├─ Quantile regression        → ./references/integration.md
└─ Multiple models at once    → ./references/integration.md
```
```
DiD design?
├─ Simple 2x2 DiD (one treatment date)  → ./references/difference-in-differences.md
├─ Staggered treatment timing           → ./references/difference-in-differences.md
│  ├─ did2s (Gardner imputation)        → ./references/difference-in-differences.md
│  ├─ Local projections DiD             → ./references/difference-in-differences.md
│  └─ Sun-Abraham saturated             → ./references/difference-in-differences.md
├─ Event study plot                     → ./references/difference-in-differences.md
├─ Visualize treatment patterns         → ./references/difference-in-differences.md
└─ Parallel trends assessment           → ./references/difference-in-differences.md
```
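For the simple 2x2 branch, the estimand can be worked out by hand before reaching for a regression. The sketch below uses made-up toy outcomes (the numbers and group labels are purely illustrative, not from any dataset) to show the arithmetic: the change in the treated group minus the change in the control group.

```python
from statistics import mean

# Hypothetical toy outcomes, keyed by (group, period). Illustrative only.
outcomes = {
    ("control", "pre"): [10.0, 12.0],
    ("control", "post"): [11.0, 13.0],
    ("treated", "pre"): [14.0, 16.0],
    ("treated", "post"): [18.0, 20.0],
}

# Cell means for each of the four group-period cells.
cell_means = {cell: mean(values) for cell, values in outcomes.items()}

# First differences: post minus pre within each group.
treated_change = cell_means[("treated", "post")] - cell_means[("treated", "pre")]
control_change = cell_means[("control", "post")] - cell_means[("control", "pre")]

# Difference-in-differences: treated change net of the control change.
did_estimate = treated_change - control_change
print(did_estimate)  # 3.0
```

In a saturated 2x2 OLS regression with group and period indicators, the coefficient on the treated-by-post interaction equals this same number; the regression adds standard errors on top of the point estimate.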
```
What inference?
├─ Heteroskedasticity-robust (HC1)  → ./references/fixed-effects.md
├─ Clustered (one-way / two-way)    → ./references/fixed-effects.md
├─ Few clusters (<20)               → ./references/advanced-inference.md
│  └─ Wild cluster bootstrap        → ./references/advanced-inference.md
├─ HAC / Newey-West                 → ./references/fixed-effects.md
├─ Randomization inference          → ./references/advanced-inference.md
├─ Multiple hypothesis testing      → ./references/advanced-inference.md
└─ Causal cluster variance (CCV)    → ./references/advanced-inference.md
```
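The first two branches of the tree above map onto the `vcov` argument. As a minimal sketch, the hypothetical helper `vcov_spec` below (not part of pyfixest) builds that argument: a plain string for non-clustered SEs, or a one-entry dict for clustered SEs. Joining two cluster variables with `" + "` for two-way clustering is an assumption about the accepted syntax; check fixed-effects.md before relying on it.

```python
# SE types named in this skill; the sets are illustrative, not exhaustive.
NON_CLUSTERED = {"iid", "hetero", "HC1"}
CLUSTERED = {"CRV1", "CRV3"}


def vcov_spec(kind="iid", cluster=None):
    """Build a value for the vcov argument (hypothetical helper).

    cluster may be None, a single column name, or a list of column names.
    """
    if cluster is None:
        if kind not in NON_CLUSTERED:
            raise ValueError(f"{kind!r} needs a cluster variable")
        return kind
    if kind not in CLUSTERED:
        raise ValueError(f"{kind!r} is not a clustered SE type")
    columns = [cluster] if isinstance(cluster, str) else list(cluster)
    # Assumption: two-way clustering is written as "var1 + var2".
    return {kind: " + ".join(columns)}


print(vcov_spec("hetero"))                   # hetero
print(vcov_spec("CRV1", "state"))            # {'CRV1': 'state'}
print(vcov_spec("CRV1", ["state", "year"]))  # {'CRV1': 'state + year'}
```

The result can be passed either at estimation time or to `fit.vcov(...)` afterwards. Spelling the choice out explicitly also protects against changes in the default SE type across versions.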
```
Presenting results?
├─ Regression table (multiple models)  → ./references/tables-and-plots.md
├─ Coefficient plot                    → ./references/tables-and-plots.md
├─ Event study plot                    → ./references/tables-and-plots.md
├─ Descriptive statistics table        → ./references/tables-and-plots.md
└─ LaTeX output                        → ./references/tables-and-plots.md
```
```
Having issues?
├─ Different results from old code    → ./references/gotchas.md
├─ feglm with fixed effects error     → ./references/gotchas.md
├─ numba installation problems        → ./references/gotchas.md
├─ CRV3 memory issues                 → ./references/gotchas.md
├─ Poisson convergence                → ./references/gotchas.md
├─ Formula parsing errors             → ./references/gotchas.md
├─ R fixest vs pyfixest differences   → ./references/gotchas.md
└─ Singleton warnings                 → ./references/gotchas.md
```
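For the "Different results from old code" branch, the `ssc()` renames listed among the breaking changes earlier in this skill can be applied mechanically. The sketch below uses a hypothetical helper, `migrate_ssc_kwargs` (not part of pyfixest), that rewrites old keyword names to the new ones; the argument values in the example are placeholders.

```python
# Old -> new ssc() keyword names, as listed in the v0.40 breaking changes.
SSC_RENAMES = {
    "adj": "k_adj",
    "fixef_k": "k_fixef",
    "cluster_adj": "G_adj",
    "cluster_df": "G_df",
}


def migrate_ssc_kwargs(old_kwargs):
    """Return a copy of old_kwargs with pre-0.40 names replaced (hypothetical helper).

    Keys that are not in the rename table are passed through unchanged.
    """
    return {SSC_RENAMES.get(name, name): value for name, value in old_kwargs.items()}


# Placeholder values; only the key names matter here.
new_kwargs = migrate_ssc_kwargs({"adj": True, "cluster_df": "min"})
print(new_kwargs)  # {'k_adj': True, 'G_df': 'min'}
```

The migrated dict could then be unpacked into `pf.ssc(**new_kwargs)`. Renaming keywords does not address the changed defaults, so old scripts should also set `vcov` and `fixef_rm` explicitly; see gotchas.md.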
Important: In data research pipelines (see CLAUDE.md), pyfixest regressions are executed through script files, not interactively. This ensures auditability and reproducibility.
The pattern:
`scripts/stage8_analysis/{step}_{task-name}.py`

Read agent_reference/SCRIPT_EXECUTION_REFERENCE.md closely for the mandatory file-first execution protocol covering complete code file writing, output capture, and file versioning rules. All regression scripts must follow the Inline Audit Trail (IAT) standard (see agent_reference/INLINE_AUDIT_TRAIL.md). For regression code, document model specification choices (why this estimator, why this clustering level, what identifying assumptions) with `# INTENT:`, `# REASONING:`, and `# ASSUMES:` comments.
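As a minimal sketch of what such a script might look like, the example below combines the path pattern with the three IAT comment tags. The `script_path` helper, the `"01"` step format, and the task name are hypothetical, and the synthetic data from `pf.get_data()` stands in for a real analysis dataset. The authoritative rules are in the two reference files named above.

```python
"""Sketch of a Stage 8 regression script with Inline Audit Trail comments."""
from pathlib import Path


def script_path(step, task_name):
    """Build scripts/stage8_analysis/{step}_{task-name}.py (step format is assumed)."""
    return Path("scripts") / "stage8_analysis" / f"{step}_{task_name}.py"


def main():
    try:
        import pyfixest as pf  # imported lazily so the sketch loads without it
    except ImportError:
        print("pyfixest is not installed; estimation skipped")
        return None

    # Stand-in data: pf.get_data() returns a synthetic DataFrame (Y, X1, f1, ...).
    df = pf.get_data()

    # INTENT: Estimate the association between X1 and Y net of f1 group effects.
    # REASONING: An f1 fixed effect absorbs level differences across f1 groups.
    # ASSUMES: Errors are correlated within f1 groups but independent across them.
    fit = pf.feols("Y ~ X1 | f1", data=df, vcov={"CRV1": "f1"})
    print(fit.tidy())
    return fit


if __name__ == "__main__":
    print(script_path("01", "fe-baseline").as_posix())
    main()
```

Stating `vcov` explicitly in the call keeps the clustering choice visible next to the `# ASSUMES:` comment that justifies it.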
See:
- agent_reference/WORKFLOW_PHASE4_ANALYSIS.md: Stage 8 (Analysis & Visualization)
- agent_reference/INLINE_AUDIT_TRAIL.md: IAT documentation standard

The examples below show pyfixest syntax. In research workflows, wrap them in scripts following the file-first pattern.
```python
import pyfixest as pf
```
| Function | Purpose |
|----------|---------|
| `pf.feols("Y ~ X \| fe", data=df)` | OLS with fixed effects |
| `pf.fepois("Y ~ X \| fe", data=df)` | Poisson with fixed effects |
| `pf.feols("Y ~ X2 \| fe \| X1 ~ Z1", data=df)` | IV / 2SLS |
| `pf.did2s(data, yname, first_stage, second_stage, treatment, cluster)` | Gardner (2022) DiD |
| `pf.event_study(data, yname, idname, tname, gname, estimator)` | Unified event study |
| `pf.lpdid(data, yname, idname, tname, gname)` | Local projections DiD |
| Pattern | Meaning | Example |
|---------|---------|---------|
| `Y ~ X1 + X2` | No FE | `"wage ~ educ + exper"` |
| `Y ~ X \| fe1 + fe2` | With FE | `"wage ~ educ \| state + year"` |
| `Y ~ X \| fe \| endog ~ inst` | FE + IV | `"wage ~ exper \| state \| educ ~ college_prox"` |
| `i(factor, ref=val)` | Categorical with ref | `"Y ~ i(year, ref=2000) \| state"` |
| `sw(X1, X2)` | Stepwise alternatives | `"Y ~ sw(educ, exper) \| state"` |
| `csw0(X1, X2)` | Cumulative stepwise | `"Y ~ csw0(educ, exper) \| state"` |
| `Y1 + Y2 ~ X` | Multiple outcomes | `"wage + hours ~ educ \| state"` |
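The three-part layout in the table above (main equation, then fixed effects, then the IV equation, separated by `|`) can be assembled programmatically when specifications are generated in a loop. The helper `build_formula` below is a hypothetical illustration, not a pyfixest function; it only concatenates strings and does no validation.

```python
def _join(terms):
    """Accept a single name or a list of names and join them with ' + '."""
    return terms if isinstance(terms, str) else " + ".join(terms)


def build_formula(outcome, covariates, fixed_effects=None, iv=None):
    """Assemble 'Y ~ X | fe | endog ~ inst' from its parts (hypothetical helper).

    iv, if given, is a pair (endogenous, instruments).
    """
    parts = [f"{_join(outcome)} ~ {_join(covariates)}"]
    if fixed_effects:
        parts.append(_join(fixed_effects))
    if iv is not None:
        endogenous, instruments = iv
        parts.append(f"{_join(endogenous)} ~ {_join(instruments)}")
    return " | ".join(parts)


print(build_formula("wage", ["educ", "exper"]))
# wage ~ educ + exper
print(build_formula("wage", "educ", ["state", "year"]))
# wage ~ educ | state + year
print(build_formula("wage", "exper", "state", iv=("educ", "college_prox")))
# wage ~ exper | state | educ ~ college_prox
```

The outputs reproduce the example column of the table, so each string can be passed directly as the first argument to `pf.feols`.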
```python
fit = pf.feols("Y ~ X1 + X2 | fe", data=df)

fit.summary()                # Print results
fit.tidy()                   # DataFrame of coefficients
fit.vcov("hetero")           # Re-estimate with robust SEs (requires arg)
fit.vcov({"CRV1": "state"})  # Re-estimate with clustered SEs
fit.coef()                   # Coefficient values
fit.se()                     # Standard errors
fit.confint()                # Confidence intervals
fit.predict()                # Fitted values
fit.resid()                  # Residuals
fit.fixef()                  # Dict of FE name → numpy array (not a DataFrame)

pf.etable([fit1, fit2, fit3])          # Regression table
pf.coefplot([fit1, fit2])              # Coefficient plot
pf.iplot(fit)                          # Event study / interaction plot
pf.panelview(data, unit, time, treat)  # Treatment pattern visualization
```
| Topic | Reference File |
|-------|---------------|
| Installation | ./references/quickstart.md |
| First regression | ./references/quickstart.md |
| Formula syntax | ./references/quickstart.md |
| SE comparison table | ./references/quickstart.md |
| Multi-way fixed effects | ./references/fixed-effects.md |
| Standard error types | ./references/fixed-effects.md |
| Clustered SEs | ./references/fixed-effects.md |
| HAC / Newey-West | ./references/fixed-effects.md |
| Backend options | ./references/fixed-effects.md |
| IV formula syntax | ./references/instrumental-variables.md |
| First-stage diagnostics | ./references/instrumental-variables.md |
| Weak instrument tests | ./references/instrumental-variables.md |
| TWFE | ./references/difference-in-differences.md |
| did2s | ./references/difference-in-differences.md |
| Local projections DiD | ./references/difference-in-differences.md |
| Sun-Abraham | ./references/difference-in-differences.md |
| Event study plots | ./references/difference-in-differences.md |
| Parallel trends | ./references/difference-in-differences.md |
| panelview | ./references/difference-in-differences.md |
| etable | ./references/tables-and-plots.md |
| coefplot | ./references/tables-and-plots.md |
| iplot | ./references/tables-and-plots.md |
| dtable | ./references/tables-and-plots.md |
| Wild cluster bootstrap | ./references/advanced-inference.md |
| Randomization inference | ./references/advanced-inference.md |
| Multiple testing corrections | ./references/advanced-inference.md |
| Gelbach decomposition | ./references/advanced-inference.md |
| CCV | ./references/advanced-inference.md |
| Multiple estimation | ./references/integration.md |
| Poisson regression | ./references/integration.md |
| GLM (logit/probit) | ./references/integration.md |
| Quantile regression | ./references/integration.md |
| marginaleffects | ./references/integration.md |
| Online learning | ./references/integration.md |
| Performance tuning | ./references/integration.md |
| Polars DataFrame input | ./references/gotchas.md |
| Polars-to-pandas conversion | ./references/quickstart.md |
| DiD clustering level | ./references/difference-in-differences.md |
| v0.40 breaking changes | ./references/gotchas.md |
| feglm FE limitation | ./references/gotchas.md |
| numba issues | ./references/gotchas.md |
| Formula parsing | ./references/gotchas.md |
| R fixest differences | ./references/gotchas.md |
When this library is used as a primary analytical tool, include in the report's Software & Tools references:
Berge, L., Butts, K., & McDermott, G. (2026). pyfixest: Fast high-dimensional fixed effects estimation [Computer software]. Based on fixest (R).
Cite when: pyfixest is used for regression estimation (OLS, Poisson, IV) or difference-in-differences analysis.

Do not cite when: pyfixest is only imported and no estimation is performed.

For method-specific citations (e.g., individual DiD estimators or inference techniques), consult the reference files in this skill and agent_reference/CITATION_REFERENCE.md.