pyfixest Skill

pyfixest: fast high-dimensional fixed effects estimation for Python. Covers OLS, Poisson, and IV regression with multi-way fixed effects; difference-in-differences estimators (TWFE, did2s, lpdid, Sun-Abraham); clustered standard errors; wild bootstrap; and publication output (etable regression tables, coefplot, iplot event study plots). Use when running fixed effects regressions, difference-in-differences designs, Poisson count models with FE, or producing publication-ready regression tables. For panel random/between effects, use linearmodels; for GLM/time series without FE, use statsmodels.

Comprehensive skill for fixed effects regression, instrumental variables, and difference-in-differences estimation with pyfixest. Use decision trees below to find the right guidance, then load detailed references.

What is pyfixest?

pyfixest is a Python implementation of the R fixest package (Berge, Butts, & McDermott, 2026):

Fast: Multi-way FE demeaning via alternating projections with numba/JAX/GPU backends
Concise formula syntax: Fixed effects after |, IV after second |, multiple estimation via sw()/csw()
Modern DiD: Built-in did2s, local projections DiD (lpdid), and Sun-Abraham saturated estimator
Flexible inference: Switch SE types post-estimation; wild bootstrap, randomization inference, CCV
Publication output: etable() for regression tables, coefplot() and iplot() for coefficient visualization
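
To make the "alternating projections" idea concrete, here is a minimal NumPy sketch of two-way demeaning. It is not pyfixest's implementation; the function name `demean_two_way`, the simulated variables (`firm`, `year`, `x`, `y`), and the tolerance are assumptions made for illustration. The sketch repeatedly subtracts group means for one fixed effect and then the other until the values stop changing, and then checks that regressing demeaned `y` on demeaned `x` reproduces the coefficient from an explicit dummy-variable regression.

```python
import numpy as np

def demean_two_way(values, group_a, group_b, tol=1e-10, max_iter=1000):
    """Remove two sets of group means by alternating projections (illustrative)."""
    out = np.asarray(values, dtype=float).copy()
    # Re-code group labels as 0..k-1 so they can index arrays of group means.
    _, a = np.unique(group_a, return_inverse=True)
    _, b = np.unique(group_b, return_inverse=True)
    count_a, count_b = np.bincount(a), np.bincount(b)
    for _ in range(max_iter):
        previous = out.copy()
        out -= (np.bincount(a, weights=out) / count_a)[a]  # sweep out group_a means
        out -= (np.bincount(b, weights=out) / count_b)[b]  # sweep out group_b means
        if np.max(np.abs(out - previous)) < tol:
            break
    return out

# Simulated data (hypothetical): 30 firms, 20 years, true slope 1.5.
rng = np.random.default_rng(0)
n = 2000
firm = rng.integers(0, 30, n)
year = rng.integers(0, 20, n)
x = rng.normal(size=n) + 0.1 * firm
y = 1.5 * x + 0.3 * firm - 0.2 * year + rng.normal(size=n)

# Frisch-Waugh-Lovell: regress demeaned y on demeaned x.
x_dm = demean_two_way(x, firm, year)
y_dm = demean_two_way(y, firm, year)
beta_demeaned = (x_dm @ y_dm) / (x_dm @ x_dm)

# Reference estimate: OLS with explicit firm and year dummies.
dummies = np.column_stack([
    x,
    (firm[:, None] == np.arange(30)).astype(float),
    (year[:, None] == np.arange(1, 20)).astype(float),
])
beta_dummy = np.linalg.lstsq(dummies, y, rcond=None)[0][0]

print(beta_demeaned, beta_dummy)
```

The dummy-variable route builds a column per group, which is what becomes infeasible with high-dimensional fixed effects; the demeaning route only ever stores vectors of group means.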

Version Notes

This skill targets pyfixest 0.40.0, the major release aligning with R fixest 0.13. Breaking changes from earlier versions:

Default standard errors changed from "cluster by first FE" to "iid" — old code silently produces different SEs
ssc() arguments renamed: adj → k_adj, fixef_k → k_fixef, cluster_adj → G_adj, cluster_df → G_df
fixef_rm default changed from "none" to "singleton" — singletons now dropped by default
Multicollinearity tolerance raised from 1e-10 to 1e-09

How to Use This Skill

Reference File Structure

Each topic in ./references/ contains focused documentation:

| File | Purpose | When to Read |
|------|---------|--------------|
| quickstart.md | Installation, first regression, formula syntax | Starting with pyfixest |
| fixed-effects.md | Multi-way FE, SE types, clustering, wild bootstrap | FE models and inference |
| instrumental-variables.md | IV syntax, first stage, weak instruments | IV/2SLS estimation |
| difference-in-differences.md | TWFE, did2s, lpdid, Sun-Abraham, event studies | DiD designs |
| tables-and-plots.md | etable, coefplot, iplot, dtable | Reporting results |
| advanced-inference.md | Wild bootstrap, randomization inference, MHT corrections, Gelbach | Advanced statistical inference |
| integration.md | Multiple estimation, Poisson, GLM, marginaleffects, online learning | Advanced features |
| gotchas.md | Common errors, v0.40 breaking changes, fixest vs pyfixest | Debugging issues |

Reading Order

New to pyfixest? Start with quickstart.md then fixed-effects.md
Running DiD? Read quickstart.md, then difference-in-differences.md
Need IV? Read quickstart.md, then instrumental-variables.md
Making tables? Check tables-and-plots.md
Coming from R fixest? Read quickstart.md then gotchas.md

Related Skills

| Skill | Relationship |
|-------|-------------|
| data-scientist | Methodology guidance; load for the "why and when" behind methods |
| statsmodels | Complement for non-FE models: GLM, time series, diagnostics |
| linearmodels | Random effects, GMM, system estimation when pyfixest's FE-only approach is insufficient |
| svy | Survey-weighted regression with complex survey designs. pyfixest's clustered SEs account for within-group correlation but do NOT handle full survey design features (stratification, unequal probability weights, FPC). If your data comes from a complex probability survey, use svy for design-based inference |
| polars | Data preparation before estimation (convert to pandas before passing to pyfixest) |
| plotnine | Custom visualization beyond pyfixest's built-in plots |

Quick Decision Trees

"I need to run a regression"

What kind of regression?
├─ OLS with fixed effects → ./references/quickstart.md
├─ OLS without fixed effects → ./references/quickstart.md
├─ IV / 2SLS → ./references/instrumental-variables.md
├─ Poisson (count data) → ./references/integration.md
├─ Logit / Probit → ./references/integration.md
├─ Quantile regression → ./references/integration.md
└─ Multiple models at once → ./references/integration.md

"I need difference-in-differences"

DiD design?
├─ Simple 2x2 DiD (one treatment date) → ./references/difference-in-differences.md
├─ Staggered treatment timing → ./references/difference-in-differences.md
│   ├─ did2s (Gardner imputation) → ./references/difference-in-differences.md
│   ├─ Local projections DiD → ./references/difference-in-differences.md
│   └─ Sun-Abraham saturated → ./references/difference-in-differences.md
├─ Event study plot → ./references/difference-in-differences.md
├─ Visualize treatment patterns → ./references/difference-in-differences.md
└─ Parallel trends assessment → ./references/difference-in-differences.md

"I need to choose standard errors"

What inference?
├─ Heteroskedasticity-robust (HC1) → ./references/fixed-effects.md
├─ Clustered (one-way / two-way) → ./references/fixed-effects.md
├─ Few clusters (<20) → ./references/advanced-inference.md
│   └─ Wild cluster bootstrap → ./references/advanced-inference.md
├─ HAC / Newey-West → ./references/fixed-effects.md
├─ Randomization inference → ./references/advanced-inference.md
├─ Multiple hypothesis testing → ./references/advanced-inference.md
└─ Causal cluster variance (CCV) → ./references/advanced-inference.md

"I need to present results"

Presenting results?
├─ Regression table (multiple models) → ./references/tables-and-plots.md
├─ Coefficient plot → ./references/tables-and-plots.md
├─ Event study plot → ./references/tables-and-plots.md
├─ Descriptive statistics table → ./references/tables-and-plots.md
└─ LaTeX output → ./references/tables-and-plots.md

"Something isn't working"

Having issues?
├─ Different results from old code → ./references/gotchas.md
├─ feglm with fixed effects error → ./references/gotchas.md
├─ numba installation problems → ./references/gotchas.md
├─ CRV3 memory issues → ./references/gotchas.md
├─ Poisson convergence → ./references/gotchas.md
├─ Formula parsing errors → ./references/gotchas.md
├─ R fixest vs pyfixest differences → ./references/gotchas.md
└─ Singleton warnings → ./references/gotchas.md

File-First Execution in Research Workflows

Important: In data research pipelines (see CLAUDE.md), pyfixest regressions are executed through script files, not interactively. This ensures auditability and reproducibility.

The pattern:

Write regression code to scripts/stage8_analysis/{step}_{task-name}.py
Execute it via Bash through the wrapper script that captures output automatically
Validation results are embedded automatically in the script as comments
If the run fails, create a versioned copy of the script for the fix

Read agent_reference/SCRIPT_EXECUTION_REFERENCE.md closely: it defines the mandatory file-first execution protocol, covering complete code file writing, output capture, and file versioning rules. All regression scripts must follow the Inline Audit Trail (IAT) standard (see agent_reference/INLINE_AUDIT_TRAIL.md). For regression code, document model specification choices (why this estimator, why this clustering level, what identifying assumptions) with # INTENT:, # REASONING:, and # ASSUMES: comments.

See:

agent_reference/WORKFLOW_PHASE4_ANALYSIS.md — Stage 8 (Analysis & Visualization)
agent_reference/INLINE_AUDIT_TRAIL.md — IAT documentation standard

The examples below show pyfixest syntax. In research workflows, wrap them in scripts following the file-first pattern.

Quick Reference

Essential Import

import pyfixest as pf

Core Estimation Functions

| Function | Purpose |
|----------|---------|
| pf.feols("Y ~ X \| fe", data=df) | OLS with fixed effects |
| pf.fepois("Y ~ X \| fe", data=df) | Poisson with fixed effects |
| pf.feols("Y ~ X2 \| fe \| X1 ~ Z1", data=df) | IV / 2SLS |
| pf.did2s(data, yname, first_stage, second_stage, treatment, cluster) | Gardner (2022) DiD |
| pf.event_study(data, yname, idname, tname, gname, estimator) | Unified event study |
| pf.lpdid(data, yname, idname, tname, gname) | Local projections DiD |

Formula Syntax Quick Reference

| Pattern | Meaning | Example |
|---------|---------|---------|
| Y ~ X1 + X2 | No FE | "wage ~ educ + exper" |
| Y ~ X \| fe1 + fe2 | With FE | "wage ~ educ \| state + year" |
| Y ~ X \| fe \| endog ~ inst | FE + IV | "wage ~ exper \| state \| educ ~ college_prox" |
| i(factor, ref=val) | Categorical with ref | "Y ~ i(year, ref=2000) \| state" |
| sw(X1, X2) | Stepwise alternatives | "Y ~ sw(educ, exper) \| state" |
| csw0(X1, X2) | Cumulative stepwise | "Y ~ csw0(educ, exper) \| state" |
| Y1 + Y2 ~ X | Multiple outcomes | "wage + hours ~ educ \| state" |

Post-Estimation Essentials

fit = pf.feols("Y ~ X1 + X2 | fe", data=df)

fit.summary()                          # Print results
fit.tidy()                             # DataFrame of coefficients
fit.vcov("hetero")                     # Re-estimate with robust SEs (requires arg)
fit.vcov({"CRV1": "state"})            # Re-estimate with clustered SEs
fit.coef()                             # Coefficient values
fit.se()                               # Standard errors
fit.confint()                          # Confidence intervals
fit.predict()                          # Fitted values
fit.resid()                            # Residuals
fit.fixef()                            # Dict of FE name → numpy array (not a DataFrame)

Reporting

pf.etable([fit1, fit2, fit3])          # Regression table
pf.coefplot([fit1, fit2])              # Coefficient plot
pf.iplot(fit)                          # Event study / interaction plot
pf.panelview(data, unit, time, treat)  # Treatment pattern visualization

Topic Index

| Topic | Reference File |
|-------|---------------|
| Installation | ./references/quickstart.md |
| First regression | ./references/quickstart.md |
| Formula syntax | ./references/quickstart.md |
| SE comparison table | ./references/quickstart.md |
| Multi-way fixed effects | ./references/fixed-effects.md |
| Standard error types | ./references/fixed-effects.md |
| Clustered SEs | ./references/fixed-effects.md |
| HAC / Newey-West | ./references/fixed-effects.md |
| Backend options | ./references/fixed-effects.md |
| IV formula syntax | ./references/instrumental-variables.md |
| First-stage diagnostics | ./references/instrumental-variables.md |
| Weak instrument tests | ./references/instrumental-variables.md |
| TWFE | ./references/difference-in-differences.md |
| did2s | ./references/difference-in-differences.md |
| Local projections DiD | ./references/difference-in-differences.md |
| Sun-Abraham | ./references/difference-in-differences.md |
| Event study plots | ./references/difference-in-differences.md |
| Parallel trends | ./references/difference-in-differences.md |
| panelview | ./references/difference-in-differences.md |
| etable | ./references/tables-and-plots.md |
| coefplot | ./references/tables-and-plots.md |
| iplot | ./references/tables-and-plots.md |
| dtable | ./references/tables-and-plots.md |
| Wild cluster bootstrap | ./references/advanced-inference.md |
| Randomization inference | ./references/advanced-inference.md |
| Multiple testing corrections | ./references/advanced-inference.md |
| Gelbach decomposition | ./references/advanced-inference.md |
| CCV | ./references/advanced-inference.md |
| Multiple estimation | ./references/integration.md |
| Poisson regression | ./references/integration.md |
| GLM (logit/probit) | ./references/integration.md |
| Quantile regression | ./references/integration.md |
| marginaleffects | ./references/integration.md |
| Online learning | ./references/integration.md |
| Performance tuning | ./references/integration.md |
| Polars DataFrame input | ./references/gotchas.md |
| Polars-to-pandas conversion | ./references/quickstart.md |
| DiD clustering level | ./references/difference-in-differences.md |
| v0.40 breaking changes | ./references/gotchas.md |
| feglm FE limitation | ./references/gotchas.md |
| numba issues | ./references/gotchas.md |
| Formula parsing | ./references/gotchas.md |
| R fixest differences | ./references/gotchas.md |

Citation

When this library is used as a primary analytical tool, include in the report's Software & Tools references:

Berge, L., Butts, K., & McDermott, G. (2026). pyfixest: Fast high-dimensional fixed effects estimation [Computer software]. Based on fixest (R).

Cite when: pyfixest is used for regression estimation (OLS, Poisson, IV) or difference-in-differences analysis. Do not cite when: Only imported but no estimation performed.

For method-specific citations (e.g., individual DiD estimators or inference techniques), consult the reference files in this skill and agent_reference/CITATION_REFERENCE.md.
