skills/17-DAAF-Contribution-Community-daaf/dot-claude/skills/linearmodels/SKILL.md
linearmodels: panel data, IV/GMM, system regression, and asset pricing models in Python. Covers PanelOLS (FE/RE), BetweenOLS, FirstDifferenceOLS, Fama-MacBeth, IV2SLS/LIML/GMM, SUR, IV3SLS, and Driscoll-Kraay SEs. Use for random effects estimation, between or first-difference panel models, system estimation (SUR, 3SLS), LIML/GMM instrumental variables, Fama-MacBeth regressions, or Driscoll-Kraay standard errors. Complements pyfixest (high-dimensional FE + DiD) and statsmodels (GLM + time series).
Comprehensive skill for panel data estimation, instrumental variables, system regression, and asset pricing with linearmodels (Kevin Sheppard). Use the decision trees below to find the right guidance, then load the detailed references.
linearmodels extends statsmodels with specialized model classes for structured data: panel models, instrumental-variable and GMM estimators, system regression, and asset pricing models.
| File | Purpose | When to Read |
|------|---------|--------------|
| quickstart.md | Installation, MultiIndex setup, formula vs array API, first model | Starting with linearmodels |
| panel-models.md | PanelOLS, RandomEffects, BetweenOLS, FD, Pooled, FamaMacBeth | Panel data estimation |
| iv-models.md | IV2SLS, IVLIML, IVGMM, IVGMMCUE, AbsorbingLS | IV / GMM estimation |
| system-models.md | SUR, IV3SLS, IVSystemGMM, cross-equation constraints | System estimation |
| asset-pricing.md | LinearFactorModel, TradedFactorModel, GMM estimation | Asset pricing tests |
| covariance-inference.md | All SE types, Driscoll-Kraay, clustering, GMM weights | Choosing standard errors |
| gotchas.md | MultiIndex requirement, pyfixest/statsmodels boundary, limits | Debugging issues |

- quickstart.md then panel-models.md
- quickstart.md then iv-models.md
- quickstart.md then system-models.md
- quickstart.md then asset-pricing.md
- covariance-inference.md
- quickstart.md then gotchas.md

| Skill | Relationship |
|-------|-------------|
| pyfixest | Preferred for high-dimensional FE, FE + IV, DiD, fast demeaning, publication tables. Use linearmodels when pyfixest cannot do what you need (RE, system models, LIML/GMM, Fama-MacBeth) |
| statsmodels | Foundation library. Use statsmodels for GLM, time series, diagnostics. linearmodels extends statsmodels for panel/IV/system models |
| svy | Survey-weighted regression with complex survey designs. linearmodels supports weights for population/precision weighting in panel models, but this is NOT equivalent to design-based survey inference — it does not handle stratification, clustering as a design feature, or replicate weights. If your data comes from a complex probability survey, use svy |
| data-scientist | Methodology guidance — load for "why and when" behind model choices |
| polars | Data preparation before estimation; convert to pandas with .to_pandas() before passing to linearmodels |
What panel estimation method?
├─ Fixed effects (within estimator)
│ ├─ 1-2 way FE, no IV → linearmodels PanelOLS or pyfixest feols
│ ├─ 3+ way FE → pyfixest (linearmodels max 2-way in PanelOLS)
│ ├─ FE + IV combined → pyfixest (linearmodels has no Panel IV)
│ └─ FE + DiD → pyfixest (linearmodels has no DiD)
├─ Random effects (GLS) → linearmodels RandomEffects
│ └─ → ./references/panel-models.md
├─ FE vs RE comparison → linearmodels (run both, compare)
│ └─ → ./references/panel-models.md
├─ Between estimator → linearmodels BetweenOLS
│ └─ → ./references/panel-models.md
├─ First difference → linearmodels FirstDifferenceOLS
│ └─ → ./references/panel-models.md
├─ Pooled OLS (panel-aware SEs) → linearmodels PooledOLS
│ └─ → ./references/panel-models.md
└─ Fama-MacBeth → linearmodels FamaMacBeth
  └─ → ./references/panel-models.md
What IV method?
├─ 2SLS (standard IV)
│ ├─ With fixed effects → pyfixest (linearmodels has no Panel IV)
│ └─ Without FE → linearmodels IV2SLS or pyfixest
│ └─ → ./references/iv-models.md
├─ LIML / k-class (better finite-sample) → linearmodels IVLIML
│ └─ → ./references/iv-models.md
├─ GMM-IV (efficient, overidentified) → linearmodels IVGMM
│ └─ → ./references/iv-models.md
├─ Continuously updating GMM → linearmodels IVGMMCUE
│ └─ → ./references/iv-models.md
└─ High-dimensional absorbed FE (OLS) → linearmodels AbsorbingLS
  └─ → ./references/iv-models.md
System of equations?
├─ Multiple equations, correlated errors → SUR
│ └─ → ./references/system-models.md
├─ Multiple equations + endogenous variables → IV3SLS
│ └─ → ./references/system-models.md
├─ System GMM → IVSystemGMM
│ └─ → ./references/system-models.md
├─ Cross-equation parameter restrictions → LinearConstraint
│ └─ → ./references/system-models.md
└─ Not sure which → Start with SUR
  └─ → ./references/system-models.md
Having issues?
├─ TypeError about DataFrame index → ./references/gotchas.md
├─ Need FE + IV in one model → ./references/gotchas.md
├─ Need 3+ way fixed effects → ./references/gotchas.md
├─ Constant term confusion → ./references/gotchas.md
├─ Formula parsing errors → ./references/gotchas.md
├─ Want to compare with pyfixest → ./references/gotchas.md
└─ SUR performance issues → ./references/gotchas.md
Important: In data research pipelines (see CLAUDE.md), linearmodels estimation is executed through script files, not interactively. This ensures auditability and reproducibility.
The pattern:
scripts/stage8_analysis/{step}_{task-name}.py

Closely read agent_reference/SCRIPT_EXECUTION_REFERENCE.md for the mandatory file-first execution protocol covering complete code file writing, output capture, and file versioning rules.
The examples below show linearmodels syntax. In research workflows, wrap them in scripts following the file-first pattern.
from linearmodels.panel import PanelOLS, RandomEffects, BetweenOLS
from linearmodels.panel import FirstDifferenceOLS, PooledOLS, FamaMacBeth
from linearmodels.iv import IV2SLS, IVLIML, IVGMM, IVGMMCUE, AbsorbingLS
from linearmodels.system import SUR, IV3SLS, IVSystemGMM
from linearmodels.panel import compare # Panel model comparison tables
import pandas as pd
# Panel data MUST have a MultiIndex with (entity, time)
df = df.set_index(["entity_id", "year"])
# Verify the index
print(f"Index names: {df.index.names}")
print(f"Index levels: {df.index.nlevels}")
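Before estimation it can help to fail fast on a malformed index. The helper below is a hypothetical convenience function, not part of linearmodels; it checks for a two-level index and for a numeric or datetime-like time level, which linearmodels is understood to require. The duplicate check is an extra sanity test added for this sketch.

```python
import pandas as pd


def check_panel_index(df: pd.DataFrame) -> None:
    """Raise ValueError if df does not look like an (entity, time) panel."""
    if not isinstance(df.index, pd.MultiIndex) or df.index.nlevels != 2:
        raise ValueError("expected a 2-level MultiIndex: (entity, time)")
    time = df.index.get_level_values(1)
    is_numeric = pd.api.types.is_numeric_dtype(time)
    is_datetime = pd.api.types.is_datetime64_any_dtype(time)
    if not (is_numeric or is_datetime):
        raise ValueError("time level must be numeric or datetime-like")
    if df.index.duplicated().any():
        raise ValueError("duplicate (entity, time) pairs found")


# Hypothetical toy data: two entities observed in two years
raw = pd.DataFrame(
    {
        "entity_id": ["a", "a", "b", "b"],
        "year": [2020, 2021, 2020, 2021],
        "y": [1.0, 2.0, 3.0, 4.0],
    }
)
panel = raw.set_index(["entity_id", "year"])
check_panel_index(panel)   # passes silently; calling it on `raw` would raise
```

If years arrive as strings (for example from a CSV), convert them with `astype(int)` or `pd.to_datetime` before calling `set_index`.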
| Operation | Code |
|-----------|------|
| Panel FE (formula) | PanelOLS.from_formula("y ~ x1 + x2 + EntityEffects", data=df).fit() |
| Panel FE (array) | PanelOLS(df.y, df[["x1","x2"]], entity_effects=True).fit() |
| Two-way FE | PanelOLS.from_formula("y ~ x1 + EntityEffects + TimeEffects", data=df).fit() |
| Random effects | RandomEffects.from_formula("y ~ 1 + x1 + x2", data=df).fit() |
| Between OLS | BetweenOLS.from_formula("y ~ 1 + x1 + x2", data=df).fit() |
| First difference | FirstDifferenceOLS.from_formula("y ~ x1 + x2", data=df).fit() |
| Fama-MacBeth | FamaMacBeth.from_formula("y ~ 1 + x1 + x2", data=df).fit() |
| IV / 2SLS | IV2SLS.from_formula("y ~ 1 + exog + [endog ~ inst]", data=df).fit() |
| LIML | IVLIML.from_formula("y ~ 1 + exog + [endog ~ inst]", data=df).fit() |
| Clustered SEs | mod.fit(cov_type="clustered", cluster_entity=True) |
| Driscoll-Kraay | mod.fit(cov_type="kernel", kernel="bartlett", bandwidth=5) |
| Summary | results.summary |
| Model comparison | compare({"FE": fe_res, "RE": re_res}) |
# Panel FE keywords (appear in formula, not after |)
"y ~ x1 + x2 + EntityEffects" # Entity FE
"y ~ x1 + x2 + EntityEffects + TimeEffects" # Two-way FE
"y ~ x1 + x2 + TimeEffects" # Time FE only
# IV bracket notation
"y ~ 1 + exog + [endog ~ instrument1 + instrument2]"
# Suppress intercept
"y ~ x1 + x2 - 1"
| Topic | Reference File |
|-------|---------------|
| Installation | ./references/quickstart.md |
| MultiIndex data setup | ./references/quickstart.md |
| Formula vs array API | ./references/quickstart.md |
| First model | ./references/quickstart.md |
| Syntax comparison (pyfixest, statsmodels) | ./references/quickstart.md |
| PanelOLS (entity/time effects) | ./references/panel-models.md |
| RandomEffects | ./references/panel-models.md |
| BetweenOLS | ./references/panel-models.md |
| FirstDifferenceOLS | ./references/panel-models.md |
| PooledOLS | ./references/panel-models.md |
| FamaMacBeth | ./references/panel-models.md |
| FE vs RE decision | ./references/panel-models.md |
| Variance decomposition | ./references/panel-models.md |
| Weighted panel estimation | ./references/panel-models.md |
| R-squared types (within, between, overall) | ./references/panel-models.md |
| IV2SLS | ./references/iv-models.md |
| IVLIML and k-class estimators | ./references/iv-models.md |
| IVGMM (1-step, 2-step, iterative) | ./references/iv-models.md |
| IVGMMCUE | ./references/iv-models.md |
| AbsorbingLS (high-dim FE OLS) | ./references/iv-models.md |
| First-stage diagnostics | ./references/iv-models.md |
| Overidentification tests | ./references/iv-models.md |
| SUR (Seemingly Unrelated Regression) | ./references/system-models.md |
| IV3SLS | ./references/system-models.md |
| IVSystemGMM | ./references/system-models.md |
| Cross-equation constraints | ./references/system-models.md |
| LinearFactorModel | ./references/asset-pricing.md |
| TradedFactorModel | ./references/asset-pricing.md |
| Factor model GMM | ./references/asset-pricing.md |
| Driscoll-Kraay SEs | ./references/covariance-inference.md |
| Clustered SEs (entity, time, both) | ./references/covariance-inference.md |
| HAC / kernel covariance | ./references/covariance-inference.md |
| GMM weight matrices | ./references/covariance-inference.md |
| Debiased inference | ./references/covariance-inference.md |
| MultiIndex requirement | ./references/gotchas.md |
| Maximum 2-way FE limit | ./references/gotchas.md |
| No Panel IV | ./references/gotchas.md |
| pyfixest vs linearmodels boundary | ./references/gotchas.md |
| statsmodels vs linearmodels boundary | ./references/gotchas.md |
| Constant term handling | ./references/gotchas.md |
When this library is used as a primary analytical tool, include in the report's Software & Tools references:
Sheppard, K. linearmodels: Econometric models for panel data, IV/GMM, and system regression [Computer software]. https://bashtage.github.io/linearmodels/
Cite when: linearmodels is used for panel estimation (RE, between), IV/GMM, Fama-MacBeth, or system regression (SUR, 3SLS).

Do not cite when: the library is only imported and no estimation is performed.
For method-specific citations (e.g., individual estimators or techniques),
consult the reference files in this skill and agent_reference/CITATION_REFERENCE.md.