skills/00.1-Full-empirical-analysis-skill_Python/SKILL.md
Classical end-to-end empirical analysis workflow in the traditional Python econometric stack — pandas + numpy + scipy + statsmodels + linearmodels + pyfixest + rdrobust + econml + causalml + matplotlib/seaborn. **Defaults to economics empirical-paper style** (AER / QJE / AEJ) — every run produces a publication-ready output set with a multi-column regression table (M1→M6 progressive controls/FE) as the centerpiece, plus Table 1 (descriptives), mechanism / heterogeneity / robustness tables, and event-study + coefficient + trend figures. Covers the full 8-step pipeline an applied economist or quantitative social scientist runs on every paper — (1) data cleaning, (2) variable construction & transformation, (3) descriptive statistics & Table 1, (4) statistical diagnostic tests, (5) baseline empirical modeling, (6) robustness battery, (7) further analysis (mechanism, heterogeneity, mediation, moderation), (8) publication-ready tables & figures. **Also covers two parallel domain modes that share the same 8-step scaffolding** — **Mode A — Epidemiology / public health** (target-trial emulation via `zepid` / hand-rolled `pandas`, IPTW + g-formula + TMLE doubly-robust triplet via `zepid` / `econml` / `lifelines`, Mendelian randomization via `pymr` / `mrtool` (or `rpy2` → `MendelianRandomization`/`TwoSampleMR`), KM / AFT / Cox survival via `lifelines`, E-value sensitivity, principal stratification — STROBE / TRIPOD reporting), and **Mode B — ML causal inference** (DML via `econml.dml` / `doubleml`, S/T/X/R/DR meta-learners via `econml.metalearners` / `causalml`, causal forest via `econml.grf` / `causalml`, Dragonnet / TARNet / CEVAE neural causal via `causalml`, BCF via `pymc-bart` / `bcf-py`, matrix completion, CATE distribution + policy tree via `econml.policy` / `policytree-py`, off-policy evaluation, conformal causal via `mapie`, fairness audit via `fairlearn`, DAG learning via `causal-learn` / `cdt` / LLM-assisted). Prescribes which library to reach for at each step, shows the canonical code, and links to deeper `references/` files for variant-specific patterns. Use when the user asks for a **complete empirical analysis** in Python, wants to replicate an applied-economics paper from scratch, needs a reproducible workflow that is NOT opinionated on any single vertical package (contrast with StatsPAI), wants explicit control over every estimator and diagnostic, or asks "how do I write a full empirical pipeline in Python?". Also triggers when the user names a specific classical step in isolation — "winsorize at 1/99%", "run Breusch-Pagan", "build a Table 1 balance table", "do a placebo test", "event study plot", "mediation analysis" — and wants it wired into the broader pipeline. Mode A triggers on "target trial emulation", "IPTW", "TMLE", "Mendelian randomization", "STROBE", "公共健康", "流行病学". Mode B triggers on "DML", "double machine learning", "causal forest", "meta-learner", "Dragonnet", "BCF", "policy tree", "conformal causal", "fairness audit", "因果机器学习".
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research Full-empirical-analysis-skillInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill is the canonical 8-step pipeline an applied economist runs on every empirical paper, written in the traditional Python ecosystem — no opinionated one-stop wrapper. Every step calls libraries directly (pandas, numpy, scipy, statsmodels, linearmodels, pyfixest, rdrobust, econml, causalml, matplotlib, seaborn), so the agent — or the user reading the agent's code — has full visibility and can swap any component.
Companion skill: if the user prefers a single-import agent-native DSL (import statspai as sp), route to 00-StatsPAI_skill instead. This skill is the opposite philosophy: everything explicit, everything inspectable, every diagnostic run by hand, every plot shaped by the user.
references/ holds variant-specific depth (dozens of tests, estimator-specific diagnostics, plot recipes).pip install -r requirements.txt and df = pd.read_csv(...).The default playbook above is AER-style applied econometrics — the AEA convention: written-out estimating equation, identifying assumption, design horse-race, full robustness gauntlet. The skill also ships two parallel sub-pipelines for the other two big causal-inference traditions, each reusing the same Steps 1–4 (cleaning / construction / descriptives / diagnostics) and Step 8 (tables & figures) — only Step 5 (estimator) and Step 6/7 (robustness / mechanism) swap libraries:
| Mode | Reader convention | Step-5 estimator stack | Reporting stack | Jump to |
|---|---|---|---|---|
| Default — Applied Econ (AER / QJE / AEJ) | "Show the equation + identifying assumption + design horse-race; controls visible; clustered SE" | DID / IV / RD / SCM / matching / pyfixest.feols HDFE | AER house-style multi-column pf.etable / Stargazer + 8-section paper layout | Steps 1 → 8 (entire playbook below) |
| Mode A — Epidemiology / Public Health | "STROBE / TRIPOD-AI; target trial protocol; doubly-robust estimand; absolute & relative risk; KM survival" | Target-trial emulation · IPTW (zepid) · g-formula (zepid / hand-rolled) · TMLE (zepid.causal.gformula / econml) · Mendelian randomization (pymr / rpy2+TwoSampleMR) · KM/AFT (lifelines) | Same pf.etable + risk-difference / hazard-ratio / E-value rows | §A. Epidemiology pipeline |
| Mode B — ML Causal Inference | "DML / meta-learners / causal forest / DR-learner; CATE distribution; policy value" | DML (econml.dml / doubleml) · S/T/X/R/DR-Learner (econml.metalearners / causalml) · GRF causal forest (econml.grf) · Dragonnet/TARNet/CEVAE (causalml) · BCF (pymc-bart/bcf-py) · matrix completion | pf.etable ML horse-race + CATE plot + policy-value table + conformal PI (mapie) | §B. ML causal pipeline |
How to invoke a non-default mode (Claude / agent picks this up from the user's wording):
| User says... | Mode the skill switches to |
|---|---|
| "Run a DID / IV / RD / event study", "AER table", "applied micro" | Default (AER econ) — Steps 1 → 8 |
| "Target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE / TRIPOD", "公共健康 / 流行病学", "epi pipeline", "RWE study", "cohort study", "case-control" | Mode A (Epi) — §A |
| "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "Dragonnet", "BCF", "policy learning", "conformal causal", "fairness audit", "ML causal", "uplift modeling", "因果机器学习" | Mode B (ML causal) — §B |
| "Mix" (e.g. "estimate DID + then ML CATE on the heterogeneity") | Default + Mode B in sequence — every estimator returns a coefficient + SE pair, drop them all into one pf.etable(...) for the horse-race column |
The three modes share the same Step 1–4 cleaning / Table 1 / diagnostics scaffolding, the same Step 8 export stack, and the same DAG-first identification logic — switching modes only changes which Step-5 estimator family you reach for, not the surrounding paper structure. If you only want descriptive stats / Table 1 / a balance check, the AER tableone / gtsummary-style calls in Step 3 work identically across all three modes.
This skill defaults to the applied-economics paper convention. Unless the user explicitly asks for a single point estimate, every run produces the full publication-ready output set below. Treat it as the contract of Step 8 — mandatory, not opt-in.
| # | Table | Source / library | Saves to |
|---|---|---|---|
| T1 | Summary statistics & balance (treated vs control, with SMD / p-values) | pandas.describe + custom table1() (Step 3) | tables/table1_balance.xlsx + .docx + .tex |
| T2 ★ | Main results — multi-column regression M1→M6 (progressive controls + FE) | pyfixest.feols × 6 specs → pf.etable() / Stargazer (Step 5–6) | tables/table2_main.xlsx + .docx + .tex |
| T3 | Mechanism / outcome ladder — same treatment, 3+ outcomes side-by-side | feols looped over y ∈ {Y1, Y2, Y3, Y_main} → pf.etable | tables/table3_mechanism.xlsx + .docx + .tex |
| T4 | Heterogeneity — subgroup × main coef (gender, age, region, …) | subgroup feols × Wald → pf.etable (Step 7) | tables/table4_heterogeneity.xlsx + .docx + .tex |
| T5 | Robustness battery — alt SE / alt cluster / alt sample / placebo, in one table | feols × variants → pf.etable (Step 6) | tables/table5_robustness.xlsx + .docx + .tex |
★ Table 2 is the centerpiece of every economics paper. It is the multi-column regression table that walks the reader from raw correlation (M1) to the fully-specified design (M6: 2-way FE + interacted FE + cluster-robust SE). Do not collapse it into a single column. Do not report only the headline coefficient. The progression is the credibility argument: if M1→M6 is monotone and stable, the design is plausibly identifying; if it collapses on adding FE, that is the result.
Canonical 6 columns, in order:
- M1 raw bivariate (
y ~ treat)- M2 + demographics (
+ age + edu)- M3 + sector controls (
+ tenure / firm_size / industry)- M4 + unit FE (
| worker_id)- M5 + 2-way FE (
| worker_id + year)- M6 + interacted FE (
| worker_id + year + industry^year) with cluster-robust SE
| # | Figure | Source / library | Saves to |
|---|---|---|---|
| F1 | Trend / motivation — treated vs control over time, with policy line | df.groupby([year, treat])[y].mean().unstack().plot() (Step 3) | figures/fig1_trend.png (300 dpi, 必须导出 PNG) + .pdf |
| F2 | Event-study coefficients with 95% CI, base period at –1 | pf.feols(... ~ i(rel_time, ref=-1) ...) → pf.iplot() (Step 5) | figures/fig2_event_study.png (300 dpi, 必须导出 PNG) + .pdf |
| F3 | Coefficient plot across specs M1→M6 | matplotlib.errorbar over the 6 fitted models (Step 8) | figures/fig3_coefplot.png (300 dpi, 必须导出 PNG) + .pdf |
| F4 | Robustness / sensitivity curve — spec curve, HonestDiD, or cluster-comparison | spec_curve loop or honest_did plot (Step 6) | figures/fig4_sensitivity.png (300 dpi, 必须导出 PNG) + .pdf |
project/
├── tables/ table1_balance.xlsx/.docx/.tex table2_main.xlsx/.docx/.tex
│ table3_mechanism.xlsx/.docx/.tex table4_heterogeneity.xlsx/.docx/.tex
│ table5_robustness.xlsx/.docx/.tex
└── figures/ fig1_trend.png(300dpi)+.pdf fig2_event_study.png(300dpi)+.pdf
fig3_coefplot.png(300dpi)+.pdf fig4_sensitivity.png(300dpi)+.pdf
关键输出规则(必须遵守):
pip install pandas numpy scipy matplotlib seaborn \
statsmodels linearmodels pyfixest \
rdrobust rddensity \
econml causalml \
stargazer # publication-ready regression tables
# Optional but commonly needed:
pip install missingno # missing-data visualization
pip install pyreadstat # Stata .dta / SPSS .sav import
pip install arch # GARCH, unit-root tests, HAC
pip install pingouin # clean stats tests wrapper
pip install pysynth # synthetic control (N=1 treated)
┌──────────────────────────────────────────────────────────────────────┐
│ Step −1 Pre-Analysis Plan (PAP) statsmodels.stats.power / mde │
│ Step 0 Sample log + data contract sample_log/asserts/JSON dump │
│ Step 1 Data cleaning missing / outliers / dtype / join │
│ Step 2 Variable construction log / winsorize / std / encode │
│ Step 2.5 Empirical strategy equation × ID assumption + pre-reg│
│ Step 3 Descriptive statistics Table 1 / corr / distribution │
│ Step 3.5 Identification graphics event-study/1st-stage/McCrary/love│
│ Step 4 Diagnostic tests normality / hetero / autocorr / VIF│
│ Step 5 Baseline modeling OLS / panel / IV / DID / RD / SC │
│ Step 6 Robustness battery placebo / subsample / spec curve │
│ Step 7 Further analysis mechanism / heterogeneity / mediation│
│ Step 8 Tables & figures stargazer / coefplot / event study│
└──────────────────────────────────────────────────────────────────────┘
The 8 steps mirror the canonical sections of an applied AER / QJE / AEJ paper. Each step is one paper section and emits a paper-ready artifact on disk:
Paper section Step Python moves
─────────────────────────── ───── ────────────────────────────────────────────────
Pre-Analysis Plan −1 statsmodels.stats.power + freeze pap.json
§1. Data 0 sample_log + 5-check data contract → JSON
§1. Data 1 pandas read_*/dropna/dtype/merge(validate=)
§1. Data 2 np.log/np.arcsinh/winsorize/groupby.shift/diff
§1.1 Descriptives (Table 1) 3 df.describe() · table1_with_smd · seaborn
§2. Empirical Strategy 2.5 equation × ID assumption table + pre-reg
§3. Identification graphics 3.5 pf.iplot · binscatter · rdplot · mccrary · love
§3.5 Diagnostics 4 statsmodels.diagnostic + scipy.stats
§4. Main Results (Table 2) 5 pf.feols/IV2SLS/CausalForest · pf.etable / Stargazer
§5. Heterogeneity (Table 3) 7 pf.feols(... + i(.):X) · marginaleffects-py
§6. Mechanisms / Channels 7 Baron-Kenny · econml.dml · outcome ladder
§7. Robustness gauntlet 6 placebo · oster · honestdid · spec_curve · 2-way
§8. Replication package 8 Stargazer.render_latex · pf.etable("docx") · result.json
Below is the canonical code at each step. All examples share one running narrative — a labor-economics panel where training (treatment) affects log_wage (outcome), with covariates age, edu, tenure, panel keys worker_id / firm_id / year. Column names and parameter values are illustrative — substitute the real ones from the user's DataFrame. Only library names and call shapes are normative.
When a step has many variants (e.g. staggered DID has five different estimators; heteroskedasticity has four classic tests), SKILL.md shows the one you reach for first and links to
references/NN-<topic>.mdfor the rest. Read the reference file when the user's case doesn't fit the default.
A modern AER paper has 5–7 figures and 3–5 main tables + an appendix robustness table. Every step below leaves at least one numbered artifact on disk. Default file names assume parallel .tex / .docx / .xlsx exports (the agent should produce all three so co-authors can edit in Word, the build system can use LaTeX, and editors can edit raw numbers in Excel). 所有图片必须同时保存 PNG(≥300 dpi)和 PDF 两种格式。
| § | Artifact | Python primitive | Filenames |
|---|---|---|---|
| §1 | Figure 1: raw trends / treatment rollout | df.groupby([time,treat])[y].mean().unstack().plot() · seaborn.heatmap for staggered rollout | figures/fig1_trend.png(300dpi)+.pdf |
| §1 | Table 1: summary stats (full / treated / control + Δ + SMD) | table1(df, by=, cols=) (Step 3.b) → write LaTeX/Word/Excel | tables/table1_balance.xlsx/.docx/.tex |
| §3 | Figure 2: identification graphic (event-study / first-stage / McCrary / RD scatter / SCM trajectory) | pf.iplot(es) · binscatter · rdplot · rddensity · synthdid | figures/fig2_event_study.png(300dpi)+.pdf |
| §4 | Table 2: main results — progressive controls M1→M6 | pf.etable([m1...m6]) · Stargazer([m1.fit ... m6.fit]) | tables/table2_main.xlsx/.docx/.tex |
| §4 | Table 2-bis: design horse-race (OLS / IV / DID / DML) | pf.etable([ols, iv, did, dml]) | tables/table2b_designs.xlsx/.docx/.tex |
| §4 | Figure 3: coefficient plot across specs | pf.coefplot([m1...m6], coefs=["training"]) | figures/fig3_coefplot.png(300dpi)+.pdf |
| §5 | Table 3: heterogeneity by subgroup | pf.etable(g_full, g_male, g_fem, g_q1...q4) | tables/table3_heterogeneity.xlsx/.docx/.tex |
| §5 | Figure 4: dose-response / CATE | econml.CausalForestDML(...).effect() + matplotlib hist | figures/fig4_cate.png(300dpi)+.pdf |
| §6 | Table 4: mechanism / outcome ladder | loop pf.feols over outcomes → pf.etable | tables/table4_mechanism.xlsx/.docx/.tex |
| §7 | Table A1: robustness master (one column per check) | pf.etable([base, no99, balpan, dropearly, wfe, cl2way, logy, ihsy, psm, ebal]) | tables/tableA1_robustness.xlsx/.docx/.tex |
| §7 | Figure 5: spec curve | hand-rolled itertools.product + matplotlib errorbar | figures/fig5_spec_curve.png(300dpi)+.pdf |
| §7 | Figure 6: sensitivity (HonestDiD / Oster / E-value) | HonestDiD · oster_bound · evalue | figures/fig6_sensitivity.png(300dpi)+.pdf |
| §8 | Replication bundle: all tables in one document | pf.etable([...] + extra=[...], type="docx") · pylatex / texdoc-style multi-panel | replication/paper_tables.xlsx/.docx/.tex |
Every Python estimator above (
pf.feols/IV2SLS/att_gtvia R-callout /CausalForestDML) returns a result object that can be passed straight intopf.etable(...)/pf.coefplot(...)/Stargazer(...). Don't hand-roll LaTeX fromdf.to_latex(), and don't render Word viapython-docxdirectly —pf.etable/Stargazerapply book-tab borders, AER stars, and the right SE label automatically. For deeper export recipes (LaTeX / Word / Markdown variants, multi-panel.docx, fullgtsummary-style flow), seereferences/08-tables-plots.md.
关键规则(必须遵守):每个表格必须同时导出三种格式——Excel(.xlsx)、Word(.docx)、LaTeX(.tex)。每个图片必须同时保存PNG(≥300dpi)和PDF两种格式。
Three tiers, picked by scope:
| Tier | Use when | API | Hot kwargs |
|---|---|---|---|
| 1. Single multi-column table | Exporting one Table 2 / Table 3 / Table A1 with progressive columns | pf.etable([m1,...,mN], type="tex"/"docx"/"xlsx", file="...", headers=[...], digits=3, signif_code=[0.1,0.05,0.01]) — or Stargazer([m1.fit,...,mN.fit]).render_latex() for statsmodels-only | keep=, drop=, coef_map=, headers=, digits=, signif_code=, fixef_rm=, notes= |
| 2. Multi-panel paper format (Tables 2 + 3 + A1 + A2 in one file) | Producing the paper-tables block — main + heterogeneity + robustness + placebo as a single document | Stack via repeat pf.etable(..., extralines=...) calls, or use gtsummary-style chained tables; for true single-file multi-panel, write to a .tex then concat | first panel: write; subsequent: append; surround with LaTeX \section{} headers |
| 3. Full session bundle (the Stata collect / R gt equivalent) | Replication appendix that mixes summary stats + balance + multiple regression tables + headings + prose in one file | Compose programmatically with pylatex / python-docx / quarto — render once. Or use statsmodels.iolib.summary2.summary_col for a quick concat of tables. | journal-style template + per-section heading + footnote macros |
Journal styling — pick the right signif_code and SE label. AEA convention is [0.1, 0.05, 0.01] and SE label "Cluster-robust standard errors in parentheses". Define a wrapper once at the top of master.py:
# top of master.py — journal house-style wrapper
AER_SIGNIF = [0.1, 0.05, 0.01]
AER_NOTES = ("Cluster-robust standard errors in parentheses. "
"* p<0.10, ** p<0.05, *** p<0.01.")
def aer_table(models, *, file, headers=None, coef_map=None):
# 同时导出三种格式:.xlsx(用于编辑)、.docx(用于Word)、.tex(用于LaTeX)
base, ext = os.path.splitext(file)
for ext, type_ in [(".xlsx", "xlsx"), (".docx", "docx"), (".tex", "tex")]:
pf.etable(models, type=type_, file=base + ext,
headers=headers, coef_map=coef_map,
digits=3, signif_code=AER_SIGNIF, notes=AER_NOTES)
For the multi-panel .docx / .xlsx and Markdown / Quarto cookbook (single-file paper-tables bundle), see references/08-tables-plots.md.
Before touching the data, write down (a) the population, (b) the design, (c) the minimum detectable effect (MDE) under the planned sample size and α=0.05, β=0.20. Persist the result as pap.json so a referee can verify the design was powered before, not after, the data were seen.
import json
from statsmodels.stats.power import TTestIndPower, NormalIndPower
from statsmodels.stats.proportion import samplesize_proportions_2indep_onetail
# Two-sample MDE for a continuous outcome (Cohen's d framing)
analysis = TTestIndPower()
n_required = analysis.solve_power(effect_size=0.20, power=0.80, alpha=0.05, ratio=1.0)
print(f"n per arm for d=0.20, 80% power: {n_required:.0f}")
# Solve for MDE given fixed n
mde = analysis.solve_power(nobs1=2000, power=0.80, alpha=0.05, ratio=1.0)
print(f"MDE (Cohen's d) at n=2000 per arm: {mde:.3f}")
# Cluster-randomized RCT — design effect = 1 + (m-1)·ICC
m, icc = 50, 0.05
deff = 1 + (m - 1) * icc
n_eff_required = analysis.solve_power(effect_size=0.20, power=0.80, alpha=0.05) * deff
print(f"n per arm under ICC={icc}, cluster size={m}: {n_eff_required:.0f}")
# DID: use Frison-Pocock / Bloom (1995) — see references/05-modeling.md §5.4
# RD: power via Monte Carlo — see references/05-modeling.md §5.5
# Persist the protocol — the referee will ask whether the design was powered ex ante
pap = {
"population": "manufacturing workers, 2010–2020",
"treatment": "training (binary, staggered adoption)",
"outcome": "log_wage",
"estimand": "ATT",
"design": "staggered DID, Callaway–Sant'Anna",
"alpha": 0.05,
"power_target": 0.80,
"mde_d": 0.20,
"n_planned": 12000,
"frozen_at": "2026-01-15T09:00:00Z",
"git_sha": "<paste>",
}
with open("artifacts/pap.json", "w") as f:
json.dump(pap, f, indent=2)
Commit artifacts/pap.json in the repo before Step 1. AEA RCT Registry / OSF preregistration tools accept it as the analysis-plan exhibit.
An AER §1 Data section has three jobs: (a) describe sources, (b) document every sample restriction (the "footnote 4" sample log), (c) lock the panel structure. The data contract is the cure for "mysterious sample-size shrinkage" bugs in the response letter.
import pandas as pd, json
sample_log = []
df_raw = pd.read_csv("raw.csv")
sample_log.append(("0. raw", len(df_raw)))
df1 = df_raw.dropna(subset=["wage"])
sample_log.append(("1. drop missing wage", len(df1)))
df2 = df1[df1["age"].between(18, 65)]
sample_log.append(("2. drop age outside 18-65", len(df2)))
df3 = df2[df2["industry"].isin({"manuf","construction","transport"})]
sample_log.append(("3. keep target industries", len(df3)))
df = df3
for label, n in sample_log:
print(f" {label:<30s} N = {n:>10,d}")
with open("artifacts/sample_construction.json", "w") as f:
json.dump(sample_log, f, indent=2)
Paste the printed lines verbatim as footnote 4 of the paper.
import pandas as pd, numpy as np, json
from scipy import stats
def data_contract(df, *, y, treatment, id=None, time=None, covariates=()):
"""Return a go/no-go dict. Stop the pipeline if any required check fails."""
keys = [y, treatment] + ([id, time] if id and time else []) + list(covariates)
c = {
"n_obs": len(df), # 1. shape
"dtypes": df[keys].dtypes.astype(str).to_dict(), # 2. dtypes
"n_missing": df[keys].isna().sum().to_dict(), # 3. missingness
"n_dupes_on_keys": 0,
"panel_balanced": None,
"cohort_sizes": None,
}
if id and time:
c["n_dupes_on_keys"] = int(df.duplicated([id, time]).sum()) # 4. dups
bal = df.groupby(id).size()
c["panel_balanced"] = bool((bal == bal.max()).all()) # 5. balance
c["n_dropped_by_balance"] = int((bal != bal.max()).sum())
if "first_treat_year" in df.columns:
c["cohort_sizes"] = (
df.drop_duplicates(id).groupby("first_treat_year")
.size().to_dict()
)
c["y_range"] = (float(df[y].min()), float(df[y].max()))
c["treatment_share"] = float(df[treatment].mean())
# MCAR sniff test (Rubin) — if missing(y) is associated with covariates,
# listwise deletion biases the estimate. Use MI / IPW instead.
miss_y = df[y].isna()
c["mcar_hint"] = "likely MCAR (listwise OK)"
if miss_y.any() and (~miss_y).any():
for cov in covariates:
if df[cov].dtype.kind in "fi":
_, p = stats.ttest_ind(df.loc[miss_y, cov].dropna(),
df.loc[~miss_y, cov].dropna(),
equal_var=False)
if p < 0.05:
c["mcar_hint"] = (f"NOT MCAR (y-miss differs on {cov}, p={p:.3f}) "
f"→ use MI / IPW, NOT listwise drop")
break
return c
contract = data_contract(df, y="wage", treatment="training",
id="worker_id", time="year",
covariates=["age", "edu", "tenure"])
assert contract["n_dupes_on_keys"] == 0, f"dup (id, time): {contract['n_dupes_on_keys']}"
assert all(v == 0 for v in contract["n_missing"].values()), \
f"NaNs on keys: {contract['n_missing']}"
with open("artifacts/data_contract.json", "w") as f:
json.dump(contract, f, indent=2, default=str)
If any assertion fires, stop and fix it in pandas. Estimators silently drop NaN rows downstream — this contract is the cheapest insurance against "why did N drop from 12,000 to 9,800 between Table 1 and Table 2?" referee questions.
Deeper patterns: references/01-data-cleaning.md — missing-value strategies (MCAR/MAR/MNAR), outlier detection (IQR / z-score / Mahalanobis), dtype coercion, merging gotchas, panel-structure validation, deduplication.
import pandas as pd
import numpy as np
df = pd.read_csv("raw.csv")
# 1a. Inspect — always do this first
df.info() # dtypes + non-null counts
df.describe(include="all").T # numeric + categorical
df.isna().mean().sort_values(ascending=False) # missingness share per column
# 1b. Fix dtypes (strings-that-should-be-numeric are the #1 silent bug)
df["year"] = pd.to_numeric(df["year"], errors="coerce")
df["wage"] = pd.to_numeric(df["wage"], errors="coerce")
df["gender"] = df["gender"].astype("category")
df["date"] = pd.to_datetime(df["date"], errors="coerce")
# 1c. Missing values — decide PER VARIABLE, never blanket-drop
key_vars = ["wage", "training", "worker_id", "year"]
df = df.dropna(subset=key_vars) # drop rows missing on keys
df["tenure"] = df["tenure"].fillna(df["tenure"].median()) # median-impute numeric covariate
df["union"] = df["union"].fillna("unknown") # explicit "unknown" for categorical
# 1d. Outliers — flag first, winsorize in Step 2
df["wage_z"] = (df["wage"] - df["wage"].mean()) / df["wage"].std()
outlier_mask = df["wage_z"].abs() > 4
print(f"{outlier_mask.sum()} rows flagged as |z|>4 on wage")
# 1e. Deduplicate on the panel key
dupes = df.duplicated(subset=["worker_id", "year"], keep=False)
assert dupes.sum() == 0, f"{dupes.sum()} duplicate (worker_id, year) pairs"
# 1f. Merge auxiliary data — use validate= to catch silent m:m blowups
df = df.merge(firm_chars, on="firm_id", how="left", validate="many_to_one")
# 1g. Panel structure check — balanced vs. unbalanced
panel_summary = df.groupby("worker_id")["year"].agg(["count", "min", "max"])
print(panel_summary.describe())
is_balanced = (panel_summary["count"] == panel_summary["count"].max()).all()
print(f"Balanced: {is_balanced}")
Key principle: pandas + explicit decisions. Never silently drop rows inside an estimator — all row exclusions happen in Step 1 with a printed count.
Deeper patterns: references/02-data-transformation.md — log/ihs/Box–Cox, winsorizing vs. trimming, within-group standardization, one-hot vs. ordinal vs. target encoding, interaction terms, lag/lead operators, first differences, deflation with CPI.
# 2a. Log / IHS transform (skewed positive variables)
df["log_wage"] = np.log(df["wage"].clip(lower=1)) # floor at 1 to avoid -inf
df["ihs_assets"] = np.arcsinh(df["assets"]) # handles zero / negative
# 2b. Winsorize (top/bottom 1%) — reduces outlier influence without dropping rows
from scipy.stats.mstats import winsorize
df["wage_w"] = winsorize(df["wage"], limits=[0.01, 0.01]).data
# 2c. Standardize (z-score) — for interpretability or ML
df["age_std"] = (df["age"] - df["age"].mean()) / df["age"].std()
# 2d. Categorical encoding
df = pd.get_dummies(df, columns=["industry"], prefix="ind", drop_first=True)
# 2e. Interaction & polynomial
df["age_sq"] = df["age"] ** 2
df["training_x_edu"] = df["training"] * df["edu"]
# 2f. Panel operators — first difference, lag, lead, within-group mean
df = df.sort_values(["worker_id", "year"])
df["wage_l1"] = df.groupby("worker_id")["log_wage"].shift(1) # lag
df["wage_f1"] = df.groupby("worker_id")["log_wage"].shift(-1) # lead
df["d_wage"] = df.groupby("worker_id")["log_wage"].diff() # Δy_it
df["wage_mean"] = df.groupby("worker_id")["log_wage"].transform("mean") # within-unit mean
# 2g. Treatment timing variables for staggered DID
df["first_treat_year"] = df.groupby("worker_id")["training"] \
.transform(lambda s: s.idxmax() if s.any() else np.nan)
df["rel_time"] = df["year"] - df["first_treat_year"]
This is the heart of an AER paper. Before any code, write down the equation explicitly and state the identifying assumption. Vague identification language is the single most common reason a referee rejects an applied paper. Persist the strategy as strategy.md so it is a dated, version-controlled artifact — not a post-hoc rationalization written after seeing the coefficient.
| Design | Estimating equation | Identifying assumption | Python estimator |
|---|---|---|---|
| 2×2 DID | Y_it = α_i + λ_t + β·D_it + X'γ + ε_it | parallel trends conditional on X | pf.feols("y ~ i(treated, post, ref=0) | i + t", df, vcov={"CRV1":"i"}) |
| Event-study (CS / SA) | Y_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t-G_i = e} + ε_it | no anticipation + group-time PT | pf.feols("y ~ i(rel, ref=-1) | i + t", df) · att_gt (R callout) · did_imputation |
| 2SLS | Y_i = α + β·D_i + X'γ + ε_i; D_i = π·Z_i + X'δ + u_i | exclusion + relevance + monotonicity | linearmodels.iv.IV2SLS.from_formula("y ~ X + [D ~ Z]", df) |
| Sharp RD | Y_i = α + β·1{X_i ≥ c} + f(X_i) + ε_i (local poly) | continuity of E[Y(0)|X] at c, no manipulation | rdrobust(y, x, c=0) (+ rddensity) |
| SCM | Ŷ_1t(0) = Σ_j ŵ_j Y_jt, τ_t = Y_1t − Ŷ_1t(0) for t≥T_0 | pre-period fit + interpolation validity | pysynth · synthdid (R callout) |
| Selection-on-observables (matching/IPW/DML) | Y_i = m(X_i) + β·D_i + ε_i (Robinson partialling-out) | unconfoundedness + overlap | econml.dml.DML · econml.dr.DRLearner · causalml.matching.PSM |
┌─ running var + cutoff ───────────────── RDD (rdrobust)
│
├─ exogenous instrument Z ─────────────── IV/2SLS (linearmodels.IV2SLS)
data + question ─┤
├─ pre/post × treat/control ─┬ 2 periods ── 2×2 DID (pf.feols + i())
│ └ staggered ── CS / SA / BJS (att_gt / sunab / did_imputation)
│
├─ 1 treated unit + donor pool + long pre ── SCM (pysynth / synthdid)
│
├─ high-dim X, selection-on-observables ── ML causal (econml.dml / causalml — see §B)
│
└─ none of the above ──────────────────── matching + sensitivity (causalml + evalue)
strategy.md templatefrom pathlib import Path
strategy = """\
# Empirical Strategy (pre-registration)
**Frozen**: 2026-01-15 (Git SHA: <paste>)
**Population**: manufacturing workers, 2010–2020, balanced panel
**Treatment**: training (binary, staggered adoption)
**Outcome**: log_wage (CPI-deflated 2010 USD)
**Estimand**: ATT on the treated, dynamic horizon -4..+4
## Estimating equation (paste from §2.5 row that matches the design)
log_wage_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t - G_i = e} + ε_it
## Identifying assumption
1. No anticipation: E[Y_it(0) | t < G_i] = E[Y_it(0) | never-treated]
2. Group-time PT: Δ E[Y_it(0)] is the same across treatment cohorts
## Auto-flagged threats (must defend in §2)
- Selection of G_i on Y_i(0) → bacondecomp + honestdid sensitivity
- Spillover within firm → cluster at firm_id, also try firm_id × year
- Anticipation in pre-period → include lead in event study
## Fallback estimators (Step 6 robustness)
- Sun–Abraham via `pf.feols("y ~ sunab(G, t) | i + t", df)` (or R callout to `fixest::sunab`)
- Borusyak-Jaravel-Spiess via `did_imputation` (R callout)
- Synthetic DID via `synthdid` (R callout)
"""
Path("artifacts/strategy.md").write_text(strategy)
Commit artifacts/strategy.md in the repo before running Step 5 / Step 6. The git log of this file is the analysis plan.
Deeper patterns: references/03-descriptive-stats.md — stratified Table 1 with SMDs, weighted descriptives, distributional comparison (ECDF, QQ), correlation heatmaps + clustering, panel balance checks, time-series seasonality plots.
import seaborn as sns
import matplotlib.pyplot as plt
# 3a. Full-sample summary — classic "Table 1, column 1"
stats = df[["log_wage","age","edu","tenure","training"]].describe().T
stats["N"] = df[["log_wage","age","edu","tenure","training"]].notna().sum()
print(stats[["N","mean","std","min","25%","50%","75%","max"]])
# 3b. Stratified Table 1 (treated vs. control + t-test / SMD)
from scipy import stats as sps
def table1(df, by, cols):
rows = []
for c in cols:
t, ctrl = df.loc[df[by]==1, c], df.loc[df[by]==0, c]
smd = (t.mean() - ctrl.mean()) / np.sqrt((t.var()+ctrl.var())/2)
p = sps.ttest_ind(t.dropna(), ctrl.dropna(), equal_var=False).pvalue
rows.append([c, t.mean(), t.std(), ctrl.mean(), ctrl.std(), smd, p])
return pd.DataFrame(rows, columns=["var","treat_mean","treat_sd",
"ctrl_mean","ctrl_sd","SMD","p"])
t1 = table1(df, by="training", cols=["log_wage","age","edu","tenure"])
print(t1.round(3))
# 3c. Correlation heatmap
corr = df[["log_wage","age","edu","tenure","training"]].corr()
sns.heatmap(corr, annot=True, cmap="RdBu_r", center=0, vmin=-1, vmax=1)
plt.tight_layout(); plt.savefig("fig_corr.pdf")
# 3d. Distribution plot — density by treatment status
fig, ax = plt.subplots(figsize=(6,4))
for g, sub in df.groupby("training"):
sub["log_wage"].plot.kde(ax=ax, label=f"training={g}")
ax.set_xlabel("log wage"); ax.legend(); plt.savefig("fig_dist.pdf")
# 3e. Time trends — treated vs. control (the DID motivation plot)
trend = df.groupby(["year","training"])["log_wage"].mean().unstack()
trend.plot(marker="o"); plt.ylabel("mean log wage"); plt.savefig("fig_trend.pdf")
# 3f. Panel balance diagnostic — how many units observed each year?
df.groupby("year")["worker_id"].nunique().plot(kind="bar")
plt.ylabel("# unique workers"); plt.savefig("fig_panel_balance.pdf")
AER convention: the identification figure precedes the regression table. The reader should see graphical evidence that PT holds / first stage is strong / RD jumps cleanly before you ask them to trust your point estimate.
Pre-period coefficients ≈ 0 (with the −1 reference period normalized to zero) is the visual evidence for parallel trends. Pair the figure with a numerical pre-trends test so reviewers don't have to eyeball it.
import pyfixest as pf
import matplotlib.pyplot as plt
# (a) Sun-Abraham via pyfixest::sunab — the modern primary for staggered DID
es = pf.feols("log_wage ~ sunab(first_treat_year, year) | worker_id + year",
data=df, vcov={"CRV1": "worker_id"})
# (b) Coefficient figure
fig = pf.iplot(es,
figsize=(7, 4),
title="Figure 2a. Event-study coefficients (95% CI; ref. e = -1)",
xlabel="Years relative to treatment",
ylabel="Coefficient (ATT)")
fig.savefig("figures/fig2a_event_study.pdf", dpi=300)
fig.savefig("figures/fig2a_event_study.png", dpi=300)
# (c) Numerical pre-trends F-test (Wald on the leads jointly = 0)
import numpy as np
pre_idx = [k for k in es.coef().index if "rel_time::-" in k and "ref" not in k]
W = es.wald_test(pre_idx)
print(f"Pre-trends Wald χ² = {W.statistic:.2f}, p = {W.pvalue:.3f}")
# (d) Bacon decomposition (Goodman-Bacon 2021) — TWFE diagnostic
# Pure-Python: callout to R::bacondecomp via rpy2 (Stata/R have first-class support)
# OR use the `bacondecomp` Python port if installed.
try:
from bacondecomp import bacon
bd = bacon(df, y="log_wage", D="training", time="year", id_var="worker_id")
bd.plot(); plt.savefig("figures/fig2a_bacon.pdf", dpi=300)
except ImportError:
print("bacondecomp not installed; use R callout or Stata's -bacondecomp-.")
Rule of thumb: first-stage F ≥ 10 for OLS-style inference; F ≥ 23 for AR-equivalent inference (Stock–Yogo / Lee 2022).
from linearmodels.iv import IV2SLS
iv = IV2SLS.from_formula(
"log_wage ~ 1 + age + edu + [training ~ Z1 + Z2]",
data=df).fit(cov_type="clustered", clusters=df["firm_id"])
print(iv.first_stage) # reports first-stage F
print(iv.summary)
# Binscatter for the first-stage scatter (residualized on age + edu)
from binsreg import binsreg # pip install binsreg
res = binsreg(y=df["training"], x=df["Z1"], w=df[["age","edu"]],
nbins=20, polyreg=2, ci=(3,3))
res.bins_plot.savefig("figures/fig2b_first_stage.pdf", dpi=300)
The signature RD figure is rdplot (CCT-style binned scatter with local-polynomial fit on each side), paired with the McCrary manipulation test.
from rdrobust import rdplot, rdrobust
from rddensity import rddensity, rdplotdensity
# (a) Canonical RD plot — binned means + local poly on each side
rdp = rdplot(y=df["outcome"], x=df["running_var"], c=0,
p=4, kernel="triangular", binselect="esmv")
plt.savefig("figures/fig2c_rdplot.pdf", dpi=300)
# (b) McCrary density (Cattaneo-Jansson-Ma 2018)
dens = rddensity(X=df["running_var"], c=0)
print(dens)
rdplotdensity(dens, X=df["running_var"])
plt.savefig("figures/fig2c_mccrary.pdf", dpi=300)
import causalml.matching as cm
from causalml.match import NearestNeighborMatch
import seaborn as sns
# Pre-matching SMDs
pre_smd = ((df.loc[df.training==1, ["age","edu","tenure"]].mean()
- df.loc[df.training==0, ["age","edu","tenure"]].mean())
/ df[["age","edu","tenure"]].std())
# Match
psm = NearestNeighborMatch(replace=False, ratio=1, random_state=42)
matched = psm.match(data=df, treatment_col="training",
score_cols=["age","edu","tenure"])
# Post-matching SMDs
post_smd = ((matched.loc[matched.training==1, ["age","edu","tenure"]].mean()
- matched.loc[matched.training==0, ["age","edu","tenure"]].mean())
/ matched[["age","edu","tenure"]].std())
love = pd.DataFrame({"pre": pre_smd.abs(), "post": post_smd.abs()})
love.plot.barh(); plt.axvline(0.10, ls="--", c="r")
plt.title("Figure 2d. Love plot — |SMD| pre vs post matching (target < 0.10)")
plt.savefig("figures/fig2d_loveplot.pdf", dpi=300)
For synthetic-control designs the canonical Figure 2 is the treated-vs-synthetic time series with treatment time annotated.
# Pure Python: pysynth or sparseSC; for SDiD use rpy2 callout to R::synthdid
from SyntheticControlMethods import Synth
sc = Synth(df, "outcome", "unit_id", "time", treatment_period=2015,
treated_unit=1, n_optim=10)
sc.plot(["original", "pointwise"], treated_label="Treated unit",
synth_label="Synthetic control")
plt.savefig("figures/fig2e_synth_trajectory.pdf", dpi=300)
Identification-specific checks (PT for DID, weak-IV F, density for RD, common support for matching) are also auto-run inside the Step-5 estimators — don't duplicate the numerics here, but DO produce the figures: a referee scans the figures first.
Deeper patterns: references/04-statistical-tests.md — every classical test with its null/alternative/decision rule (Shapiro–Wilk, Kolmogorov–Smirnov, Jarque–Bera, Breusch–Pagan, White, Goldfeld–Quandt, Durbin–Watson, Breusch–Godfrey, Ljung–Box, ADF, KPSS, Phillips–Perron, VIF, condition number, Hausman, Wu–Hausman, Sargan–Hansen).
Run diagnostics before taking estimates at face value. The 5 classes below cover 90% of applied work.
import statsmodels.api as sm
from statsmodels.stats.diagnostic import (
het_breuschpagan, het_white, acorr_breusch_godfrey, acorr_ljungbox,
)
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson, jarque_bera
from scipy import stats as sps
# Fit a baseline OLS to get residuals for diagnostics
X = sm.add_constant(df[["training","age","edu","tenure"]])
y = df["log_wage"]
ols = sm.OLS(y, X, missing="drop").fit()
# 4a. Normality of residuals (informative but NOT required for OLS — CLT handles large N)
jb_stat, jb_p, skew, kurt = jarque_bera(ols.resid)
sw_stat, sw_p = sps.shapiro(ols.resid.sample(min(5000, len(ols.resid))))
print(f"Jarque-Bera p={jb_p:.3f} Shapiro p={sw_p:.3f} skew={skew:.2f} kurt={kurt:.2f}")
# 4b. Heteroskedasticity — Breusch-Pagan + White
bp = het_breuschpagan(ols.resid, ols.model.exog)
wh = het_white (ols.resid, ols.model.exog)
print(f"Breusch-Pagan p={bp[1]:.3f} White p={wh[1]:.3f}")
# → if p<0.05, use robust / cluster-robust SEs (you already should)
# 4c. Autocorrelation (time-series / panel) — Durbin-Watson + Breusch-Godfrey + Ljung-Box
dw = durbin_watson(ols.resid) # ~2 = no AR(1)
bg = acorr_breusch_godfrey(ols, nlags=4) # general AR(p)
lb = acorr_ljungbox(ols.resid, lags=[4,8], return_df=True)
print(f"Durbin-Watson={dw:.2f} Breusch-Godfrey p={bg[1]:.3f}")
print(lb)
# 4d. Multicollinearity — VIF + condition number
vif = pd.DataFrame({
"var": X.columns,
"VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
})
print(vif) # VIF > 10 is the classic red flag
cond_number = np.linalg.cond(X.values) # > 30 = potential collinearity
print(f"Condition number = {cond_number:.1f}")
# 4e. Stationarity (time-series) — ADF + KPSS (dual test: ADF rejects unit root, KPSS accepts stationarity)
from statsmodels.tsa.stattools import adfuller, kpss
adf_stat, adf_p, *_ = adfuller(df["log_wage"].dropna(), autolag="AIC")
kpss_stat, kpss_p, *_ = kpss (df["log_wage"].dropna(), regression="c", nlags="auto")
print(f"ADF p={adf_p:.3f} KPSS p={kpss_p:.3f}")
Decision table (classic rules of thumb):
| Test | Null | Action if rejected |
|------|------|-------------------|
| Jarque–Bera / Shapiro | residuals ~ Normal | usually ignore when N large; bootstrap CIs if small-N inference matters |
| Breusch–Pagan / White | homoskedastic errors | use cov_type="HC3" or cluster SEs |
| Durbin–Watson / Breusch–Godfrey | no autocorrelation | use HAC (Newey–West) or cluster by unit |
| VIF > 10 / cond# > 30 | — | drop / combine collinear regressors |
| ADF rejects + KPSS fails to reject | series is stationary | fit levels |
| ADF fails to reject | unit root | first-difference or cointegration test |
Deeper patterns: references/05-modeling.md — every classical estimator with API: OLS, WLS, GLS, logit/probit, panel FE / RE / PO, clustered SEs, 2SLS / LIML / GMM, DID (2×2, TWFE, event study, CS, SA, BJS, SDiD), RD (sharp, fuzzy, kink, multi-cutoff), Synthetic Control, PSM / IPW / EB, DML / causal forest / DR-Learner.
This is the densest section of an applied paper. A modern AER §4 typically contains 2–3 multi-regression tables and one coefficient plot:
Estimator routing (memorize this — getting it wrong silently produces nonsense):
- No FE / single low-card FE →
smf.ols("y ~ X", df).fit(cov_type="cluster", cov_kwds={"groups": df["i"]})- High-dim FE →
pf.feols("y ~ X | fe1 + fe2", df, vcov={"CRV1":"i"})- Two-way cluster →
pf.feols(..., vcov={"CRV1":"firm_id+year"})- 2SLS / IV →
IV2SLS.from_formula("y ~ X + [D ~ Z]", df).fit(...)orpf.feols("y ~ X | D ~ Z", df)- DID / event-study →
pf.feols("y ~ sunab(G, t) | i + t", df)for SA; R callout todid::att_gtfor CS
Pick the estimator by identification strategy (not by "what's trendy"):
Observational cross-section, selection on observables → OLS + controls | PSM / IPW / DML
Observational panel, policy shock, parallel trends → DID (TWFE / CS / SA / BJS / SDiD)
Exogenous instrument for endogenous X → 2SLS / LIML / GMM (linearmodels / pyfixest)
Discontinuity in assignment rule → Sharp / Fuzzy / Kink RD (rdrobust)
N=1 treated unit, long panel → Synthetic Control (pysynth / SDiD)
High-dim controls or heterogeneous effects → DML / Causal Forest (econml)
Binary outcome → Logit / Probit (statsmodels)
Count outcome → Poisson / NegBin (pyfixest / statsmodels)
Canonical calls (details in references/05-modeling.md). The eight regression-table patterns A–H below are the AER table cookbook — pf.etable(*models, ...) is the workhorse, equivalent to Stata's outreg2/esttab and R's modelsummary.
Stable β̂ across columns ⇒ less concern that selection on observables is driving the estimate (Oster 2019 selection-stability logic; quantified in Step 6.f).
import pyfixest as pf
m1 = pf.feols("log_wage ~ training", data=df, vcov={"CRV1":"firm_id"})
m2 = pf.feols("log_wage ~ training + age + edu", data=df, vcov={"CRV1":"firm_id"})
m3 = pf.feols("log_wage ~ training + age + edu + tenure + firm_size", data=df, vcov={"CRV1":"firm_id"})
m4 = pf.feols("log_wage ~ training + age + edu + tenure + firm_size | industry + year",
data=df, vcov={"CRV1":"firm_id"})
m5 = pf.feols("log_wage ~ training + age + edu + tenure + firm_size | worker_id + year",
data=df, vcov={"CRV1":"firm_id"})
m6 = pf.feols("log_wage ~ training + age + edu + tenure + firm_size | worker_id + year + industry^year",
data=df, vcov={"CRV1":"firm_id"})
pf.etable([m1, m2, m3, m4, m5, m6],
type="tex", file="tables/table2_main.tex",
headers=["(1) Baseline", "(2) +Demog", "(3) +Labor-mkt",
"(4) Ind×Yr FE", "(5) Worker FE", "(6) Worker FE+Ind×Yr"],
digits=3, signif_code=[0.1, 0.05, 0.01],
notes="Cluster-robust SE in parentheses, clustered at firm_id.")
pf.etable([m1, m2, m3, m4, m5, m6], type="docx", file="tables/table2_main.docx")
AER convention: show ALL controls (and the intercept). Pass NEITHER
keep=NORdrop=so every parameter is visible. Usekeep=["training"]only when a focal-coefficient-only table is intentional (interaction-form heterogeneity, IV first-stage triplet); usedrop=["Intercept"]only when you want to suppress the constant for paper aesthetics.
Show the same coefficient of interest under multiple identification strategies. This is the AER credibility move: convergent evidence across designs each making different identifying assumptions.
from linearmodels.iv import IV2SLS
from econml.dml import LinearDML
from causalml.match import NearestNeighborMatch
ols = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id"})
iv = IV2SLS.from_formula("log_wage ~ 1 + age + edu + tenure + [training ~ Z1 + Z2]",
data=df).fit(cov_type="clustered", clusters=df["firm_id"])
did = pf.feols("log_wage ~ sunab(first_treat_year, year) | worker_id + year",
data=df, vcov={"CRV1":"firm_id"})
dml = LinearDML().fit(df["log_wage"], df["training"],
X=df[["age","edu","tenure","firm_size"]])
psm = NearestNeighborMatch(replace=False, ratio=1).match(
df, treatment_col="training", score_cols=["age","edu","tenure"])
# Wrap non-pyfixest models or use Stargazer for a multi-source table:
from stargazer.stargazer import Stargazer
table = Stargazer([ols.fit, iv, dml._final_estimator])
table.title("Table 2-bis. Convergent evidence across designs")
table.custom_columns(["(1) OLS+FE", "(2) 2SLS", "(3) DML"], [1,1,1])
open("tables/table2b_designs.tex", "w").write(table.render_latex())
ys = ["log_wage", "weeks_employed", "left_firm", "promoted"]
multi_y = [pf.feols(f"{y} ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id"}) for y in ys]
pf.etable(multi_y, type="tex", file="tables/table2c_multi_outcome.tex",
headers=ys, keep="training",
notes="Each column is a separate regression on the labelled outcome.")
Same model family, two horizons (short-run / long-run) or two samples. Stack vertically with two pf.etable calls + LaTeX glue.
panelA = [pf.feols("wage_t1 ~ training + X | industry + year", data=df, vcov={"CRV1":"firm_id"}),
pf.feols("wage_t1 ~ training + X | worker_id + year", data=df, vcov={"CRV1":"firm_id"})]
panelB = [pf.feols("wage_t5 ~ training + X | industry + year", data=df, vcov={"CRV1":"firm_id"}),
pf.feols("wage_t5 ~ training + X | worker_id + year", data=df, vcov={"CRV1":"firm_id"})]
# Write Panel A
pf.etable(panelA, type="tex", file="tables/table2d_horizons.tex",
headers=["(1) Industry FE", "(2) Worker FE"], keep="training",
custom_row=[("Horizon", "1 year", "1 year")])
# Append Panel B (manually concat via texdoc-style glue, or use a wrapper)
The textbook AER IV table presents the first stage, the reduced form, and the 2SLS in three columns so the reader can verify Wald-ratio = RF / FS.
fs = pf.feols("training ~ Z + age + edu | industry + year", data=df, vcov={"CRV1":"firm_id"})
rf = pf.feols("log_wage ~ Z + age + edu | industry + year", data=df, vcov={"CRV1":"firm_id"})
iv2 = IV2SLS.from_formula("log_wage ~ 1 + age + edu + [training ~ Z]",
data=df).fit(cov_type="clustered", clusters=df["firm_id"])
pf.etable([fs, rf], type="tex", file="tables/table2e_iv_triplet.tex",
headers=["(1) First stage", "(2) Reduced form"], keep=["Z"],
notes=("First-stage F = " + f"{fs.fitstat()['F']:.2f}"))
# Append the 2SLS column (linearmodels IV2SLS output) via Stargazer or by hand.
IV triplet is intentionally focal: show only Z + endogenous regressor so the reader can eyeball the Wald ratio. Drop
keep=["Z"]only if a referee asks for the full coefficient list.
did::att_gt / synth / econml.DMLFor DID / SCM / matching mains, the modern Python estimator returns a self-contained estimate + automatic placebos / pre-trends / overlap diagnostics. Pipe into pf.etable via a thin adapter, or use Stargazer directly.
# DML with full diagnostics
from econml.dml import CausalForestDML
dml = CausalForestDML(n_estimators=2000, min_samples_leaf=5)
dml.fit(df["log_wage"], df["training"], X=df[["age","edu","tenure","firm_size"]])
ate = dml.ate(df[["age","edu","tenure","firm_size"]])
ci = dml.ate_interval(df[["age","edu","tenure","firm_size"]])
print(f"DML ATE = {ate:.3f} (95% CI [{ci[0]:.3f}, {ci[1]:.3f}])")
pf.etable (Table 3, see Step 7)One column per subgroup. Detailed code in §Step 7 — Heterogeneity.
Stack every robustness specification next to the baseline. Detailed code in §Step 6.f.
import statsmodels.formula.api as smf
import pyfixest as pf
from linearmodels.iv import IV2SLS
from rdrobust import rdrobust
# 5a. OLS with cluster-robust SEs (use when no FE or just one FE)
ols = smf.ols("log_wage ~ training + age + edu + tenure", data=df).fit(
cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
print(ols.summary())
# 5b. Panel FE (unit + time) — reach for pyfixest first; fastest & mirrors R's fixest
fe = pf.feols("log_wage ~ training + age + edu + tenure | worker_id + year",
data=df, vcov={"CRV1": "worker_id"})
fe.summary()
# 5c. 2×2 DID
did = smf.ols("log_wage ~ treated * post + age + edu", data=df).fit(
cov_type="cluster", cov_kwds={"groups": df["worker_id"]})
# 5d. Event study (dynamic DID, base period = −1)
es = pf.feols("log_wage ~ i(rel_time, ref=-1) | worker_id + year",
data=df, vcov={"CRV1": "worker_id"})
pf.iplot(es)
# 5e. Staggered DID — Callaway–Sant'Anna (via diff-diff) OR Sun–Abraham (via pyfixest interactions)
# See references/05-modeling.md §5.4 for the full staggered DID playbook.
# 5f. IV / 2SLS
iv = IV2SLS.from_formula(
"log_wage ~ 1 + age + edu + [training ~ draft_lottery]", data=df
).fit(cov_type="clustered", clusters=df["firm_id"])
print(iv.first_stage) # first-stage F > 10 (ideally > 104)
print(iv.summary)
# 5g. Sharp RD
rd = rdrobust(y=df["outcome"], x=df["running_var"], c=0,
kernel="triangular", bwselect="mserd")
print(rd)
# 5h. Binary outcome
logit = smf.logit("employed ~ training + age + edu", data=df).fit()
print(logit.summary())
print(logit.get_margeff().summary()) # marginal effects — the interpretable quantity
Deeper patterns: references/06-robustness.md — alternative specifications (add/drop controls), subsample splits, alternative clustering levels, alternative outcome definitions, placebo tests (fake treatment / fake timing / permutation), specification curve, Oster δ*, randomization inference, leave-one-out, winsorization sensitivity.
Every headline result in the paper needs a robustness appendix. The canonical 6:
# 6a. Alternative specifications — progressive controls (M1 → M6)
specs = [
"log_wage ~ training", # M1 raw
"log_wage ~ training + age + edu", # M2 +covariates
"log_wage ~ training + age + edu + tenure", # M3 +tenure
"log_wage ~ training + age + edu + tenure | worker_id", # M4 +unit FE
"log_wage ~ training + age + edu + tenure | worker_id + year", # M5 +time FE
"log_wage ~ training + age + edu + tenure | worker_id + year + industry^year", # M6 +industry×year
]
results = [pf.feols(f, data=df, vcov={"CRV1":"worker_id"}) for f in specs]
pf.etable(results) # side-by-side publication table
# 6b. Alternative cluster levels
for cl in ["worker_id", "firm_id", "industry", "state"]:
r = pf.feols("log_wage ~ training | worker_id+year", data=df, vcov={"CRV1": cl})
print(cl, r.coef()["training"], r.se()["training"])
# 6c. Subsample splits
for col in ["male", "has_college", "high_tenure"]:
for val in [0, 1]:
sub = df[df[col]==val]
r = pf.feols("log_wage ~ training | worker_id+year", data=sub)
print(col, val, r.coef()["training"], r.se()["training"])
# 6d. Placebo — fake timing (treat 3 years before actual policy; should be ~0)
df["fake_post"] = (df["year"] >= df["first_treat_year"] - 3).astype(int)
placebo = pf.feols("log_wage ~ fake_post | worker_id + year", data=df)
placebo.summary()
# 6e. Placebo — permutation / randomization inference (500 draws)
obs_coef = pf.feols("log_wage ~ training | worker_id+year", data=df).coef()["training"]
draws = []
for s in range(500):
df_s = df.copy()
df_s["training_perm"] = df_s.groupby("worker_id")["training"].transform(
lambda x: x.sample(frac=1, random_state=s).values)
r = pf.feols("log_wage ~ training_perm | worker_id+year", data=df_s)
draws.append(r.coef()["training_perm"])
p_perm = (np.abs(draws) >= abs(obs_coef)).mean()
print(f"Permutation p = {p_perm:.3f}")
# 6f. Oster (2019) δ* — selection on unobservables
# See references/06-robustness.md §6.6 for the exact formula and its implementation.
# ============================================================
# 6.g Pattern H — Robustness master table (Table A1, one column per check)
# ============================================================
import pyfixest as pf
base = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id"})
no99 = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df.query("wage < wage.quantile(0.99)"), vcov={"CRV1":"firm_id"})
balpan = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df.groupby("worker_id").filter(lambda g: len(g) == g["year"].nunique()),
vcov={"CRV1":"firm_id"})
dropearly = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df.query("first_treat_year > 2008"), vcov={"CRV1":"firm_id"})
wfe = pf.feols("log_wage ~ training + age + edu + tenure | worker_id + year",
data=df, vcov={"CRV1":"firm_id"})
cl2way = pf.feols("log_wage ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id+year"})
logy = pf.feols("np.log1p(wage) ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id"})
ihsy = pf.feols("np.arcsinh(wage) ~ training + age + edu + tenure | industry + year",
data=df, vcov={"CRV1":"firm_id"})
pf.etable([base, no99, balpan, dropearly, wfe, cl2way, logy, ihsy],
type="tex", file="tables/tableA1_robustness.tex",
headers=["(1) Baseline", "(2) Drop top 1%", "(3) Balanced",
"(4) Drop early", "(5) Worker FE", "(6) 2-way cluster",
"(7) log Y", "(8) IHS Y"],
keep="training",
notes="Each column is one robustness check. β̂ on training is the focal coefficient.")
# ============================================================
# 6.h Specification curve (Simonsohn-Simmons-Nelson 2020)
# ============================================================
import itertools, numpy as np
specs = []
for ctrls in [["age"], ["age","edu"], ["age","edu","tenure"], ["age","edu","tenure","firm_size"]]:
for trans in ["log_wage", "ihs_wage"]:
for mask_label, mask in [("all", slice(None)),
("manuf", df["industry"]=="manuf"),
("no99", df["wage"] < df["wage"].quantile(0.99))]:
for cl in ["firm_id", "firm_id+year"]:
d = df.loc[mask] if not isinstance(mask, slice) else df
fit = pf.feols(f"{trans} ~ training + {' + '.join(ctrls)} | industry + year",
data=d, vcov={"CRV1": cl})
b = fit.coef()["training"]
se = fit.se()["training"]
specs.append({
"label": f"c={'+'.join(ctrls)}/y={trans}/s={mask_label}/cl={cl}",
"b": b, "lo": b - 1.96*se, "hi": b + 1.96*se})
specs_df = pd.DataFrame(specs).sort_values("b").reset_index(drop=True)
fig, ax = plt.subplots(figsize=(10, 5))
ax.errorbar(specs_df.index, specs_df["b"],
yerr=[specs_df["b"]-specs_df["lo"], specs_df["hi"]-specs_df["b"]],
fmt="o", capsize=2, ms=3)
ax.axhline(0, ls="--", c="gray")
ax.set_xlabel("Specification (sorted by β̂)")
ax.set_ylabel("Coefficient on training")
ax.set_title("Figure 5. Specification curve")
plt.tight_layout()
plt.savefig("figures/fig5_spec_curve.pdf", dpi=300)
plt.savefig("figures/fig5_spec_curve.png", dpi=300)
# ============================================================
# 6.i Sensitivity dashboard — HonestDiD + Oster + E-value
# ============================================================
# (a) HonestDiD — Rambachan-Roth (2023): Python via R callout (rpy2) or HonestDiD-py if installed.
# (b) Oster δ — see references/06-robustness.md §6.6 for the formula implementation.
# (c) E-value (VanderWeele-Ding 2017) — for risk-ratio outcomes; convert OLS to RR scale first.
def evalue(rr_point, rr_lower):
"""Min strength of unmeasured confounding to nullify; (point, ci_lower)."""
e_point = rr_point + np.sqrt(rr_point * (rr_point - 1))
e_ci = rr_lower + np.sqrt(rr_lower * (rr_lower - 1)) if rr_lower > 1 else 1.0
return e_point, e_ci
print("E-value (point, CI):", evalue(1.45, 1.10))
Deeper patterns: references/07-further-analysis.md — subgroup + Wald interaction test, triple-difference (DDD) for effect heterogeneity, Baron–Kenny / Imai mediation, causal mediation with sensitivity, moderated mediation, outcome ladder (short → intermediate → final), dose-response curves, CATE via causal forest.
# 7a. Heterogeneity via full interaction (cleanest: lets you test the interaction coefficient)
het = pf.feols("log_wage ~ training + training:female + age + edu | worker_id + year",
data=df, vcov={"CRV1": "worker_id"})
het.summary() # the interaction coefficient IS the heterogeneity test
# 7b. Subgroup estimation with Wald test of equality
from scipy.stats import chi2
male_r = pf.feols("log_wage ~ training | worker_id+year", data=df[df.female==0])
female_r = pf.feols("log_wage ~ training | worker_id+year", data=df[df.female==1])
diff = male_r.coef()["training"] - female_r.coef()["training"]
se = np.sqrt(male_r.se()["training"]**2 + female_r.se()["training"]**2)
wald = (diff/se)**2
print(f"Wald = {wald:.2f}, p = {1-chi2.cdf(wald,1):.3f}")
# 7c. Triple-difference (DDD) — heterogeneity by a THIRD dimension
ddd = pf.feols(
"log_wage ~ treated*post*high_exposure | worker_id + year",
data=df, vcov={"CRV1":"firm_id"})
# 7d. Mechanism — "outcome ladder" (same treatment, three sequential outcomes)
for out in ["hours_worked", "productivity", "log_wage"]:
r = pf.feols(f"{out} ~ training | worker_id + year", data=df)
print(out, r.coef()["training"], r.se()["training"])
# 7e. Mediation — Baron-Kenny (ok for simple linear setting; use Imai for rigor)
# Total: Y = a + c·T + ε
# Step 1: M = a1 + b·T + ε (does T affect M?)
# Step 2: Y = a2 + c'·T + d·M + ε (direct effect of T holding M fixed)
# Mediated effect = b · d
b_coef = smf.ols("hours_worked ~ training + age+edu", data=df).fit().params["training"]
d_coef = smf.ols("log_wage ~ training + hours_worked + age+edu", data=df) \
.fit().params["hours_worked"]
print(f"Indirect effect via hours = {b_coef*d_coef:.3f}")
# 7f. Moderation — add interaction + marginal-effect plot; see references/07 for the full recipe.
# 7g. Heterogeneous treatment effects via causal forest (high-dim moderators)
from econml.dml import CausalForestDML
cf = CausalForestDML(n_estimators=1000, min_samples_leaf=5)
cf.fit(df["log_wage"], df["training"], X=df[["age","edu","tenure","firm_size"]])
tau = cf.effect(df[["age","edu","tenure","firm_size"]]) # per-unit CATE
cf.feature_importances_ # which X drives heterogeneity
This step is mandatory — every analysis run produces all 5 required tables (T1–T5) and all 4 required figures (F1–F4) defined in the Default Output Spec at the top of this skill. Do not skip Step 8 because "the regression already ran". A coefficient without a table and a figure is not how applied economics communicates a result.
Deeper patterns: references/08-tables-plots.md — stargazer and pf.etable() for regression tables; coefficient plots with CIs; event-study plots (pre/post coefficients with reference line); binscatter; forest plots for subgroup analysis; RD plots; LaTeX / Word / Excel export.
# ============================================================
# 8a. ★ TABLE 2 — Main results, multi-column regression M1→M6
# (the centerpiece of every economics paper)
# ============================================================
from stargazer.stargazer import Stargazer
table = Stargazer([r.fit for r in [ols_m1, ols_m2, ols_m3, ols_m4, ols_m5, ols_m6]])
table.title("Effect of training on log wage")
table.custom_columns(["(1)","(2)","(3)","(4)","(5)","(6)"], [1]*6)
open("tables/table2_main.tex","w").write(table.render_latex())
# Or pyfixest's etable — handles FE indicators automatically (preferred):
pf.etable([m1, m2, m3, m4, m5, m6],
type="tex", file="tables/table2_main.tex",
headers=["(1) Raw","(2) +Demog","(3) +Tenure",
"(4) +Unit FE","(5) +2-way FE","(6) +Ind×Year FE"],
digits=3, signif_code=[0.1, 0.05, 0.01],
notes="Cluster-robust SE in parentheses, clustered at worker_id.")
pf.etable([m1, m2, m3, m4, m5, m6], type="docx", file="tables/table2_main.docx")
# ============================================================
# 8b. TABLE 1 — Summary statistics & balance
# ============================================================
# Built in Step 3 as `t1` (DataFrame). Export both formats:
t1.to_latex("tables/table1_balance.tex", float_format="%.3f", index=False)
t1.to_excel("tables/table1_balance.xlsx", index=False)
# .docx version via python-docx or pandas → docx through tabulate
# ============================================================
# 8c. TABLE 3 — Mechanism / outcome ladder (3+ outcomes)
# ============================================================
ladder = [pf.feols(f"{y} ~ training + age + edu + tenure | worker_id + year",
data=df, vcov={"CRV1":"worker_id"})
for y in ["hours_worked", "productivity", "log_wage"]]
pf.etable(ladder, type="tex", file="tables/table3_mechanism.tex",
headers=["Hours worked", "Productivity", "Log wage"],
notes="Each column is a separate regression on the labelled outcome.")
# ============================================================
# 8d. TABLE 4 — Heterogeneity (subgroup × main coef)
# ============================================================
het_specs = {
"All": df,
"Female=0": df[df.female==0],
"Female=1": df[df.female==1],
"Age<40": df[df.age<40],
"Age>=40": df[df.age>=40],
"Manufacturing": df[df.industry.eq("manufacturing")],
}
het_models = [pf.feols("log_wage ~ training + age + edu + tenure | worker_id + year",
data=d, vcov={"CRV1":"worker_id"})
for d in het_specs.values()]
pf.etable(het_models, type="tex", file="tables/table4_heterogeneity.tex",
headers=list(het_specs.keys()),
notes="Cluster-robust SE at worker_id. Wald p-values for cross-subgroup equality "
"should accompany this table — see references/07.")
# ============================================================
# 8e. TABLE 5 — Robustness battery (alt SE / cluster / sample / placebo)
# ============================================================
rob = {
"Baseline": pf.feols("log_wage ~ training | worker_id + year", df,
vcov={"CRV1":"worker_id"}),
"Cluster=firm": pf.feols("log_wage ~ training | worker_id + year", df,
vcov={"CRV1":"firm_id"}),
"Two-way clust": pf.feols("log_wage ~ training | worker_id + year", df,
vcov={"CRV3x1":["worker_id","firm_id"]}),
"Winsor 1/99": pf.feols("log_wage ~ training | worker_id + year",
df.assign(log_wage=df.log_wage.clip(*df.log_wage.quantile([.01,.99]))),
vcov={"CRV1":"worker_id"}),
"Drop manuf": pf.feols("log_wage ~ training | worker_id + year",
df[df.industry!="manufacturing"], vcov={"CRV1":"worker_id"}),
"Placebo (-3)": pf.feols("log_wage ~ fake_post | worker_id + year",
df, vcov={"CRV1":"worker_id"}),
}
pf.etable(list(rob.values()), type="tex", file="tables/table5_robustness.tex",
headers=list(rob.keys()))
# ============================================================
# 8f. ★ FIGURE 3 — Coefficient plot across M1→M6
# ============================================================
fig, ax = plt.subplots(figsize=(6, 3.5))
labels, betas, lows, highs = [], [], [], []
for name, r in [("(1)",m1),("(2)",m2),("(3)",m3),("(4)",m4),("(5)",m5),("(6)",m6)]:
b = r.coef()["training"]; se = r.se()["training"]
labels.append(name); betas.append(b)
lows.append(b-1.96*se); highs.append(b+1.96*se)
betas = np.array(betas); lows = np.array(lows); highs = np.array(highs)
ax.errorbar(labels, betas, yerr=[betas-lows, highs-betas], fmt="o", capsize=3, color="navy")
ax.axhline(0, ls="--", color="gray", alpha=.6)
ax.set_ylabel("ATT on log wage"); ax.set_xlabel("Specification")
plt.tight_layout()
plt.savefig("figures/fig3_coefplot.pdf"); plt.savefig("figures/fig3_coefplot.png", dpi=300)
# ============================================================
# 8g. FIGURE 2 — Event-study plot (dynamic DID, base period = -1)
# ============================================================
fig, ax = plt.subplots(figsize=(7, 4))
pf.iplot(es, ax=ax)
ax.axhline(0, ls="--", color="gray"); ax.axvline(-0.5, ls=":", color="gray")
ax.set_xlabel("Years relative to treatment"); ax.set_ylabel("Coefficient (ATT)")
plt.tight_layout()
plt.savefig("figures/fig2_event_study.pdf"); plt.savefig("figures/fig2_event_study.png", dpi=300)
# ============================================================
# 8h. FIGURE 4 — Sensitivity / robustness curve (spec curve)
# ============================================================
# Loop over 32 spec combinations and rank by point estimate
specs_curve = []
for fe in ["", "| worker_id", "| worker_id + year", "| worker_id + year + industry^year"]:
for ctrl in [[], ["age"], ["age","edu"], ["age","edu","tenure"]]:
f = "log_wage ~ training" + ("+" + "+".join(ctrl) if ctrl else "") + " " + fe
try:
r = pf.feols(f, df, vcov={"CRV1":"worker_id"})
specs_curve.append({"spec": f.strip(), "b": r.coef()["training"],
"se": r.se()["training"]})
except Exception:
pass
sc = pd.DataFrame(specs_curve).sort_values("b").reset_index(drop=True)
fig, ax = plt.subplots(figsize=(7, 4))
ax.errorbar(range(len(sc)), sc["b"], yerr=1.96*sc["se"], fmt="o", ms=3, color="navy", alpha=.7)
ax.axhline(0, ls="--", color="gray")
ax.set_xlabel("Specification rank (sorted by point estimate)")
ax.set_ylabel("Coefficient on training")
plt.tight_layout()
plt.savefig("figures/fig4_sensitivity.pdf"); plt.savefig("figures/fig4_sensitivity.png", dpi=300)
# ============================================================
# 8i. FIGURE 1 — Trend / motivation (treated vs control over time)
# ============================================================
# Already built in Step 3; re-export with paper-grade styling.
fig, ax = plt.subplots(figsize=(7, 4))
trend = df.groupby(["year","training"])["log_wage"].mean().unstack()
trend.plot(ax=ax, marker="o", color={0:"darkred", 1:"navy"})
ax.axvline(policy_year, ls="--", color="gray", label="Policy")
ax.set_ylabel("Mean log wage"); ax.set_xlabel("Year")
ax.legend(["Control","Treated","Policy"])
plt.tight_layout()
plt.savefig("figures/fig1_trend.pdf"); plt.savefig("figures/fig1_trend.png", dpi=300)
# ============================================================
# 8j. Auxiliary plots (optional — produce when relevant)
# ============================================================
from binsreg import binsreg # binscatter
binsreg(y=df["log_wage"], x=df["tenure"], w=df[["age","edu"]], nbins=20)
plt.savefig("figures/figA_binscatter.pdf")
from rdrobust import rdplot # RD plot (only when running_var exists)
# rdplot(y=df["outcome"], x=df["running_var"], c=0); plt.savefig("figures/figA_rdplot.pdf")
# Forest plot for subgroups → see references/08-tables-plots.md §8.6 for the full recipe.
Deliverables checklist (verify before declaring the run complete):
[ ] tables/table1_balance.tex [ ] figures/fig1_trend.pdf
[ ] tables/table2_main.tex ★ [ ] figures/fig2_event_study.pdf
[ ] tables/table3_mechanism.tex [ ] figures/fig3_coefplot.pdf
[ ] tables/table4_heterogeneity.tex
[ ] tables/table5_robustness.tex [ ] figures/fig4_sensitivity.pdf
[ ] tables/tableA1_robustness.tex [ ] figures/fig5_spec_curve.pdf
[ ] artifacts/sample_construction.json (footnote 4)
[ ] artifacts/data_contract.json
[ ] artifacts/result.json (reproducibility stamp — see 8l)
The single artifact a journal's replication office (or a future co-author) needs to reproduce the headline number. Persist Python version, seed, dataset hash, baseline coefficient + CI, and pointers to the protocol/contract:
import json, sys, hashlib, pyfixest
# Get baseline result (assumes `base` is the headline pf.feols/sm.OLS object)
b_hat = float(base.coef()["training"])
se = float(base.se()["training"])
lo, hi = b_hat - 1.96*se, b_hat + 1.96*se
dataset_sha = hashlib.sha256(
pd.util.hash_pandas_object(df, index=True).values.tobytes()
).hexdigest()[:16]
stamp = {
"python_version": sys.version,
"pyfixest_version": pyfixest.__version__,
"seed": 42,
"dataset_sha256_16": dataset_sha,
"n_obs": int(base._N),
"estimand": "ATT",
"estimator": "pf.feols",
"estimate": b_hat,
"se_cluster": se,
"ci95": [lo, hi],
"pre_registration": "artifacts/strategy.md",
"data_contract": "artifacts/data_contract.json",
"sample_log": "artifacts/sample_construction.json",
"paper_bundle": "tables/table2_main.tex",
}
with open("artifacts/result.json", "w") as f:
json.dump(stamp, f, indent=2)
Commit artifacts/result.json alongside the paper PDF. A referee should be able to run python master.py and bit-identically reproduce this JSON.
When the user's wording flags Mode A (target-trial emulation / IPTW / TMLE / MR / STROBE / 流行病学 / 公共健康 / RWE / cohort), the 8 steps still apply — but Step 5 swaps the OLS-and-FE stack for the doubly-robust + survival + MR triplet, and the deliverables follow STROBE / TRIPOD-AI conventions. Steps 1–4 (cleaning, construction, Table 1, diagnostics) and Step 8 (tables/figures export) are identical to the Default mode.
Library footprint (install on top of the Default stack):
pip install zepid # IPTW, g-formula, TMLE, AIPW, E-value
pip install lifelines # KM, Cox, AFT, RMST
pip install scikit-survival # alternative survival stack (scaling-friendly)
# Mendelian randomization — Python coverage is thin; for IVW/Egger/weighted-median:
pip install pymr # if available
# Or call R from Python:
pip install rpy2 # then import TwoSampleMR / MendelianRandomization via rpy2
Write the protocol before touching the data. Save it as protocol.yml and quote it in the paper.
# protocol.yml — target-trial emulation skeleton
target_trial = {
"eligibility": {"age": "40-75", "no_prior_event": True, "ascertained_at": "t0"},
"treatment": {"A=1": "statin initiation", "A=0": "no initiation"},
"assignment": "random at t0 (emulated by IPTW on baseline covariates)",
"follow_up_start":"t0 (treatment initiation date)",
"outcome": "incident MI within 5 years",
"estimand": "intention-to-treat ATE on risk difference + hazard ratio",
"censoring": {"loss_to_FU": True, "competing_risk": "death from non-MI causes"},
}
# Cohort construction in pandas — eligibility + index date + censoring date
cohort = (df
.query("age >= 40 & age <= 75 & prior_MI == 0") # eligibility
.assign(t0 = lambda d: d["statin_initiation_date"].fillna(d["enrollment_date"]),
event_5y = lambda d: ((d["MI_date"] - d["t0"]).dt.days <= 365*5).astype(int),
time_at_risk = lambda d: ((d["censor_date"] - d["t0"]).dt.days.clip(0, 365*5))))
Use the same tableone / gtsummary-style call from Step 3, just groupby="A" (treatment indicator). E-values for unmeasured confounding go in the footer.
# Estimate PS, plot overlap (positivity), love-plot SMDs
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
X = df[["age", "edu", "smoke", "bmi", "ldl", "sbp"]]
y = df["A"]
ps = LogisticRegression(max_iter=1000).fit(StandardScaler().fit_transform(X), y).predict_proba(StandardScaler().fit_transform(X))[:, 1]
df["ps"] = ps
# Overlap plot
import matplotlib.pyplot as plt, seaborn as sns
sns.kdeplot(data=df, x="ps", hue="A", common_norm=False)
plt.savefig("figures/figA2_ps_overlap.pdf")
# Love plot (SMDs before vs after IPTW) — see zepid.causal.ipw.diagnostics
The "AER Table 2" of epi: a 3-column table where each column is one of {IPTW-MSM, g-formula, TMLE} so the reader can confirm doubly-robust agreement.
import zepid as ze
from zepid.causal.ipw import IPTW
from zepid.causal.gformula import TimeFixedGFormula
from zepid.causal.doublyrobust import TMLE
# IPTW (marginal structural model)
iptw = IPTW(df, treatment="A", outcome="event_5y")
iptw.treatment_model("age + edu + smoke + bmi + ldl + sbp", print_results=False)
iptw.outcome_model("A", print_results=False)
iptw.fit()
RD_iptw, CI_iptw = iptw.risk_difference, iptw.risk_difference_ci
# g-formula (parametric)
gf = TimeFixedGFormula(df, exposure="A", outcome="event_5y")
gf.outcome_model("A + age + edu + smoke + bmi + ldl + sbp")
gf.fit(treatment="all") ; r1 = gf.marginal_outcome
gf.fit(treatment="none") ; r0 = gf.marginal_outcome
RD_gf = r1 - r0
# TMLE (doubly robust)
tmle = TMLE(df, exposure="A", outcome="event_5y")
tmle.exposure_model("age + edu + smoke + bmi + ldl + sbp")
tmle.outcome_model("A + age + edu + smoke + bmi + ldl + sbp")
tmle.fit()
RD_tmle, CI_tmle = tmle.risk_difference, tmle.risk_difference_ci
# Stack the triplet into one paper table
import pandas as pd
tableA3 = pd.DataFrame({
"Estimator": ["IPTW-MSM", "g-formula", "TMLE"],
"RD": [RD_iptw, RD_gf, RD_tmle],
"95% CI": [CI_iptw, "—", CI_tmle],
})
tableA3.to_latex("tables/tableA3_dr_triplet.tex", index=False, float_format="%.3f")
from lifelines import KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter
from lifelines.utils import restricted_mean_survival_time
# KM by treatment
fig, ax = plt.subplots()
for a, sub in df.groupby("A"):
KaplanMeierFitter().fit(sub["time_at_risk"], sub["event_5y"], label=f"A={a}").plot_survival_function(ax=ax)
plt.savefig("figures/figA4_km.pdf")
# Cox HR (covariate-adjusted)
cox = CoxPHFitter().fit(df[["time_at_risk","event_5y","A","age","edu","smoke","bmi","ldl","sbp"]],
duration_col="time_at_risk", event_col="event_5y")
HR = cox.hazard_ratios_["A"]; HR_CI = cox.confidence_intervals_.loc["A"].values
# AFT (Weibull) for time-ratio interpretation
aft = WeibullAFTFitter().fit(df[["time_at_risk","event_5y","A","age","edu","smoke","bmi","ldl","sbp"]],
duration_col="time_at_risk", event_col="event_5y")
# RMST contrast at t=5y
rmst1 = restricted_mean_survival_time(df.query("A==1")["time_at_risk"], df.query("A==1")["event_5y"], t=5*365)
rmst0 = restricted_mean_survival_time(df.query("A==0")["time_at_risk"], df.query("A==0")["event_5y"], t=5*365)
Two-sample MR is most ergonomic via R; from Python use rpy2 or pre-export betas/SEs and call TwoSampleMR / MendelianRandomization in a sister R script.
# Pure-Python: pymr (or write the IVW/Egger formulas by hand — they're closed-form)
# Cross-language: rpy2 is the de-facto path for IVW/Egger/weighted median.
import rpy2.robjects as ro
ro.r('''
library(MendelianRandomization)
mr_input <- mr_input(bx=BX, bxse=BXSE, by=BY, byse=BYSE)
ivw <- mr_ivw(mr_input)
egger <- mr_egger(mr_input)
median <- mr_median(mr_input, weighting="weighted")
''')
# Stack the triplet (IVW, Egger, weighted-median) in one tableA5.
# E-value from zepid (Linden & VanderWeele formula)
from zepid.sensitivity_analysis import e_value
ev = e_value(measure="RR", est=1.45, lcl=1.10, ucl=1.91) # → required strength of unmeasured confounding
print(ev)
Save as replication/strobe_checklist.md and tick before submission:
[ ] Eligibility criteria + dates (target-trial protocol)
[ ] Adjustment set with DAG justification (A.2)
[ ] Positivity / overlap diagnostic (A.2)
[ ] Doubly-robust triplet (IPTW + g-formula + TMLE) (A.3)
[ ] Risk difference + hazard ratio + RMST (A.3, A.4)
[ ] E-value for unmeasured confounding (A.6)
[ ] Loss-to-follow-up rate + censoring assumption (A.0)
[ ] Pre-registered protocol or analysis plan (A.0)
When the user's wording flags Mode B (DML / meta-learner / causal forest / Dragonnet / BCF / CATE / policy learning / conformal causal / fairness / 因果机器学习), the pipeline keeps Steps 1–4 and Step 8 from the Default mode, swaps Step 5 for the ML estimator stack, and adds a CATE-distribution + policy-value layer between Step 7 and Step 8.
Library footprint (install on top of the Default stack):
pip install econml doubleml causalml dowhy # core estimators
pip install causal-learn cdt # causal discovery (PC / NOTEARS / GES)
pip install mapie # conformal prediction (incl. causal)
pip install fairlearn # fairness audit
pip install policytree-py # discrete policy learning (optional; also via econml.policy)
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
train, holdout = train_test_split(df, test_size=0.3, random_state=42, stratify=df["A"])
# Standard nuisance pair: outcome regression Q(X,A) and propensity score g(A|X)
Q_learner = GradientBoostingRegressor(n_estimators=300, max_depth=3)
g_learner = GradientBoostingClassifier(n_estimators=300, max_depth=3)
# Causal discovery on observational data — start with PC algorithm
from causallearn.search.ConstraintBased.PC import pc
cg = pc(df[["A","Y","X1","X2","X3","X4"]].to_numpy())
cg.draw_pydot_graph(labels=["A","Y","X1","X2","X3","X4"])
# OR: NOTEARS for a fully differentiable DAG learner
# from cdt.causality.graph import NOTEARS
# notears = NOTEARS().predict(df[["A","Y","X1","X2","X3","X4"]])
# Always sanity-check the recovered DAG against domain knowledge before reading off the adjustment set.
The "AER Table 2" of ML causal: a horse-race table where each column is one estimator family on the same (Y, A, X) data — readers want to see DML, T-learner, X-learner, causal forest, and Dragonnet all agree (or disagree) on the ATE.
from econml.dml import LinearDML, NonParamDML
from econml.metalearners import SLearner, TLearner, XLearner, DomainAdaptationLearner
from econml.dr import DRLearner
from econml.grf import CausalForest
from causalml.inference.tf import DragonNet
# from bcf_py import BayesianCausalForest # if BCF needed
X_train, A_train, Y_train = train[["X1","X2","X3","X4"]], train["A"], train["Y"]
X_hold, A_hold, Y_hold = holdout[["X1","X2","X3","X4"]], holdout["A"], holdout["Y"]
# DML — semi-parametric, doubly robust to nuisance ML
dml = LinearDML(model_y=Q_learner, model_t=g_learner, discrete_treatment=True, cv=5).fit(Y_train, A_train, X=X_train, W=X_train)
ate_dml = dml.ate(X_hold); ci_dml = dml.ate_interval(X_hold, alpha=0.05)
# Meta-learners — drop the same Q/g into S/T/X
sl = SLearner(overall_model=Q_learner).fit(Y_train, A_train, X=X_train); ate_S = sl.ate(X_hold)
tl = TLearner(models=Q_learner).fit(Y_train, A_train, X=X_train); ate_T = tl.ate(X_hold)
xl = XLearner(models=Q_learner, propensity_model=g_learner).fit(Y_train, A_train, X=X_train); ate_X = xl.ate(X_hold)
# DR-Learner — orthogonal CATE
dr = DRLearner(model_propensity=g_learner, model_regression=Q_learner, model_final=GradientBoostingRegressor()).fit(Y_train, A_train, X=X_train)
ate_DR = dr.ate(X_hold)
# Causal forest — non-parametric CATE
cf = CausalForest(n_estimators=500).fit(X=X_train, T=A_train, y=Y_train); cate_cf = cf.predict(X_hold)
ate_cf = cate_cf.mean()
# Dragonnet — joint propensity + outcome neural net
dn = DragonNet(); dn.fit(X_train.values, A_train.values, Y_train.values); ate_dn = dn.predict(X_hold.values).mean()
# Stack the horse-race
import pandas as pd
tableB2 = pd.DataFrame({
"Estimator": ["DML (linear)", "S-learner", "T-learner", "X-learner", "DR-learner", "Causal Forest", "Dragonnet"],
"ATE": [ate_dml, ate_S, ate_T, ate_X, ate_DR, ate_cf, ate_dn],
})
tableB2.to_latex("tables/tableB2_ml_horserace.tex", index=False, float_format="%.4f")
# CATE histogram and quantile plot — heterogeneity from causal forest
import numpy as np
cate = cf.predict(X_hold)
plt.figure(); plt.hist(cate, bins=30); plt.axvline(0, ls="--", color="k"); plt.xlabel("CATE")
plt.savefig("figures/figB3_cate_hist.pdf")
# CATE by quartile of a covariate
df_h = X_hold.assign(cate=cate, age_q=pd.qcut(X_hold["X1"], 4, labels=False))
df_h.groupby("age_q")["cate"].mean().plot(kind="bar"); plt.ylabel("Mean CATE")
plt.savefig("figures/figB3_cate_by_age_q.pdf")
from econml.policy import DRPolicyTree # honest discrete policy tree
policy = DRPolicyTree(max_depth=3).fit(Y_train, A_train, X=X_train)
policy.plot() # prose-readable tree of "treat if X1<a and X2>b"
plt.savefig("figures/figB4_policy_tree.pdf")
# Off-policy evaluation (DR-style policy-value)
pred_policy = policy.predict(X_hold)
policy_value_DR = ((Y_hold * (pred_policy == A_hold).astype(int)).mean()
- (Y_hold * (pred_policy != A_hold).astype(int)).mean())
print(f"DR policy value (holdout): {policy_value_DR:.3f}")
from mapie.regression import MapieRegressor
# Conformal prediction interval around CATE — distribution-free coverage guarantee
mapie = MapieRegressor(estimator=GradientBoostingRegressor(), method="plus", cv=10).fit(X_train, cf.predict(X_train))
y_pred, y_pis = mapie.predict(X_hold, alpha=0.1) # 90% conformal PI
# Fairness audit — disparate-impact / equalized-odds for the learned policy
from fairlearn.metrics import MetricFrame, demographic_parity_difference, equalized_odds_difference
sens = X_hold["sensitive_attr"] # e.g. female
mf = MetricFrame(metrics={"acc": (lambda y,y_hat: (y == y_hat).mean())},
y_true=A_hold, y_pred=policy.predict(X_hold), sensitive_features=sens)
print("Demographic parity diff:", demographic_parity_difference(A_hold, policy.predict(X_hold), sensitive_features=sens))
Save as replication/ml_causal_checklist.md:
[ ] Nuisance learners listed (Q model, g model, hyperparameters, CV folds)
[ ] Cross-fitting / sample-splitting documented (DML K-fold)
[ ] Overlap / propensity diagnostics (B.0 + A.2-style overlap plot)
[ ] CATE summary (mean, SD, quartiles) + heterogeneity p-value
[ ] Policy value with confidence interval (B.4)
[ ] Conformal coverage rate on holdout (B.5)
[ ] Fairness gaps across sensitive attributes (B.5)
[ ] DAG / adjustment set + sensitivity to unmeasured confounding (E-value or Manski bounds)
| Step | Task | Go-to library | Fallback |
|------|------|---------------|----------|
| 1 | Data cleaning | pandas | polars for >10M rows |
| 1 | Missing-data viz | missingno | manual heatmap |
| 2 | Transformations | numpy, pandas, scipy.stats.mstats.winsorize | sklearn.preprocessing |
| 3 | Summary stats + Table 1 | pandas, scipy.stats | tableone |
| 3 | Plots | matplotlib, seaborn | plotly for interactive |
| 4 | Normality / hetero / autocorr | statsmodels.stats.*, scipy.stats | pingouin |
| 4 | Stationarity | statsmodels.tsa.stattools, arch | — |
| 5 | OLS / panel FE | pyfixest | statsmodels, linearmodels |
| 5 | IV | linearmodels | pyfixest (panel IV) |
| 5 | DID (2×2, event study, TWFE) | pyfixest (+ diff-diff for CS/SA/BJS/SDiD) | statsmodels with interactions |
| 5 | RD | rdrobust, rddensity | — |
| 5 | Synthetic Control | pysynth, sdid | manual scipy.optimize |
| 5 | Matching / IPW / EB | causalml, econml, ebal | pymatch |
| 5 | DML / Causal Forest | econml | doubleml |
| 6 | Specification curve | custom loop over formulas | specurve |
| 7 | Mediation | pingouin.mediation_analysis | manual Baron–Kenny |
| 7 | CATE | econml causal forest | — |
| 8 | Regression tables | stargazer, pyfixest.etable | summary_col (statsmodels) |
| 8 | Coefplot / event study | pyfixest.iplot, matplotlib.errorbar | forestplot |
| 8 | Binscatter | binsreg | manual quantile cut |
| Mistake | Correct approach |
|---------|------------------|
| Running OLS on panel data without any FE | Use pyfixest.feols(... \| unit + time, ...) |
| Default (iid) SEs on clustered / panel / few-cluster data | vcov={"CRV1":"cluster_var"}; wild bootstrap if clusters < 50 |
| TWFE with staggered adoption | Use CS / SA / BJS to avoid negative-weight bias |
| Dropping rows silently inside model fit | Do Step 1 cleaning explicitly; print counts |
| Blanket-mean-imputing covariates with high missingness | Consider MICE (statsmodels.imputation) or dropping the variable |
| Reporting only the coefficient, not the CI | Always report point + 95% CI + N; plot coefplot |
| One specification, no robustness | Ship progressive specs (M1–M6) + alternative SEs |
| Reporting only the headline coefficient (no Table 2) | Always ship the multi-column M1→M6 main table — that is the centerpiece of an economics paper, not the abstract sentence |
| Coefficient table without any figures | An economics result needs at least F1 trend + F2 event study + F3 coefplot + F4 sensitivity — see the Default Output Spec |
| RD with a single bandwidth | Show bandwidth sensitivity + rdbwselect |
| IV without first-stage F | Always print .first_stage (F > 10, ideally > 104) |
| PSM without SMD balance table | Report pre/post SMDs, target |SMD| < 0.1 |
| Interpreting mediation without Imai sensitivity | For causal mediation, never stop at Baron–Kenny in a serious paper |
| "Table 1 in Excel after pipeline finishes" | Build Table 1 in Step 3 before any regression |
| p-hacking via specification selection | Disclose the full specification curve, not the favorite cell |
# 1. Clean
df = load_and_clean(raw_path) # Step 1
# 2. Transform
df = construct_vars(df) # Step 2
# 3. Describe
table1 = build_table1(df, by="training") # Step 3
save_corr_heatmap(df, cols) # Step 3
# 4. Diagnose
run_diagnostics(df, y="log_wage", x=covariates) # Step 4
# 5. Model
results_main = fit_baseline(df) # Step 5
# 6. Robustify
results_robust = robustness_battery(df) # Step 6
# 7. Extend
results_het = heterogeneity_analysis(df) # Step 7
results_mech = mechanism_analysis(df) # Step 7
# 8. Export
export_latex_tables(results_main, results_robust, results_het) # Step 8
save_all_figures() # Step 8
The deliverable for every run is the economics empirical-paper output set defined in the Default Output Spec:
tables/table1_balance.{tex,docx}, tables/table2_main.{tex,docx} ★, tables/table3_mechanism.tex, tables/table4_heterogeneity.tex, tables/table5_robustness.texfigures/fig1_trend.{pdf,png}, figures/fig2_event_study.{pdf,png}, figures/fig3_coefplot.{pdf,png}, figures/fig4_sensitivity.{pdf,png}If any deliverable is missing or skipped, print why (e.g. "F2 event study omitted: design is cross-sectional"). Do not silently drop.
pf.etable(*models, ...) and Stargazer([...]) are the two primitives behind every multi-regression table. The eight patterns above map to:
| Pattern | What varies across columns | Step |
|---|---|---|
| A. Progressive controls | covariate set / FE depth | 5.A — Table 2 |
| B. Design horse race | identification strategy (OLS / IV / DID / DML / PSM) | 5.B — Table 2-bis |
| C. Multi-outcome | dependent variable Y | 5.C — Table 2-ter |
| D. Stacked Panel A / B | horizon / sample (panel rows × spec columns) | 5.D — Table 2-quater |
| E. IV reporting triplet | first stage / reduced form / 2SLS | 5.E — Table 2-quinto |
| F. Causal-orchestrator | 1 column, full diagnostics (CausalForestDML / att_gt / synth) | 5.F |
| G. Subgroup table | subsample (full / female / male / Q1…Q4) | 7 — Table 3 |
| H. Robustness master | every robustness check stacked | 6.g — Table A1 |
Default pf.etable settings for AER house style:
pf.etable(models,
type="tex", file="tables/tableN.tex",
headers=["(1)", "(2)", ...], # column labels
digits=3, signif_code=[0.1, 0.05, 0.01],
notes=("Cluster-robust SE in parentheses. "
"* p<0.10, ** p<0.05, *** p<0.01."))
# Word version: same call with type="docx".
# For multi-source tables (mixing pyfixest + statsmodels + linearmodels), use Stargazer.
| # | Figure | Python commands | Section |
|---|---|---|---|
| 1a | Raw trends (DID Figure 1) | df.groupby([time,treat])[y].mean().unstack().plot() | §1 |
| 1b | Treatment rollout heatmap | seaborn.heatmap(rollout_pivot) | §1 |
| 2a | Event-study coefficients | pf.iplot(pf.feols("y ~ sunab(G,t) | i+t")) | §3 (Step 3.5.1) |
| 2a' | Bacon weights | bacondecomp.bacon(...) (Python port) or rpy2 callout | §3 |
| 2a'' | CS-DID dynamic effects | rpy2 → did::att_gt/ggdid | §3 |
| 2b | First-stage scatter | binsreg(y=D, x=Z, w=X) | §3 (Step 3.5.2) |
| 2c | RD canonical plot | rdrobust.rdplot(y, x, c=0) | §3 (Step 3.5.3) |
| 2c' | McCrary density | rddensity.rdplotdensity(...) | §3 |
| 2d | Matching love plot | causalml/MatchIt SMD pre vs post | §3 (Step 3.5.4) |
| 2e | SCM trajectory | pysynth · SyntheticControlMethods.Synth | §3 (Step 3.5.5) |
| 3 | Coefficient plot of main specs | pf.coefplot(models, coefs=["training"]) | §4 |
| 4a | Dose-response | econml.dml.LinearDML(...).effect() + matplotlib | §5 |
| 4b | CATE distribution | CausalForestDML(...).effect() + plt.hist | §5 |
| 5 | Specification curve | itertools.product + matplotlib.errorbar (see 6.h) | §7 |
| 6 | Sensitivity dashboard | HonestDiD + Oster δ + E-value table | §7 (Step 6.i) |
| 7 | Final main figure | estimator-specific (rdplot, iplot, Synth.plot) | §8 |
Every figure is exported as both
.png ≥ 300 dpi(for slides / web). Useplt.style.use("default")+seaborn.set_context("paper")once at the top ofmaster.pyfor consistent styling.
import statsmodels.formula.api as smf
import pyfixest as pf
from linearmodels.panel import PanelOLS, RandomEffects, FirstDifferenceOLS
smf.ols ("y ~ X", df).fit(cov_type="cluster", cov_kwds={"groups": df["i"]}) # OLS
pf.feols ("y ~ X | fe1", df, vcov={"CRV1": "i"}) # OLS + 1 FE
pf.feols ("y ~ X | fe1 + fe2", df, vcov={"CRV1": "i"}) # HD FE workhorse
pf.feols ("y ~ X | fe1 + fe2", df, vcov={"CRV1": "fe1+fe2"}) # 2-way cluster
pf.fepois ("count ~ X | fe1 + fe2", df, vcov={"CRV1": "i"}) # Poisson + FE
pf.feglm ("y ~ X | fe1", df, family="logit", vcov={"CRV1": "i"}) # Logit + FE
PanelOLS.from_formula("y ~ X + EntityEffects + TimeEffects", df).fit(cov_type="clustered")
RandomEffects.from_formula("y ~ X", df).fit() # RE (Hausman)
# 2×2 DID
pf.feols("y ~ i(treated, post, ref=0) | i + t", df, vcov={"CRV1":"i"})
# Staggered — modern stack (Sun-Abraham via pyfixest::sunab)
pf.feols("y ~ sunab(first_treat, year) | i + year", df, vcov={"CRV1":"i"})
# Callaway-Sant'Anna — R callout via rpy2
import rpy2.robjects as ro; from rpy2.robjects import pandas2ri; pandas2ri.activate()
ro.r('library(did); cs <- att_gt(yname="y", tname="t", idname="i", gname="G", data=df)')
# Borusyak-Jaravel-Spiess — also R callout (didimputation) or Python `did_imputation` if installed
# Synthetic DID — R callout to `synthdid`
from linearmodels.iv import IV2SLS, IVLIML, IVGMM
IV2SLS.from_formula("y ~ 1 + X + [D ~ Z]", df).fit(cov_type="clustered", clusters=df["firm_id"])
# Or via pyfixest:
pf.feols("y ~ X | D ~ Z", df, vcov={"CRV1":"firm_id"})
# Weak-IV diagnostics: ivreg2 has CD/KP, linearmodels reports first_stage().diagnostics
from rdrobust import rdrobust, rdplot, rdbwselect
from rddensity import rddensity
rdrobust(y=df["y"], x=df["x"], c=0, kernel="triangular", bwselect="mserd") # Sharp RD
rdrobust(y=df["y"], x=df["x"], c=0, fuzzy=df["D"]) # Fuzzy RD
rddensity(X=df["x"], c=0) # McCrary
rdplot (y=df["y"], x=df["x"], c=0)
import causalml.match as cm
from causalml.match import NearestNeighborMatch
NearestNeighborMatch(replace=False, ratio=1).match(df, treatment_col="D", score_cols=["X1","X2"])
# Entropy balancing — econml or hand-rolled via scipy.optimize
from SyntheticControlMethods import Synth
Synth(df, "y", "unit", "time", treatment_period=2015, treated_unit=1)
# SDiD via rpy2 callout to R::synthdid
from econml.dml import LinearDML, CausalForestDML, NonParamDML
from econml.dr import DRLearner
from econml.metalearners import SLearner, TLearner, XLearner
from econml.iv.dml import OrthoIV
# Full pipeline in §B.
# Two-way cluster: vcov={"CRV1":"firm_id+year"} in pyfixest
# Wild cluster bootstrap: rpy2 → R::fwildclusterboot, or pyfixest's wildboottest()
fit.wildboottest(param="training", B=9999, seed=42)
# Randomization inference: rpy2 → R::ri2 or hand-roll via permutation
# Romano-Wolf: rpy2 → R::wyoung; or hand-roll via stepdown
# Oster δ: see references/06-robustness.md §6.6
# HonestDiD: rpy2 → R::HonestDiD
# E-value: hand-rolled VanderWeele-Ding formula (see 6.i)
from lifelines import KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter
KaplanMeierFitter().fit(df["time"], df["event"]).plot() # KM
CoxPHFitter().fit(df, duration_col="time", event_col="event") # Cox
WeibullAFTFitter().fit(df, duration_col="time", event_col="event") # AFT
# RMST: lifelines.utils.restricted_mean_survival_time
import zepid
zepid.causal.gformula.SurvivalGFormula(...) # g-formula
zepid.causal.ipw.IPTW(...) # IPTW
zepid.sensitivity_analyses.e_value(...) # E-value
import statspai as sp) → 00-StatsPAI_skill.10-Jill0099-causal-inference-mixtape.20-wenddymacro-python-econ-skill.feols kwarg) → 40-py-econometrics-pyfixest.04-*-scientific-writer, 08-*-web-latex, etc.).This skill's remit ends at Step 8 — polished tables and figures. Paper drafting is out of scope.
development
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
development
Guide users through writing a systematic literature review (SLR) following the PRISMA 2020 framework. Use this skill whenever the user mentions 'systematic review', 'systematic literature review', 'SLR', 'PRISMA', 'PRISMA 2020', 'PRISMA flow diagram', 'PRISMA checklist', or asks for help writing, structuring, or auditing a literature review that follows reporting guidelines. Also trigger when the user asks about inclusion/exclusion criteria for a review, search strategies for databases like Scopus/WoS/PubMed, study selection processes, risk of bias assessment, or narrative synthesis for a review paper. This skill covers the full PRISMA 2020 checklist (27 items), produces a Word document manuscript in strict journal article format, generates an annotated PRISMA flow diagram, and enforces APA 7th Edition referencing throughout. It does NOT cover meta-analysis or statistical pooling. By Chuah Kee Man.
testing
Performs placebo-in-time sensitivity analysis with hierarchical null model and optional Bayesian assurance. Use when checking model robustness, verifying lack of pre-intervention effects, or estimating study power.
data-ai
Fit, summarize, plot, and interpret a chosen CausalPy experiment. Use after the causal method has been selected, including when configuring PyMC/sklearn models and scale-aware custom priors.