skills/00.3-Full-empirical-analysis-skill_R/SKILL.md
Classical end-to-end empirical analysis workflow in the modern tidyverse + econometrics R ecosystem — dplyr + tidyr + haven + fixest + sandwich + lmtest + clubSandwich + AER + ivreg + did + bacondecomp + HonestDiD + eventstudyr + rdrobust + rddensity + Synth + gsynth + synthdid + MatchIt + WeightIt + cobalt + ebal + grf + DoubleML + mediation + marginaleffects + modelsummary + kableExtra + gt + ggplot2 + ggpubr + cowplot + binsreg. **Defaults to economics empirical-paper style** (AER / QJE / AEJ) — every run produces a publication-ready output set with a multi-column regression table (M1→M6 progressive controls/FE) as the centerpiece, plus Table 1 (descriptives), mechanism / heterogeneity / robustness tables, and event-study + coefficient + trend figures. Covers the full 8-step R pipeline an applied economist runs on every paper — (1) data import & cleaning (read_dta/read_csv, naniar, janitor, validate-merges), (2) variable construction (mutate/across/winsorize/group_by + lag/lead with dplyr), (3) descriptive statistics & Table 1 (gtsummary, modelsummary::datasummary, tableone), (4) classical diagnostic tests (shapiro/jarque.bera.test/bptest/dwtest/bgtest/vif/adf.test/kpss.test/Hausman), (5) baseline modeling (fixest::feols, ivreg, did::att_gt, eventstudyr, sun_ab, did_imputation, synthdid, rdrobust, MatchIt, WeightIt, grf::causal_forest, DoubleML, mediation), (6) robustness battery (modelsummary stack, clubSandwich CRSE, fwildclusterboot, ri2, robomit Oster, bacondecomp, HonestDiD), (7) further analysis (interactions + marginaleffects, mediation::mediate, gsem via lavaan, dose-response splines, grf CATE), (8) publication-ready tables & figures (modelsummary, kableExtra, gt, stargazer, texreg, flextable to LaTeX/Word/HTML; ggplot2 + ggpubr + cowplot + binsreg + iplot for figures). **Also covers two parallel domain modes that share the same 8-step scaffolding** — **Mode A — Epidemiology / public health** (target-trial emulation, IPTW + g-formula + TMLE doubly-robust triplet via `WeightIt` / `gfoRmula` / `tmle` / `ltmle`, Mendelian randomization via `MendelianRandomization` / `TwoSampleMR` / `MRPRESSO`, KM / Cox / AFT / RMST survival via `survival` / `survminer` / `flexsurv`, E-value sensitivity via `EValue`, principal stratification — STROBE / TRIPOD reporting), and **Mode B — ML causal inference** (DML via `DoubleML`, S/T/X/R/DR meta-learners via `causalweight` / `grf`, causal forest via `grf::causal_forest`, BART/BCF via `bartCause` / `bcf`, matrix completion via `MCPanel`, CATE distribution + policy tree via `policytree`, off-policy evaluation, conformal causal via `conformalInference` / `cfcausal`, fairness audit via `fairmodels`, DAG learning via `pcalg` / `bnlearn` / LLM-assisted). Use when the user asks for a complete R empirical analysis, wants a tidyverse-style reproducible R script / Quarto workflow, prefers fixest over reghdfe, needs the R counterpart to StatsPAI / 00.1 / 00.2, or names a specific R step in isolation ("feols with cluster", "MatchIt nearest neighbor", "bacondecomp in R", "gtsummary table 1", "modelsummary to Word"). Mode A triggers on "target trial emulation R", "tmle ltmle", "MendelianRandomization", "TwoSampleMR", "MRPRESSO", "survival cox AFT", "STROBE R", "EValue R", "公共健康 R", "流行病学 R". Mode B triggers on "DoubleML R", "grf causal forest", "policytree", "bartCause bcf", "conformal causal R", "fairmodels", "pcalg NOTEARS", "因果机器学习 R".
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research Full-empirical-analysis-skill-RInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
This skill is the canonical 8-step pipeline an applied economist runs on every empirical paper, written in the modern tidyverse + econometrics R ecosystem — dplyr/tidyr/haven for data, fixest as the panel/IV/DID workhorse, did/bacondecomp/HonestDiD for modern DID, rdrobust/rddensity for RD, Synth/gsynth/synthdid for synthetic control, MatchIt/WeightIt/cobalt/ebal for matching, grf/DoubleML for ML causal, mediation for causal mediation, marginaleffects for post-estimation, modelsummary/kableExtra/gt for publication tables, ggplot2/iplot/binsreg for figures.
Companion skills: this is the R sibling of 00-StatsPAI_skill (Python DSL), 00.1-Full-empirical-analysis-skill (explicit Python), and 00.2-Full-empirical-analysis-skill_Stata (Stata .do). All four implement the same 8 steps, in their respective ecosystems.
feols(... | unit + year, cluster = ~unit), not Frankenstein-y lm(y ~ x + factor(unit) + factor(year)).renv for package locking; Quarto (.qmd) for combined narrative + code + tables/figures.SKILL.md gives the canonical call per step; references/ holds variant-specific depth.The default playbook above is AER-style applied econometrics — the AEA convention: written-out estimating equation, identifying assumption, design horse-race, full robustness gauntlet. The skill also ships two parallel sub-pipelines for the other two big causal-inference traditions, each reusing the same Steps 1–4 (cleaning / construction / Table 1 / diagnostics) and Step 8 (tables/figures) — only Step 5 (estimator) and Step 6/7 swap packages:
| Mode | Reader convention | Step-5 estimator stack | Reporting stack | Jump to |
|---|---|---|---|---|
| Default — Applied Econ (AER / QJE / AEJ) | "Show the equation + identifying assumption + design horse-race; controls visible; clustered SE" | DID / IV / RD / SCM / matching / fixest::feols HDFE | AER house-style multi-column modelsummary + kableExtra / gt / flextable + 8-section paper layout | Steps 1 → 8 (entire playbook below) |
| Mode A — Epidemiology / Public Health | "STROBE / TRIPOD-AI; target trial protocol; doubly-robust estimand; absolute & relative risk; KM survival" | Target-trial emulation · IPTW (WeightIt / PSweight) · g-formula (gfoRmula) · TMLE (tmle / ltmle) · Mendelian randomization (MendelianRandomization / TwoSampleMR / MRPRESSO) · KM / Cox / AFT (survival / survminer / flexsurv) | Same modelsummary + risk-difference / hazard-ratio / E-value rows | §A. Epidemiology pipeline |
| Mode B — ML Causal Inference | "DML / meta-learners / causal forest / DR-learner; CATE distribution; policy value" | DML (DoubleML) · S/T/X/R/DR-Learner (causalweight / grf) · GRF causal forest (grf::causal_forest) · BART/BCF (bartCause / bcf) · matrix completion (MCPanel) | modelsummary ML horse-race + grf CATE plot + policy-value table + conformalInference PI | §B. ML causal pipeline |
How to invoke a non-default mode (Claude / agent picks this up from the user's wording):
| User says... | Mode the skill switches to |
|---|---|
| "Run a DID / IV / RD / event study", "AER table", "applied micro" | Default (AER econ) — Steps 1 → 8 |
| "Target trial emulation", "g-formula", "IPTW", "TMLE", "Mendelian randomization", "STROBE / TRIPOD", "公共健康 / 流行病学", "epi pipeline", "RWE study", "cohort study", "case-control" | Mode A (Epi) — §A |
| "DML", "double machine learning", "causal forest", "meta-learner", "CATE", "BCF", "policytree", "policy learning", "conformal causal", "fairness audit", "ML causal", "uplift modeling", "因果机器学习" | Mode B (ML causal) — §B |
| "Mix" (e.g. "estimate DID + then ML CATE on the heterogeneity") | Default + Mode B in sequence — every estimator yields a coefficient + SE pair, drop them all into one modelsummary(...) for the horse-race column |
The three modes share the same Step 1–4 cleaning / Table 1 / diagnostics scaffolding, the same Step 8 export stack, and the same DAG-first identification logic — switching modes only changes which Step-5 estimator family you reach for, not the surrounding paper structure. If you only want descriptive stats / Table 1 / a balance check, the AER gtsummary::tbl_summary / modelsummary::datasummary_balance calls in Step 3 work identically across all three modes.
This skill defaults to the applied-economics paper convention. Unless the user explicitly asks for a single point estimate, every run produces the full publication-ready output set below. Treat it as the contract of Step 8 — mandatory, not opt-in.
| # | Table | R source | Saves to |
|---|---|---|---|
| T1 | Summary statistics & balance (treated vs control, with SMD / p-values) | gtsummary::tbl_summary + add_p + add_difference (Step 3) | tables/table1_balance.xlsx + .docx + .tex |
| T2 ★ | Main results — multi-column regression M1→M6 (progressive controls + FE) | fixest::feols × 6 specs → modelsummary (Step 5–6) | tables/table2_main.xlsx + .docx + .tex |
| T3 | Mechanism / outcome ladder — same treatment, 3+ outcomes side-by-side | loop feols over y ∈ {Y1, Y2, Y3, Y_main} → modelsummary (Step 7) | tables/table3_mechanism.xlsx + .docx + .tex |
| T4 | Heterogeneity — subgroup × main coef (gender, age, region, …) | subgroup feols × linearHypothesis → modelsummary (Step 7) | tables/table4_heterogeneity.xlsx + .docx + .tex |
| T5 | Robustness battery — alt SE / cluster / sample / placebo, in one table | feols × variants → modelsummary (Step 6) | tables/table5_robustness.xlsx + .docx + .tex |
★ Table 2 is the centerpiece of every economics paper. It is the multi-column regression table that walks the reader from raw correlation (M1) to the fully-specified design (M6: 2-way FE + interacted FE + cluster-robust SE). Do not collapse it into a single column. Do not report only the headline coefficient. The progression is the credibility argument: if M1→M6 is monotone and stable, the design is plausibly identifying; if it collapses on adding FE, that is the result.
Canonical 6 columns, in order:
- M1 raw bivariate (
feols(y ~ treat, data))- M2 + demographics (
+ age + edu)- M3 + sector controls (
+ tenure / firm_size)- M4 + unit FE (
| worker_id)- M5 + 2-way FE (
| worker_id + year)- M6 + interacted FE (
| worker_id + year + industry^year) withcluster = ~ worker_id
| # | Figure | R source | Saves to |
|---|---|---|---|
| F1 | Trend / motivation — treated vs control over time, with policy line | dplyr group means → ggplot + geom_line (Step 3) | figures/fig1_trend.png (300 dpi, 必须导出 PNG) + .pdf |
| F2 | Event-study coefficients with 95% CI, base period at –1 | fixest::sunab() / did::ggdid / iplot (Step 5) | figures/fig2_event_study.png (300 dpi, 必须导出 PNG) + .pdf |
| F3 | Coefficient plot across specs M1→M6 | modelsummary::modelplot() (Step 8) | figures/fig3_coefplot.png (300 dpi, 必须导出 PNG) + .pdf |
| F4 | Robustness / sensitivity — bacondecomp::bacon plot, HonestDiD::createSensitivityPlot, or spec curve | scenario-specific (Step 6) | figures/fig4_sensitivity.png (300 dpi, 必须导出 PNG) + .pdf |
project/
├── tables/ table1_balance.xlsx/.docx/.tex table2_main.xlsx/.docx/.tex
│ table3_mechanism.xlsx/.docx/.tex table4_heterogeneity.xlsx/.docx/.tex
│ table5_robustness.xlsx/.docx/.tex
└── figures/ fig1_trend.png(300dpi)+.pdf fig2_event_study.png(300dpi)+.pdf
fig3_coefplot.png(300dpi)+.pdf fig4_sensitivity.png(300dpi)+.pdf
关键输出规则(必须遵守):
message() explaining why; do not silently drop.Synth / synthdid) — replace F1/F2 with the SCM trajectory + placebo distribution; T1–T5 still apply.# Run once on a fresh R install:
install.packages(c(
# Data
"tidyverse", "haven", "readxl", "data.table", "janitor",
"naniar", "VIM", "mice", "validate",
# Description / tables
"gtsummary", "tableone", "modelsummary", "kableExtra", "gt",
"stargazer", "texreg", "flextable", "psych", "summarytools",
# Tests
"lmtest", "sandwich", "car", "tseries", "urca", "plm",
"clubSandwich", "fwildclusterboot",
# Modeling — workhorses
"fixest", # panel/IV/DID with HD FE — primary
"AER", # ivreg
"ivreg", # alternative IV
# Modern DID
"did", # Callaway–Sant'Anna
"didimputation", # Borusyak–Jaravel–Spiess
"fixest", # sunab() for Sun–Abraham
"synthdid", # Synthetic DID
"bacondecomp", "HonestDiD",
"DIDmultiplegtDYN", # de Chaisemartin–D'Haultfœuille
# RD
"rdrobust", "rddensity", "rdmulti",
# Synthetic control
"Synth", "gsynth", "tidysynth",
# Matching / weighting
"MatchIt", "WeightIt", "cobalt", "ebal",
# ML causal
"grf", "DoubleML",
# Mediation / SEM
"mediation", "lavaan",
# Robustness / inference
"robomit", # Oster delta
"ri2", "ritools", # randomization inference
"multcomp",
# Margins / post-estimation
"marginaleffects",
# Plotting
"ggplot2", "ggpubr", "cowplot", "patchwork",
"binsreg",
"ggdist", "ggrepel"
))
# fixest's iplot, esttex, etable are bundled.
┌──────────────────────────────────────────────────────────────────────┐
│ Step −1 Pre-Analysis Plan (PAP) pwr / WebPower / DeclareDesign │
│ Step 0 Sample log + data contract sample_log/stopifnot/jsonlite │
│ Step 1 Data import & cleaning read_csv/read_dta/janitor/naniar/mice│
│ Step 2 Variable construction mutate/across/winsorize/lag/group_by │
│ Step 2.5 Empirical strategy equation × ID assumption + pre-reg │
│ Step 3 Descriptive statistics gtsummary/datasummary_balance/cor_pmat│
│ Step 3.5 Identification graphics iplot/binsreg/rdplot/cobalt/Synth │
│ Step 4 Diagnostic tests shapiro/bptest/dwtest/vif/adf/kpss │
│ Step 5 Baseline modeling feols/ivreg/att_gt/synthdid/MatchIt │
│ Step 6 Robustness battery bacondecomp/HonestDiD/fwildclusterboot│
│ Step 7 Further analysis marginaleffects/mediation/grf │
│ Step 8 Tables & figures modelsummary/iplot/ggplot2/cowplot │
└──────────────────────────────────────────────────────────────────────┘
The 8 steps mirror the canonical sections of an applied AER / QJE / AEJ paper. Each step is one paper section and emits a paper-ready artifact on disk:
Paper section Step R moves
─────────────────────────── ───── ────────────────────────────────────────────────
Pre-Analysis Plan −1 pwr / WebPower / DeclareDesign + freeze pap.json
§1. Data 0 sample_log + 5-check stopifnot → JSON via jsonlite
§1. Data 1 haven::read_dta · janitor::clean_names · naniar/mice
§1. Data 2 mutate/across/Winsorize/lag/lead/diff · CPI deflate
§1.1 Descriptives (Table 1) 3 gtsummary::tbl_summary · datasummary_balance
§2. Empirical Strategy 2.5 write equation + ID assumption → strategy.md
§3. Identification graphics 3.5 fixest::iplot · binsreg · rdplot · cobalt::love.plot · Synth
§3.5 Diagnostics 4 bptest · dwtest · car::vif · urca::ur.df · phtest
§4. Main Results (Table 2) 5 fixest::feols progressive (m1...m6) · modelsummary
§5. Heterogeneity (Table 3) 7 feols(... + i(.):X) · marginaleffects::avg_slopes
§6. Mechanisms / Channels 7 mediation::mediate · lavaan · outcome ladder
§7. Robustness gauntlet 6 bacondecomp · HonestDiD · robomit · fwildclusterboot · ri2
§8. Replication package 8 modelsummary("...tex") · gt → docx · result.json
Below is the canonical call at each step. All examples share one running narrative — labor-econ panel where training (treatment) affects log_wage (outcome), with covariates age, edu, tenure, panel keys worker_id/firm_id/year. Variable names and parameter values are illustrative.
When a step has many variants (5 staggered-DID estimators; 4 hetero tests), SKILL.md shows the one you reach for first; deeper variants live in
references/NN-<topic>.md.
A modern AER paper has 5–7 figures and 3–5 main tables + an appendix robustness table. Every step below leaves at least one numbered artifact on disk. Default file names assume parallel .tex / .docx / .xlsx exports (the agent should produce all three so co-authors can edit in Word, the build system can use LaTeX, and editors can edit raw numbers in Excel). 所有图片必须同时保存 PNG(≥300 dpi)和 PDF 两种格式。
| § | Artifact | R primitive | Filenames |
|---|---|---|---|
| §1 | Figure 1: raw trends / treatment rollout | df %>% group_by(year, treat) %>% summarise(mean(y)) %>% ggplot() | figures/fig1_trend.png(300dpi)+.pdf |
| §1 | Table 1: summary stats (full / treated / control + Δ + SMD) | gtsummary::tbl_summary · modelsummary::datasummary_balance | tables/table1_balance.xlsx/.docx/.tex |
| §3 | Figure 2: identification graphic (event-study / first-stage / McCrary / RD scatter / SCM trajectory) | fixest::iplot(es) · binsreg · rdrobust::rdplot · rddensity · Synth::path.plot | figures/fig2_event_study.png(300dpi)+.pdf |
| §4 | Table 2: main results — progressive controls M1→M6 | modelsummary(list("(1)"=m1,...,"(6)"=m6)) · fixest::etable | tables/table2_main.xlsx/.docx/.tex |
| §4 | Table 2-bis: design horse-race (OLS / IV / DID / DML) | modelsummary(list("OLS"=ols, "2SLS"=iv, "CS-DID"=cs, "DML"=dml)) | tables/table2b_designs.xlsx/.docx/.tex |
| §4 | Figure 3: coefficient plot across specs | modelplot(list(m1,...,m6), coef_map="training") | figures/fig3_coefplot.png(300dpi)+.pdf |
| §5 | Table 3: heterogeneity by subgroup | modelsummary(g_full, g_male, g_fem, g_q1, ..., g_q4) | tables/table3_heterogeneity.xlsx/.docx/.tex |
| §5 | Figure 4: dose-response / CATE | marginaleffects::plot_predictions · grf::plot.causal_forest | figures/fig4_cate.png(300dpi)+.pdf |
| §6 | Table 4: mechanism / outcome ladder | loop feols over outcomes → modelsummary | tables/table4_mechanism.xlsx/.docx/.tex |
| §7 | Table A1: robustness master (one column per check) | modelsummary(list(base, no99, balpan, dropearly, wfe, cl2way, logy, ihsy, psm, ebal)) | tables/tableA1_robustness.xlsx/.docx/.tex |
| §7 | Figure 5: spec curve | specr::specr() + plot_specs (or hand-rolled purrr::pmap) | figures/fig5_spec_curve.png(300dpi)+.pdf |
| §7 | Figure 6: sensitivity (HonestDiD / Oster / E-value) | HonestDiD::createSensitivityPlot · robomit::o_test · EValue | figures/fig6_sensitivity.png(300dpi)+.pdf |
| §8 | Replication bundle: all tables in one document | modelsummary(..., output="docx") · gt::gtsave() · Quarto / Rmd | replication/paper_tables.xlsx/.docx/.tex |
Every R estimator above (
fixest::feols/AER::ivreg/did::att_gt/grf::causal_forest/synthdid_estimate) returns a result object that can be passed straight intomodelsummary(...)/modelplot(...)/etable(...). Don't hand-roll LaTeX fromkable(), and don't render Word viaflextabledirectly —modelsummary,etable, andgtsummaryapply book-tab borders, AER stars, and the right SE label automatically. For deeper export recipes, seereferences/08-tables-plots.md.
关键规则(必须遵守):每个表格必须同时导出三种格式——Excel(.xlsx)、Word(.docx)、LaTeX(.tex)。每个图片必须同时保存PNG(≥300dpi)和PDF两种格式。
R has the best publication-table ecosystem of the three languages. Three tiers, picked by scope:
| Tier | Use when | API | Hot args |
|---|---|---|---|
| 1. Single multi-column table | Exporting one Table 2 / Table 3 / Table A1 with progressive columns | modelsummary(list("(1)"=m1,...,"(N)"=mN), output="tables/tab.tex", stars=c("*"=.1,"**"=.05,"***"=.01), gof_omit="BIC|AIC|F|Log", coef_map=c(...)) — or fixest::etable(m1,...,mN, file="tab.tex") for feols-only | coef_map=, gof_omit=, stars=, notes=, output= "latex"/"docx"/"xlsx"/"markdown", add_rows= |
| 2. Multi-panel paper format (Tables 2 + 3 + A1 + A2 in one file) | Producing the paper-tables block — main + heterogeneity + robustness + placebo as a single document | modelsummary chained with gt::gt_group() for one document with section headers, OR Quarto .qmd rendering multiple modelsummary calls between prose | gt_group(modelsummary(...), modelsummary(...)) · quarto render paper.qmd |
| 3. Full session bundle (the Stata collect / Python Stargazer + pylatex equivalent) | Replication appendix that mixes summary stats + balance + multiple regression tables + headings + prose in one file | Quarto is the modern R-native answer. master.qmd interleaves prose + chunks that emit modelsummary / gtsummary / ggplot2 outputs; one quarto render produces .pdf / .docx / .html | YAML front matter sets format: [pdf, docx, html] for triple-target output |
Journal styling — pick the right stars and SE label. The AEA convention is c("*"=.1, "**"=.05, "***"=.01) and notes = "Cluster-robust standard errors in parentheses...". Define a wrapper once at the top of master.R:
# Top of master.R — journal house-style wrapper
# 输出三格式:.xlsx(编辑)、.docx(Word)、.tex(LaTeX)
aer_table <- function(models, output, headers = NULL, coef_map = NULL) {
base <- tools::file_path_sans_ext(output)
for (ext in c(".xlsx", ".docx", ".tex")) {
output_file <- paste0(base, ext)
fmt <- if (ext == ".xlsx") "html" else if (ext == ".docx") "docx" else "latex"
modelsummary(
models,
output = output_file,
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
gof_omit = "BIC|AIC|F|Log|Adj",
coef_map = coef_map,
notes = paste("Cluster-robust standard errors in parentheses.",
"* p<0.10, ** p<0.05, *** p<0.01."),
output_format = fmt
)
}
}
For the multi-panel .docx / .xlsx and Quarto cookbook (single-file paper-tables bundle), see references/08-tables-plots.md.
Before touching the data, write down (a) the population, (b) the design, (c) the minimum detectable effect (MDE) under the planned sample size and α=0.05, β=0.20. Persist the result as pap.json so a referee can verify the design was powered before, not after, the data were seen.
library(pwr) # classical power calculations
library(WebPower) # cluster RCT, longitudinal, mixed designs
library(jsonlite)
# Two-sample MDE for a continuous outcome (Cohen's d framing)
pwr.t.test(d = 0.20, power = 0.80, sig.level = 0.05,
type = "two.sample", alternative = "two.sided")
# → required n per arm
# Solve for MDE given fixed n
pwr.t.test(n = 2000, power = 0.80, sig.level = 0.05,
type = "two.sample")$d
# → minimum detectable Cohen's d
# Cluster-randomized RCT — design effect
# Solve via WebPower::wp.crt2arm(...) for clusters / per-cluster size / power triangle
WebPower::wp.crt2arm(f = 0.20, J = NULL, n = 50, icc = 0.05, power = 0.80,
alpha = 0.05, alternative = "two.sided")
# → required clusters per arm
# DID power (Frison-Pocock / Bloom 1995): use WebPower::wp.kanova() or simulate
# RD power: simulate via DeclareDesign — see references/05-modeling.md §5.5
# Persist the protocol — referee will ask whether design was powered ex ante
pap <- list(
population = "manufacturing workers, 2010–2020",
treatment = "training (binary, staggered adoption)",
outcome = "log_wage",
estimand = "ATT",
design = "staggered DID, Callaway-Sant'Anna",
alpha = 0.05,
power_target = 0.80,
mde_d = 0.20,
n_planned = 12000,
frozen_at = "2026-01-15T09:00:00Z",
git_sha = "<paste>"
)
write_json(pap, "artifacts/pap.json", pretty = TRUE, auto_unbox = TRUE)
For richer DAG-aware power analysis (write down the DAG, declare estimands, simulate the design), use DeclareDesign — it is the R-native equivalent of EGAP's pre-analysis flow.
Commit artifacts/pap.json in the repo before Step 1. AEA RCT Registry / OSF preregistration tools accept it as the analysis-plan exhibit.
An AER §1 Data section has three jobs: (a) describe sources, (b) document every sample restriction (the "footnote 4" sample log), (c) lock the panel structure.
library(tidyverse); library(jsonlite)
sample_log <- tibble::tibble(step = character(), n = integer())
df_raw <- read_dta("raw/panel.dta") %>% janitor::clean_names()
sample_log <- sample_log %>% add_row(step = "0. raw", n = nrow(df_raw))
df1 <- df_raw %>% drop_na(wage)
sample_log <- sample_log %>% add_row(step = "1. drop missing wage", n = nrow(df1))
df2 <- df1 %>% filter(between(age, 18, 65))
sample_log <- sample_log %>% add_row(step = "2. drop age outside 18-65", n = nrow(df2))
df3 <- df2 %>% filter(industry %in% c("manuf", "construction", "transport"))
sample_log <- sample_log %>% add_row(step = "3. keep target industries", n = nrow(df3))
df <- df3
print(sample_log)
write_json(sample_log, "artifacts/sample_construction.json", pretty = TRUE)
Paste the printed tibble verbatim as footnote 4 of the paper.
library(validate); library(assertr)
data_contract <- function(df, y, treatment, id = NULL, time = NULL, covariates = c()) {
keys <- c(y, treatment, id, time, covariates)
contract <- list(
n_obs = nrow(df), # 1. shape
dtypes = sapply(df[keys], function(x) class(x)[1]), # 2. dtypes
n_missing = sapply(df[keys], function(x) sum(is.na(x))), # 3. missingness
n_dupes_on_keys = if (!is.null(id) && !is.null(time))
sum(duplicated(df[, c(id, time)])) else 0, # 4. duplicates
panel_balanced = NULL,
cohort_sizes = NULL
)
if (!is.null(id) && !is.null(time)) {
bal <- df %>% count(.data[[id]])
contract$panel_balanced <- all(bal$n == max(bal$n)) # 5. balance
contract$n_dropped_by_balance <- sum(bal$n != max(bal$n))
if ("first_treat" %in% names(df)) {
contract$cohort_sizes <- df %>% distinct(.data[[id]], .keep_all = TRUE) %>%
count(first_treat) %>% deframe()
}
}
contract$y_range <- range(df[[y]], na.rm = TRUE)
contract$treatment_share <- mean(df[[treatment]], na.rm = TRUE)
# MCAR sniff test (Rubin) — if missing(y) is associated with covariates,
# listwise deletion biases the estimate. Use mice / IPW instead.
miss_y <- is.na(df[[y]])
contract$mcar_hint <- "likely MCAR (listwise OK)"
if (any(miss_y) && any(!miss_y)) {
for (cov in covariates) {
if (is.numeric(df[[cov]])) {
p <- t.test(df[[cov]][miss_y], df[[cov]][!miss_y])$p.value
if (p < 0.05) {
contract$mcar_hint <- sprintf("NOT MCAR (y-miss differs on %s, p=%.3f) → use mice / IPW",
cov, p)
break
}
}
}
}
contract
}
contract <- data_contract(df, y = "wage", treatment = "training",
id = "worker_id", time = "year",
covariates = c("age", "edu", "tenure"))
stopifnot(contract$n_dupes_on_keys == 0)
stopifnot(all(contract$n_missing == 0))
write_json(contract, "artifacts/data_contract.json",
pretty = TRUE, auto_unbox = TRUE)
If any stopifnot fires, stop and fix it in dplyr first. R estimators silently drop NA rows downstream — this contract is the cheapest insurance against "why did N drop from 12,000 to 9,800 between Table 1 and Table 2?" referee questions.
Deeper patterns: references/01-data-cleaning.md — every format (haven/readxl/data.table::fread/arrow::read_parquet/DBI), janitor::clean_names, naniar missingness viz, MCAR/MAR/MNAR triage with mice, validation with validate/assertr, panel structure checks.
library(tidyverse)
library(haven) # .dta / .sav / .sas7bdat
library(janitor) # clean_names()
library(naniar) # missing-data viz
library(skimr) # one-line dataset summary
# 1a. Load + first look
df <- read_dta("raw/panel.dta") %>%
clean_names() # standardize to snake_case
skim(df) # rich one-line-per-var summary
naniar::miss_var_summary(df)
naniar::vis_miss(df) # missingness heatmap
# 1b. Dtypes
df <- df %>%
mutate(
year = as.integer(year),
wage = as.numeric(wage),
gender = as.factor(gender),
date = as.Date(date)
)
# 1c. Missing values — decide PER VARIABLE
key_vars <- c("wage", "training", "worker_id", "year")
df <- df %>%
drop_na(all_of(key_vars))
cat("After dropping NA on keys:", nrow(df), "rows\n")
df <- df %>%
mutate(
tenure_missing = is.na(tenure),
tenure = if_else(is.na(tenure), median(tenure, na.rm = TRUE), tenure),
union = fct_explicit_na(as.factor(union), na_level = "unknown")
)
# 1d. Outliers — flag, don't drop yet
df <- df %>%
mutate(wage_z = scale(wage)[,1],
outlier_z4 = abs(wage_z) > 4)
cat("|z|>4 on wage:", sum(df$outlier_z4, na.rm = TRUE), "\n")
# 1e. Deduplicate panel key
stopifnot(nrow(df %>% distinct(worker_id, year)) == nrow(df))
# 1f. Merge with assertion
firm_chars <- read_dta("raw/firm_chars.dta")
n_before <- nrow(df)
df <- df %>%
left_join(firm_chars, by = "firm_id", relationship = "many-to-one")
stopifnot(nrow(df) == n_before) # no row inflation
# 1g. Panel structure
df %>% count(year) # per-year
df %>% count(worker_id) %>% summary() # per-unit
Key principle: dplyr + explicit stopifnot() assertions. No silent row drops downstream.
Deeper patterns: references/02-data-transformation.md — log/IHS/Box–Cox via MASS::boxcox, group winsorization with dplyr, scale() and bestNormalize, factor handling, lag/lead with dplyr::lag, panel timing.
library(DescTools) # Winsorize()
df <- df %>%
mutate(
# 2a. Log / IHS
log_wage = log(pmax(wage, 1)),
ihs_assets = asinh(assets),
# 2b. Winsorize 1/99
wage_w1 = DescTools::Winsorize(wage, probs = c(0.01, 0.99), na.rm = TRUE),
# 2c. Standardize
age_std = as.numeric(scale(age)),
# 2d. Polynomial / interaction (or use formula syntax in fixest)
age_sq = age^2,
trt_x_edu = training * edu
) %>%
# 2e. Within-group winsorize
group_by(industry, year) %>%
mutate(wage_w1_iy = DescTools::Winsorize(wage, probs = c(0.01, 0.99),
na.rm = TRUE)) %>%
ungroup() %>%
# 2f. Panel operators (always arrange first to make lag deterministic)
arrange(worker_id, year) %>%
group_by(worker_id) %>%
mutate(
log_wage_l1 = lag(log_wage, 1),
log_wage_f1 = lead(log_wage, 1),
d_log_wage = log_wage - lag(log_wage, 1),
wage_mean_i = mean(log_wage, na.rm = TRUE),
log_wage_dm = log_wage - wage_mean_i
) %>%
ungroup() %>%
# 2g. Staggered-DID timing
group_by(worker_id) %>%
mutate(first_treat = ifelse(any(training == 1),
min(year[training == 1]), NA_real_)) %>%
ungroup() %>%
mutate(rel_time = year - first_treat,
never_treated = is.na(first_treat))
# 2h. CPI deflation
cpi <- read_csv("raw/cpi.csv")
df <- df %>%
left_join(cpi, by = "year") %>%
mutate(cpi_base = cpi[year == 2010][1],
wage_real = wage * cpi_base / cpi,
log_wage_real = log(pmax(wage_real, 1)))
This is the heart of an AER paper. Before any code, write down the equation explicitly and state the identifying assumption. Vague identification language is the single most common reason a referee rejects an applied paper. Persist the strategy as strategy.md so it is a dated, version-controlled artifact — not a post-hoc rationalization written after seeing the coefficient.
| Design | Estimating equation | Identifying assumption | R estimator |
|---|---|---|---|
| 2×2 DID | Y_it = α_i + λ_t + β·D_it + X'γ + ε_it | parallel trends conditional on X | feols(y ~ i(treated, post, ref=0) | i + t, data, cluster = ~i) |
| Event-study (CS / SA) | Y_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t-G_i = e} + ε_it | no anticipation + group-time PT | feols(y ~ sunab(G, t) | i + t) · did::att_gt · didimputation::did_imputation |
| 2SLS | Y_i = α + β·D_i + X'γ + ε_i; D_i = π·Z_i + X'δ + u_i | exclusion + relevance + monotonicity | feols(y ~ X | D ~ Z, data) · AER::ivreg · ivreg::ivreg |
| Sharp RD | Y_i = α + β·1{X_i ≥ c} + f(X_i) + ε_i (local poly) | continuity of E[Y(0)|X] at c, no manipulation | rdrobust::rdrobust(y, x, c=0) (+ rddensity) |
| SCM | Ŷ_1t(0) = Σ_j ŵ_j Y_jt, τ_t = Y_1t − Ŷ_1t(0) for t≥T_0 | pre-period fit + interpolation validity | Synth::synth · gsynth::gsynth · synthdid::synthdid_estimate · tidysynth |
| Selection-on-observables (matching/IPW/DML) | Y_i = m(X_i) + β·D_i + ε_i (Robinson partialling-out) | unconfoundedness + overlap | MatchIt::matchit + lm · WeightIt · DoubleML::DoubleMLPLR · grf::causal_forest |
┌─ running var + cutoff ───────────────── RDD (rdrobust)
│
├─ exogenous instrument Z ─────────────── IV/2SLS (feols / AER::ivreg)
data + question ─┤
├─ pre/post × treat/control ─┬ 2 periods ── 2×2 DID (feols + i())
│ └ staggered ── CS / SA / BJS (att_gt / sunab / did_imputation)
│
├─ 1 treated unit + donor pool + long pre ── SCM (Synth / gsynth / synthdid)
│
├─ high-dim X, selection-on-observables ── ML causal (DoubleML / grf — see §B)
│
└─ none of the above ──────────────────── matching + sensitivity (MatchIt + EValue)
strategy.md templatestrategy <- "\\
# Empirical Strategy (pre-registration)
**Frozen**: 2026-01-15 (Git SHA: <paste>)
**Population**: manufacturing workers, 2010–2020, balanced panel
**Treatment**: training (binary, staggered adoption)
**Outcome**: log_wage (CPI-deflated 2010 USD)
**Estimand**: ATT on the treated, dynamic horizon -4..+4
## Estimating equation (paste from §2.5 row that matches the design)
log_wage_it = α_i + λ_t + Σ_{e≠-1} β_e · 1{t - G_i = e} + ε_it
## Identifying assumption
1. No anticipation: E[Y_it(0) | t < G_i] = E[Y_it(0) | never-treated]
2. Group-time PT: Δ E[Y_it(0)] is the same across treatment cohorts
## Auto-flagged threats (must defend in §2)
- Selection of G_i on Y_i(0) → bacondecomp + HonestDiD sensitivity
- Spillover within firm → cluster at firm_id, also try firm_id × year
- Anticipation in pre-period → include lead in event study
## Fallback estimators (Step 6 robustness)
- Sun–Abraham via `feols(y ~ sunab(G, t) | i + t, data)`
- Borusyak-Jaravel-Spiess via `didimputation::did_imputation`
- Synthetic DID via `synthdid::synthdid_estimate`
"
writeLines(strategy, "artifacts/strategy.md")
Commit artifacts/strategy.md in the repo before running Step 5 / Step 6. The git log of this file is the analysis plan.
Deeper patterns: references/03-descriptive-stats.md — gtsummary::tbl_summary (the modern Table 1 standard), modelsummary::datasummary_balance with SMDs, tableone::CreateTableOne, correlation matrices with significance via corrplot / psych::corr.test, distribution plots via ggplot2.
library(gtsummary)
library(modelsummary)
# 3a. Full-sample summary — one line, publication ready
df %>%
select(log_wage, age, edu, tenure, training) %>%
datasummary_skim()
# Or
df %>%
select(log_wage, age, edu, tenure, training) %>%
tbl_summary(
type = list(all_continuous() ~ "continuous2"),
statistic = all_continuous() ~ c("{N_nonmiss}", "{mean} ({sd})",
"{min} – {median} – {max}")
) %>%
bold_labels() %>%
as_kable_extra() %>%
kableExtra::save_kable("tables/table1_full.tex")
# 3b. Stratified Table 1 (treated vs control, with SMDs + p-values)
df %>%
select(log_wage, age, edu, tenure, female, training) %>%
tbl_summary(by = training, missing = "ifany") %>%
add_p() %>%
add_difference() %>%
add_n() %>%
modify_header(label = "**Variable**") %>%
bold_labels() %>%
as_gt() %>%
gt::gtsave("tables/table1_balance.html")
# Or via modelsummary (writes LaTeX/Word/HTML)
datasummary_balance(~ training,
data = df %>% select(training, age, edu, tenure, female),
output = "tables/table1_balance.tex")
# 3c. Correlation matrix with stars
library(corrplot); library(psych)
corr_obj <- corr.test(df %>% select(log_wage, age, edu, tenure, training),
method = "pearson")
corrplot(corr_obj$r, method = "color", type = "upper",
p.mat = corr_obj$p, sig.level = 0.05, insig = "blank",
addCoef.col = "black", number.cex = 0.7,
tl.col = "black", tl.srt = 45,
col = colorRampPalette(c("#B2182B","white","#2166AC"))(200))
# 3d. Distribution plots
library(ggplot2)
p1 <- ggplot(df, aes(log_wage, fill = factor(training))) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c("0" = "darkred", "1" = "navy"),
labels = c("Control", "Treated"), name = "") +
labs(x = "Log wage", y = "Density",
title = "Log-wage density by treatment") +
theme_classic()
p2 <- ggplot(df, aes(sample = log_wage)) +
stat_qq() + stat_qq_line() +
labs(title = "Normal Q-Q") + theme_classic()
cowplot::plot_grid(p1, p2, labels = "auto") %>%
ggsave("figures/distributions.pdf", plot = ., width = 10, height = 4)
# 3e. Time-trend (DID motivation)
df %>%
group_by(year, training) %>%
summarise(mean_log_wage = mean(log_wage, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(year, mean_log_wage, color = factor(training))) +
geom_line(linewidth = 1) + geom_point(size = 2) +
geom_vline(xintercept = policy_year, linetype = "dashed") +
scale_color_manual(values = c("0" = "darkred", "1" = "navy"),
labels = c("Control","Treated"), name = "") +
labs(x = "Year", y = "Mean log wage") + theme_classic()
ggsave("figures/trend_did.pdf", width = 7, height = 4)
AER convention: the identification figure precedes the regression table. The reader should see graphical evidence that PT holds / first stage is strong / RD jumps cleanly before you ask them to trust your point estimate.
Pre-period coefficients ≈ 0 (with the −1 reference period normalized to zero) is the visual evidence for parallel trends. Pair the figure with a numerical pre-trends test so reviewers don't have to eyeball it.
library(fixest); library(ggplot2)
# (a) Sun-Abraham via fixest::sunab — the modern primary for staggered DID
es <- feols(log_wage ~ sunab(first_treat, year) | worker_id + year,
data = df, cluster = ~ worker_id)
# (b) Coefficient figure
iplot(es,
xlab = "Years relative to treatment",
ylab = "Coefficient (ATT, 95% CI)",
main = "Figure 2a. Event-study coefficients (95% CI; ref. e = -1)")
ggsave("figures/fig2a_event_study.pdf", width = 7, height = 4)
ggsave("figures/fig2a_event_study.png", width = 7, height = 4, dpi = 300)
# (c) Numerical pre-trends Wald test (joint zero on the leads)
pre_idx <- grep("year::-", names(coef(es)))[!grepl("ref", names(coef(es)))]
W <- wald(es, names(coef(es))[pre_idx])
cat(sprintf("Pre-trends Wald χ² = %.2f, p = %.3f\n", W$stat, W$p))
# (d) Bacon decomposition (Goodman-Bacon 2021) — TWFE diagnostic
library(bacondecomp)
bd <- bacon(log_wage ~ training, data = df,
id_var = "worker_id", time_var = "year")
ggplot(bd, aes(weight, estimate, color = type, shape = type)) +
geom_point(size = 2) +
labs(title = "Figure 2a-bis. Goodman-Bacon decomposition",
x = "Weight", y = "Estimate")
ggsave("figures/fig2a_bacon.pdf", width = 7, height = 4)
# (e) Callaway-Sant'Anna dynamic ATT (when att_gt is the main estimator)
library(did)
cs <- att_gt(yname = "log_wage", tname = "year", idname = "worker_id",
gname = "first_treat", data = df,
control_group = "nevertreated", est_method = "dr",
clustervars = "firm_id")
ggdid(aggte(cs, type = "dynamic")) +
labs(title = "Figure 2a-ter. Dynamic ATT (Callaway-Sant'Anna)")
ggsave("figures/fig2a_csdid.pdf", width = 7, height = 4)
Rule of thumb: first-stage F ≥ 10 for OLS-style inference; F ≥ 23 for AR-equivalent inference (Stock–Yogo / Lee 2022). fixest::feols reports F automatically; AER::ivreg requires summary(..., diagnostics = TRUE).
iv <- feols(log_wage ~ age + edu | training ~ Z1 + Z2,
data = df, cluster = ~ firm_id)
summary(iv, stage = 1)
fitstat(iv, ~ ivf + ivwald + sargan + cd) # CD / KP / Sargan / first-stage F
# Binscatter for the first-stage scatter (residualized on age + edu)
library(binsreg)
binsreg(y = df$training, x = df$Z1, w = df[, c("age","edu")],
nbins = 20, polyreg = 2, ci = c(3, 3))
ggsave("figures/fig2b_first_stage.pdf", width = 7, height = 4)
The signature RD figure is rdplot (CCT-style binned scatter with local-polynomial fit on each side), paired with the McCrary manipulation test.
library(rdrobust); library(rddensity)
# (a) Canonical RD plot — binned means + local poly on each side
rdplot(y = df$outcome, x = df$running_var, c = 0,
p = 4, kernel = "triangular", binselect = "esmv",
title = "Figure 2c. RD plot")
ggsave("figures/fig2c_rdplot.pdf", width = 7, height = 4)
# (b) McCrary density (Cattaneo-Jansson-Ma 2018)
rdd <- rddensity(X = df$running_var, c = 0)
print(summary(rdd))
rdplotdensity(rdd, X = df$running_var,
title = "Figure 2c-bis. McCrary density (manipulation test)")
ggsave("figures/fig2c_mccrary.pdf", width = 7, height = 4)
library(MatchIt); library(cobalt)
m.out <- matchit(training ~ age + edu + tenure + firm_size,
data = df, method = "nearest", ratio = 1)
love.plot(m.out, threshold = 0.10,
var.order = "unadjusted", abs = TRUE,
title = "Figure 2d. Love plot — |SMD| pre vs post matching")
ggsave("figures/fig2d_loveplot.pdf", width = 7, height = 4)
For synthetic-control designs the canonical Figure 2 is the treated-vs-synthetic time series with treatment time annotated.
library(tidysynth)
sc <- df %>%
synthetic_control(outcome = log_wage, unit = unit_id, time = year,
i_unit = "treated_unit_name", i_time = 2015) %>%
generate_predictor(time_window = 2010:2014,
mean_age = mean(age, na.rm = TRUE),
mean_edu = mean(edu, na.rm = TRUE)) %>%
generate_weights() %>% generate_control()
plot_trends(sc); ggsave("figures/fig2e_synth_trajectory.pdf", width = 7, height = 4)
plot_differences(sc); ggsave("figures/fig2e_synth_gap.pdf", width = 7, height = 4)
# Synthetic DID
library(synthdid)
sdid_setup <- panel.matrices(df, unit = "worker_id", time = "year",
outcome = "log_wage", treatment = "training")
sdid_fit <- synthdid_estimate(sdid_setup$Y, sdid_setup$N0, sdid_setup$T0)
plot(sdid_fit, control.name = "Synthetic DiD")
ggsave("figures/fig2e_sdid.pdf", width = 7, height = 4)
Identification-specific checks (PT for DID, weak-IV F, density for RD, common support for matching) are also auto-run inside the Step-5 estimators — don't duplicate the numerics here, but DO produce the figures: a referee scans the figures first.
Deeper patterns: references/04-statistical-tests.md — every classical test. lmtest/sandwich/car/tseries/urca/plm.
library(lmtest)
library(sandwich)
library(car)
library(tseries)
library(urca)
# Fit baseline OLS for diagnostics
ols <- lm(log_wage ~ training + age + edu + tenure, data = df)
# 4a. Normality of residuals
shapiro.test(sample(residuals(ols), min(5000, length(residuals(ols)))))
tseries::jarque.bera.test(residuals(ols))
# 4b. Heteroskedasticity
bptest(ols) # Breusch-Pagan
bptest(ols, ~ I(fitted(ols)^2) + ., data = df) # White-style
# 4c. Autocorrelation (time series / panel)
dwtest(ols) # Durbin-Watson
bgtest(ols, order = 4) # Breusch-Godfrey
Box.test(residuals(ols), lag = 8, type = "Ljung-Box")
# Panel-specific
library(plm)
pdata <- pdata.frame(df, index = c("worker_id", "year"))
plm_fe <- plm(log_wage ~ training + age + edu, data = pdata, model = "within")
pbgtest(plm_fe) # Wooldridge serial correlation
pcdtest(plm_fe, test = "cd") # Pesaran cross-sectional dependence
# 4d. Multicollinearity
vif(ols) # VIFs
kappa(model.matrix(ols), exact = TRUE) # condition number
# 4e. Stationarity (time series — assumes a single y over time)
adf.test(df$log_wage, k = 4) # ADF
kpss.test(df$log_wage, null = "Level") # KPSS
# 4f. Hausman (FE vs RE)
plm_re <- plm(log_wage ~ training + age + edu, data = pdata, model = "random")
phtest(plm_fe, plm_re)
# 4g. Specification — RESET
resettest(ols, power = 2:3, type = "fitted")
Decision table:
| Test | Null | Action if rejected |
|------|------|--------------------|
| shapiro.test / jarque.bera.test | residuals Normal | bootstrap CIs if N small |
| bptest | homoskedastic | use HC3 via coeftest(ols, vcov = vcovHC(ols, "HC3")) or cluster |
| dwtest / bgtest | no autocorr | HAC SEs (vcovHAC) or cluster by unit |
| pbgtest (panel) | no panel autocorr | cluster by entity |
| pcdtest | no CSD | Driscoll–Kraay (vcovDC) |
| vif > 10 | — | drop / combine |
| ADF rejects + KPSS doesn't | stationary | levels |
| ADF doesn't reject | unit root | first-difference |
| phtest | RE consistent | use FE |
Deeper patterns: references/05-modeling.md — every estimator. fixest is the workhorse.
This is the densest section of an applied paper. A modern AER §4 typically contains 2–3 multi-regression tables and one coefficient plot:
Estimator routing (memorize this — getting it wrong silently produces nonsense):
- No FE / single low-card FE →
feols(y ~ X, data, cluster = ~i)- High-dim FE →
feols(y ~ X | fe1 + fe2, data, cluster = ~i)- Two-way cluster →
feols(..., cluster = ~ firm_id + year)- 2SLS / IV →
feols(y ~ X | D ~ Z, data, cluster = ~ firm_id)(orAER::ivregfor diagnostics)- DID / event-study →
feols(y ~ sunab(G, t) | i + t, data)(SA) ·did::att_gt(CS) ·didimputation::did_imputation(BJS)
Pick by identification strategy:
Cross-section, selection on observables → feols | MatchIt + lm | WeightIt
Panel + policy shock + parallel trends → feols / did::att_gt / sunab / didimputation / synthdid
Exogenous instrument → feols(... | endog ~ z) | AER::ivreg
Discontinuity → rdrobust + rddensity + rdmc
N=1 treated, long panel → Synth / gsynth / synthdid
Selection on observables + heterogeneity → WeightIt + cobalt; grf::causal_forest
Binary outcome → feglm or glm(family=binomial)
Count outcome → fepois
Canonical calls (the eight patterns A–H below are the AER table cookbook — modelsummary(...) and fixest::etable(...) are the two workhorses, equivalent to Stata outreg2/esttab and Python pf.etable/Stargazer).
Stable β̂ across columns ⇒ less concern that selection on observables is driving the estimate (Oster 2019 selection-stability logic; quantified in Step 6).
library(fixest); library(modelsummary)
m1 <- feols(log_wage ~ training, data = df, cluster = ~ firm_id)
m2 <- feols(log_wage ~ training + age + edu, data = df, cluster = ~ firm_id)
m3 <- feols(log_wage ~ training + age + edu + tenure + firm_size, data = df, cluster = ~ firm_id)
m4 <- feols(log_wage ~ training + age + edu + tenure + firm_size | industry + year,
data = df, cluster = ~ firm_id)
m5 <- feols(log_wage ~ training + age + edu + tenure + firm_size | worker_id + year,
data = df, cluster = ~ firm_id)
m6 <- feols(log_wage ~ training + age + edu + tenure + firm_size | worker_id + year + industry^year,
data = df, cluster = ~ firm_id)
modelsummary(
list("(1) Baseline" = m1,
"(2) +Demog" = m2,
"(3) +Labor-mkt" = m3,
"(4) Ind×Yr FE" = m4,
"(5) Worker FE" = m5,
"(6) Worker FE+Ind×Yr" = m6),
output = "tables/table2_main.tex",
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
gof_omit = "BIC|AIC|F|Log|Adj",
coef_map = c("training" = "Job training",
"age" = "Age", "edu" = "Education",
"tenure" = "Tenure", "firm_size" = "Firm size"),
notes = c("Cluster-robust SE in parentheses, clustered at firm_id.",
"* p<0.10, ** p<0.05, *** p<0.01.")
)
modelsummary(list("(1)"=m1,"(2)"=m2,"(3)"=m3,"(4)"=m4,"(5)"=m5,"(6)"=m6),
output = "tables/table2_main.docx")
AER convention: show ALL controls (and the intercept). Pass NEITHER
keep =NORcoef_omit =so every parameter is visible. Usecoef_map = c("training" = "Training")(single mapping) only when a focal-coefficient-only table is intentional (interaction-form heterogeneity, IV first-stage triplet); usecoef_omit = "Intercept"only when you want to suppress the constant for paper aesthetics.
Show the same coefficient of interest under multiple identification strategies. This is the AER credibility move: convergent evidence across designs each making different identifying assumptions.
library(fixest); library(AER); library(did); library(MatchIt); library(WeightIt)
ols <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df, cluster = ~ firm_id)
iv <- feols(log_wage ~ age + edu + tenure | training ~ Z1 + Z2,
data = df, cluster = ~ firm_id)
cs <- att_gt(yname = "log_wage", tname = "year", idname = "worker_id",
gname = "first_treat", data = df,
control_group = "nevertreated", est_method = "dr",
clustervars = "firm_id")
psm <- matchit(training ~ age + edu + tenure, data = df,
method = "nearest", ratio = 1)
psm_lm <- lm(log_wage ~ training + age + edu + tenure,
data = match.data(psm), weights = weights)
ebal <- weightit(training ~ age + edu + tenure, data = df, method = "ebal")
ebal_lm <- lm(log_wage ~ training + age + edu + tenure,
data = df, weights = ebal$weights)
modelsummary(
list("(1) OLS+FE" = ols,
"(2) 2SLS" = iv,
"(3) CS-DID" = aggte(cs, type = "simple"),
"(4) PSM" = psm_lm,
"(5) Entropy bal." = ebal_lm),
output = "tables/table2b_designs.tex",
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
coef_map = c("training" = "Job training (β̂)"),
gof_omit = "BIC|AIC|F|Log|Adj",
notes = "Convergent evidence: same β̂ under five identification strategies."
)
ys <- c("log_wage", "weeks_employed", "left_firm", "promoted")
multi_y <- lapply(ys, function(y)
feols(as.formula(paste(y, "~ training + age + edu + tenure | industry + year")),
data = df, cluster = ~ firm_id))
names(multi_y) <- ys
modelsummary(multi_y,
output = "tables/table2c_multi_outcome.tex",
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
coef_map = c("training" = "Training"),
notes = "Each column is a separate regression on the labelled outcome.")
Same model family, two horizons (short-run / long-run) or two samples. Use gt::gt_group() to stack two modelsummary blocks with panel headers.
library(gt)
panelA <- list(
"(1) Industry FE" = feols(wage_t1 ~ training + X | industry + year, data = df, cluster = ~ firm_id),
"(2) Worker FE" = feols(wage_t1 ~ training + X | worker_id + year, data = df, cluster = ~ firm_id))
panelB <- list(
"(1) Industry FE" = feols(wage_t5 ~ training + X | industry + year, data = df, cluster = ~ firm_id),
"(2) Worker FE" = feols(wage_t5 ~ training + X | worker_id + year, data = df, cluster = ~ firm_id))
ms_A <- modelsummary(panelA, output = "gt") %>%
tab_header(title = "Panel A. Short-run (1 year)")
ms_B <- modelsummary(panelB, output = "gt") %>%
tab_header(title = "Panel B. Long-run (5 years)")
gt_group(ms_A, ms_B) %>%
gtsave("tables/table2d_horizons.tex")
gt_group(ms_A, ms_B) %>%
gtsave("tables/table2d_horizons.docx")
The textbook AER IV table presents the first stage, the reduced form, and the 2SLS in three columns so the reader can verify Wald-ratio = RF / FS.
fs <- feols(training ~ Z + age + edu | industry + year, data = df, cluster = ~ firm_id)
rf <- feols(log_wage ~ Z + age + edu | industry + year, data = df, cluster = ~ firm_id)
iv2 <- feols(log_wage ~ age + edu | training ~ Z, data = df, cluster = ~ firm_id)
modelsummary(
list("(1) First stage" = fs,
"(2) Reduced form" = rf,
"(3) 2SLS" = iv2),
output = "tables/table2e_iv_triplet.tex",
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
coef_map = c("Z" = "Instrument Z", "training" = "Training (endog.)"),
gof_map = list(list(raw = "ivf", clean = "First-stage F", fmt = 2)),
notes = "Wald ratio: $\\hat\\beta_{2SLS} = \\hat\\beta_{RF} / \\hat\\pi_{FS}$."
)
IV triplet is intentionally focal: show only Z + endogenous regressor so the reader can eyeball the Wald ratio. Drop
coef_map=only if a referee asks for the full coefficient list.
did::att_gt / synthdid / grf::causal_forestFor DID / SCM / matching / forest mains, the modern R estimator returns a self-contained estimate + automatic placebos / pre-trends / overlap diagnostics. Pipe into modelsummary via the auto-tidiers.
# CS-DID with full diagnostics
cs <- att_gt(yname = "log_wage", tname = "year", idname = "worker_id",
gname = "first_treat", data = df,
control_group = "nevertreated", est_method = "dr",
clustervars = "firm_id")
print(aggte(cs, type = "group")) # ATT(g) summary
print(aggte(cs, type = "dynamic", min_e = -4, max_e = 4)) # event-study aggregation
# Synthetic DID
library(synthdid)
sdid_setup <- panel.matrices(df, unit="worker_id", time="year",
outcome="log_wage", treatment="training")
sdid_fit <- synthdid_estimate(sdid_setup$Y, sdid_setup$N0, sdid_setup$T0)
print(summary(sdid_fit))
# Causal forest with overlap + variable importance
library(grf)
cf <- causal_forest(X = as.matrix(df[, c("age","edu","tenure","firm_size")]),
Y = df$log_wage, W = df$training, num.trees = 4000)
average_treatment_effect(cf, target.sample = "treated")
test_calibration(cf)
variable_importance(cf)
modelsummary (Table 3, see Step 7)One column per subgroup. Detailed code in §Step 7 — Heterogeneity.
Stack every robustness specification next to the baseline. Detailed code in §Step 6.
library(fixest)
# 5a. OLS with cluster-robust SEs — feols is the modern primary
ols <- feols(log_wage ~ training + age + edu + tenure,
data = df, cluster = ~ firm_id)
summary(ols)
# 5b. Two-way FE — single line
fe <- feols(log_wage ~ training + age + edu + tenure | worker_id + year,
data = df, cluster = ~ worker_id)
# Multi-way clustering
fe_mw <- feols(log_wage ~ training | worker_id + year,
data = df, cluster = ~ worker_id + firm_id)
# High-dim interaction FE
fe_hd <- feols(log_wage ~ training | worker_id + industry^year,
data = df, cluster = ~ firm_id)
# 5c. 2×2 DID
did22 <- feols(log_wage ~ i(treated, post, ref = 0) + age + edu,
data = df, cluster = ~ worker_id)
# Or with absorbed FE:
did22 <- feols(log_wage ~ i(treated, post, ref = 0) | worker_id + year,
data = df, cluster = ~ worker_id)
# 5d. Event study — base period at -1
es <- feols(log_wage ~ i(rel_time, ref = -1) | worker_id + year,
data = df %>% filter(!is.na(first_treat)),
cluster = ~ worker_id)
iplot(es,
xlab = "Years relative to treatment",
main = "Event study")
# 5e. Staggered DID — modern estimators (see references/05-modeling.md §5.4)
library(did)
cs <- att_gt(yname = "log_wage", tname = "year", idname = "worker_id",
gname = "first_treat", data = df,
control_group = "nevertreated",
est_method = "dr",
clustervars = "firm_id")
ggdid(cs) # event-study plot
# Sun & Abraham via fixest::sunab
sa <- feols(log_wage ~ sunab(first_treat, year) | worker_id + year,
data = df, cluster = ~ worker_id)
iplot(sa, sub.title = "Sun-Abraham (2021)")
# Borusyak–Jaravel–Spiess (didimputation)
library(didimputation)
bjs <- did_imputation(data = df, yname = "log_wage", gname = "first_treat",
tname = "year", idname = "worker_id",
horizon = 0:5, pretrends = -5:-1,
cluster_var = "worker_id")
# Synthetic DID
library(synthdid)
sdid_setup <- synthdid::panel.matrices(df, unit = "worker_id", time = "year",
outcome = "log_wage", treatment = "training")
sdid_fit <- synthdid_estimate(sdid_setup$Y, sdid_setup$N0, sdid_setup$T0)
# 5f. IV / 2SLS
iv <- feols(log_wage ~ age + edu | training ~ draft_lottery + z2,
data = df, cluster = ~ firm_id)
summary(iv, stage = 1)
fitstat(iv, ~ ivf + ivwald + sargan) # first-stage F + Wald + overid
# Or via AER:
library(AER)
iv_aer <- ivreg(log_wage ~ training + age + edu |
draft_lottery + z2 + age + edu, data = df)
summary(iv_aer, vcov. = sandwich, diagnostics = TRUE)
# 5g. Sharp RD
library(rdrobust); library(rddensity)
rd <- rdrobust(y = df$outcome, x = df$running_var, c = 0,
kernel = "triangular", bwselect = "mserd")
summary(rd)
rdplot(y = df$outcome, x = df$running_var, c = 0)
rddensity(X = df$running_var, c = 0) # manipulation test
# 5h. Binary outcome
logit <- feglm(employed ~ training + age + edu | firm_id + year,
data = df, family = binomial(link = "logit"),
cluster = ~ firm_id)
library(marginaleffects)
avg_slopes(logit, variables = "training") # AME
# 5i. Count w/ HD FE
pois <- fepois(citations ~ training + age | firm_id + year,
data = df, cluster = ~ firm_id)
Deeper patterns: references/06-robustness.md — modelsummary for M1–M6; clubSandwich/fwildclusterboot; bacondecomp/HonestDiD/robomit; ri2 randomization inference.
library(modelsummary)
library(fixest)
# 6a. Progressive specs (M1 → M6)
m1 <- feols(log_wage ~ training, data = df, cluster = ~ firm_id)
m2 <- feols(log_wage ~ training + age + edu, data = df, cluster = ~ firm_id)
m3 <- feols(log_wage ~ training + age + edu + tenure | worker_id,
data = df, cluster = ~ worker_id)
m4 <- feols(log_wage ~ training + age + edu + tenure | worker_id + year,
data = df, cluster = ~ worker_id)
m5 <- feols(log_wage ~ training + age + edu + tenure | worker_id + year + region,
data = df, cluster = ~ worker_id)
m6 <- feols(log_wage ~ training + age + edu + tenure | worker_id + year + industry^year,
data = df, cluster = ~ worker_id)
modelsummary(list("(1)" = m1, "(2)" = m2, "(3)" = m3,
"(4)" = m4, "(5)" = m5, "(6)" = m6),
stars = c('*' = .1, '**' = .05, '***' = .01),
gof_omit = "BIC|AIC|F|Log",
coef_map = c("training" = "Training",
"age" = "Age", "edu" = "Education", "tenure" = "Tenure"),
output = "tables/table_main.tex")
# 6b. Alternative cluster levels
for (cl in c("worker_id", "firm_id", "industry", "state")) {
fit <- feols(log_wage ~ training | worker_id + year, data = df,
cluster = as.formula(paste0("~", cl)))
cat(cl, ": b=", coef(fit)["training"], " se=", se(fit)["training"], "\n")
}
# 6c. Wild cluster bootstrap (when few clusters)
library(fwildclusterboot)
boot <- boottest(m4, param = "training", clustid = "state",
B = 9999, seed = 42)
summary(boot)
# 6d. Subsample splits
splits <- list(
"Female=0" = df %>% filter(female == 0),
"Female=1" = df %>% filter(female == 1),
"Young (<40)" = df %>% filter(age < 40),
"Old (>=40)" = df %>% filter(age >= 40)
)
sub_fits <- imap(splits, ~ feols(log_wage ~ training | worker_id + year,
data = .x, cluster = ~ worker_id))
modelsummary(sub_fits, stars = TRUE)
# 6e. Placebo — fake timing
df_placebo <- df %>%
mutate(fake_first = first_treat - 3,
fake_post = year >= fake_first) %>%
filter(year < first_treat)
feols(log_wage ~ fake_post | worker_id + year,
data = df_placebo, cluster = ~ worker_id)
# 6f. Randomization inference
library(ri2)
ri_out <- conduct_ri(formula = log_wage ~ training + age + edu,
declaration = randomizr::declare_ra(N = nrow(df),
prob = mean(df$training)),
assignment = "training",
sharp_hypothesis = 0,
data = df,
sims = 1000)
summary(ri_out); plot(ri_out)
# 6g. TWFE bias diagnosis
library(bacondecomp)
bacon_out <- bacon(log_wage ~ training,
data = df, id_var = "worker_id", time_var = "year")
ggplot(bacon_out, aes(weight, estimate, color = type)) + geom_point()
ggsave("figures/bacon.pdf")
# 6h. Parallel-trends sensitivity
library(HonestDiD)
honest_out <- createSensitivityResults(betahat = es$coefficients,
sigma = vcov(es),
numPrePeriods = 5, numPostPeriods = 5,
Mbarvec = seq(0, 0.5, by = 0.05))
createSensitivityPlot(honest_out, originalResults = honest_out$mainResult)
ggsave("figures/honestdid.pdf")
# 6i. Oster (2019) δ*
library(robomit)
o_test(y = "log_wage", x = "training",
con = "age + edu + tenure | worker_id + year",
id = "worker_id", time = "year",
data = df, R2max = 1.3 * fitstat(m6, "r2"), beta = 0)
# ============================================================
# 6j. Pattern H — Robustness master table (Table A1, one column per check)
# ============================================================
library(modelsummary); library(MatchIt); library(WeightIt)
base <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df, cluster = ~ firm_id)
no99 <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df %>% filter(wage < quantile(wage, 0.99, na.rm = TRUE)),
cluster = ~ firm_id)
balpan <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df %>% group_by(worker_id) %>%
filter(n_distinct(year) == max(n_distinct(year))) %>% ungroup(),
cluster = ~ firm_id)
dropearly <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df %>% filter(first_treat > 2008), cluster = ~ firm_id)
wfe <- feols(log_wage ~ training + age + edu + tenure | worker_id + year,
data = df, cluster = ~ firm_id)
cl2way <- feols(log_wage ~ training + age + edu + tenure | industry + year,
data = df, cluster = ~ firm_id + year)
logy <- feols(log(wage + 1) ~ training + age + edu + tenure | industry + year,
data = df, cluster = ~ firm_id)
ihsy <- feols(asinh(wage) ~ training + age + edu + tenure | industry + year,
data = df, cluster = ~ firm_id)
m_psm <- matchit(training ~ age + edu + tenure + firm_size, data = df, method = "nearest")
psm_lm <- lm(log_wage ~ training + age + edu + tenure, data = match.data(m_psm), weights = weights)
ebal_w <- weightit(training ~ age + edu + tenure + firm_size, data = df, method = "ebal")
ebal_lm <- lm(log_wage ~ training + age + edu + tenure, data = df, weights = ebal_w$weights)
modelsummary(
list("(1) Baseline" = base,
"(2) Drop top 1%" = no99,
"(3) Balanced" = balpan,
"(4) Drop early" = dropearly,
"(5) Worker FE" = wfe,
"(6) 2-way cluster" = cl2way,
"(7) log Y" = logy,
"(8) IHS Y" = ihsy,
"(9) PSM" = psm_lm,
"(10) Entropy bal." = ebal_lm),
output = "tables/tableA1_robustness.tex",
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01),
coef_map = c("training" = "Training (β̂)"),
gof_omit = "BIC|AIC|F|Log|Adj",
notes = "Each column is one robustness check. β̂ on training is the focal coefficient."
)
# ============================================================
# 6k. Specification curve (Simonsohn-Simmons-Nelson 2020) via `specr`
# ============================================================
library(specr); library(ggplot2)
specs <- setup(data = df,
y = c("log_wage", "ihs_wage"),
x = "training",
model = c("feols"),
controls = c("age", "edu", "tenure", "firm_size"),
subsets = list(industry = c("manuf", "construction", "transport")))
results <- specr(specs)
plot(results, choices = c("x", "y", "controls", "subsets"))
ggsave("figures/fig5_spec_curve.pdf", width = 10, height = 6)
ggsave("figures/fig5_spec_curve.png", width = 10, height = 6, dpi = 300)
# Hand-rolled alternative when `specr` doesn't fit (with custom FE / SE):
# spec_grid <- expand.grid(controls = list(c("age"), c("age","edu"), c("age","edu","tenure")),
# ytrans = c("log_wage", "ihs_wage"),
# sample = c("all", "manuf", "no99"),
# cluster = c("firm_id", "firm_id+year"))
# Loop, run feols, collect b/se, ggplot::geom_pointrange.
# ============================================================
# 6l. Sensitivity dashboard — HonestDiD + Oster + E-value
# ============================================================
# (a) HonestDiD — Rambachan-Roth (2023): bound on β̂ under bounded PT violation
library(HonestDiD)
es_pre <- coef(es)[grep("year::-", names(coef(es)))]
es_post <- coef(es)[grep("year::[0-9]", names(coef(es)))]
honest_out <- createSensitivityResults(betahat = c(es_pre, es_post),
sigma = vcov(es)[c(names(es_pre), names(es_post)),
c(names(es_pre), names(es_post))],
numPrePeriods = length(es_pre),
numPostPeriods = length(es_post),
Mbarvec = seq(0, 0.5, by = 0.05))
createSensitivityPlot(honest_out, originalResults = honest_out$mainResult)
ggsave("figures/fig6_honestdid.pdf", width = 7, height = 4)
# (b) Oster δ — `robomit::o_test` (already shown in 6i)
# (c) E-value (VanderWeele-Ding 2017) — for risk-ratio outcomes
library(EValue)
evalue(RR(1.45), lo = 1.10, hi = 1.91)
# → reports the minimum strength of unmeasured confounding to nullify the result.
Deeper patterns: references/07-further-analysis.md — marginaleffects is the post-estimation workhorse; mediation::mediate for Imai mediation; lavaan for SEM; grf::causal_forest for CATE.
library(marginaleffects)
library(fixest)
# 7a. Heterogeneity via interaction
het <- feols(log_wage ~ i(female, training, ref = 0) + age + edu | worker_id + year,
data = df, cluster = ~ worker_id)
summary(het)
iplot(het) # visualize interaction
# Continuous moderator + marginsplot
het_c <- feols(log_wage ~ training * tenure + age + edu | worker_id + year,
data = df, cluster = ~ worker_id)
plot_slopes(het_c, variables = "training",
condition = list(tenure = seq(0, 20, by = 2))) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(x = "Tenure", y = "Marginal effect of training")
ggsave("figures/het_tenure.pdf", width = 6, height = 4)
# 7b. Triple difference
ddd <- feols(log_wage ~ treated * post * high_exposure | worker_id + year,
data = df, cluster = ~ firm_id)
# 7c. Outcome ladder
out_ladder <- list()
for (y in c("hours_worked", "productivity", "log_wage")) {
out_ladder[[y]] <- feols(as.formula(paste(y, "~ training | worker_id + year")),
data = df, cluster = ~ worker_id)
}
modelsummary(out_ladder, stars = TRUE,
coef_map = c("training" = "Training"),
output = "tables/outcome_ladder.tex")
# 7d. Mediation — Imai et al. (2010)
library(mediation)
med_M <- lm(hours_worked ~ training + age + edu, data = df)
med_Y <- lm(log_wage ~ training + hours_worked + age + edu, data = df)
med <- mediate(med_M, med_Y, treat = "training", mediator = "hours_worked",
boot = TRUE, sims = 1000)
summary(med); plot(med)
# Sensitivity to unobserved M-Y confounding
medsens <- medsens(med, rho.by = 0.05, effect.type = "indirect")
plot(medsens)
# 7e. CATE via causal forest
library(grf)
cf <- causal_forest(X = as.matrix(df %>% select(age, edu, tenure, firm_size)),
Y = df$log_wage, W = df$training,
num.trees = 2000, min.node.size = 5)
df$tau_hat <- predict(cf)$predictions
variable_importance(cf)
average_treatment_effect(cf, target.sample = "all")
# Plot CATE by a moderator
ggplot(df, aes(tenure, tau_hat)) +
geom_smooth(method = "loess", se = TRUE) +
labs(x = "Tenure", y = "Estimated CATE")
ggsave("figures/cate_tenure.pdf")
# 7f. Dose-response — splines
library(splines)
dr <- feols(log_wage ~ ns(training_hours, df = 4) + age + edu | worker_id + year,
data = df, cluster = ~ worker_id)
plot_predictions(dr, condition = "training_hours")
This step is mandatory — every analysis run produces all 5 required tables (T1–T5) and all 4 required figures (F1–F4) defined in the Default Output Spec at the top of this skill. Do not skip Step 8 because "the regression already ran". A coefficient without a table and a figure is not how applied economics communicates a result.
Deeper patterns: references/08-tables-plots.md — modelsummary is the modern default (LaTeX/Word/HTML/Excel from one call); kableExtra for further LaTeX styling; gt for HTML/Word; ggplot2 + iplot + ggpubr + cowplot + binsreg for figures.
library(modelsummary)
library(kableExtra)
library(gt)
library(fixest)
library(ggplot2)
# ============================================================
# 8a. ★ TABLE 2 — Main results, multi-column regression M1→M6
# (the centerpiece of every economics paper)
# ============================================================
modelsummary(
list("(1) Raw" = m1,
"(2) +Demog" = m2,
"(3) +Tenure" = m3,
"(4) +Unit FE" = m4,
"(5) +2-way FE" = m5,
"(6) +Ind×Yr FE" = m6),
stars = c('*' = .1, '**' = .05, '***' = .01),
coef_map = c("training" = "Training",
"age" = "Age", "edu" = "Education", "tenure" = "Tenure"),
gof_map = list(
list("raw" = "nobs", "clean" = "N", "fmt" = 0),
list("raw" = "r.squared", "clean" = "R²", "fmt" = 3),
list("raw" = "adj.r.squared","clean" = "Adj. R²", "fmt" = 3)
),
notes = "Cluster-robust SE at worker_id in parentheses. * p<0.10, ** p<0.05, *** p<0.01.",
output = "tables/table2_main.tex"
)
modelsummary(list("(1)"=m1, "(2)"=m2, "(3)"=m3, "(4)"=m4, "(5)"=m5, "(6)"=m6),
stars = TRUE, output = "tables/table2_main.docx")
# ============================================================
# 8b. TABLE 1 — Summary statistics & balance
# ============================================================
library(gtsummary)
tbl1 <- df %>%
select(log_wage, age, edu, tenure, female, training) %>%
tbl_summary(by = training, missing = "ifany",
statistic = all_continuous() ~ "{mean} ({sd})") %>%
add_p() %>% add_difference() %>% add_n() %>% bold_labels()
tbl1 %>% as_kable_extra(format = "latex", booktabs = TRUE) %>%
kableExtra::save_kable("tables/table1_balance.tex")
tbl1 %>% as_flex_table() %>%
flextable::save_as_docx(path = "tables/table1_balance.docx")
# ============================================================
# 8c. TABLE 3 — Mechanism / outcome ladder (3+ outcomes)
# ============================================================
ladder <- list()
for (y in c("hours_worked", "productivity", "log_wage")) {
ladder[[y]] <- feols(as.formula(paste(y, "~ training + age + edu + tenure | worker_id + year")),
data = df, cluster = ~ worker_id)
}
modelsummary(ladder,
stars = c('*' = .1, '**' = .05, '***' = .01),
coef_map = c("training" = "Training"),
notes = "Each column is a separate regression on the labelled outcome. Cluster-robust SE at worker_id.",
output = "tables/table3_mechanism.tex")
# ============================================================
# 8d. TABLE 4 — Heterogeneity (subgroup × main coef)
# ============================================================
het_specs <- list(
"All" = df,
"Female=0" = df %>% filter(female == 0),
"Female=1" = df %>% filter(female == 1),
"Age<40" = df %>% filter(age < 40),
"Age≥40" = df %>% filter(age >= 40),
"Manuf." = df %>% filter(industry == "manufacturing")
)
het_models <- imap(het_specs,
~ feols(log_wage ~ training + age + edu + tenure | worker_id + year,
data = .x, cluster = ~ worker_id))
modelsummary(het_models,
stars = c('*' = .1, '**' = .05, '***' = .01),
coef_map = c("training" = "Training"),
notes = "Cluster-robust SE at worker_id. Wald p-values for cross-subgroup equality should accompany this table — see references/07.",
output = "tables/table4_heterogeneity.tex")
# ============================================================
# 8e. TABLE 5 — Robustness battery (alt SE / cluster / sample / placebo)
# ============================================================
rob <- list(
"Baseline" = feols(log_wage ~ training | worker_id + year, data = df,
cluster = ~ worker_id),
"Cluster=Firm" = feols(log_wage ~ training | worker_id + year, data = df,
cluster = ~ firm_id),
"2-way Cluster" = feols(log_wage ~ training | worker_id + year, data = df,
cluster = ~ worker_id + firm_id),
"Winsor 1/99" = feols(log_wage ~ training | worker_id + year,
data = df %>% mutate(log_wage = DescTools::Winsorize(log_wage,
probs = c(.01,.99),
na.rm = TRUE)),
cluster = ~ worker_id),
"Drop Manuf." = feols(log_wage ~ training | worker_id + year,
data = df %>% filter(industry != "manufacturing"),
cluster = ~ worker_id),
"Placebo (-3)" = feols(log_wage ~ fake_post | worker_id + year,
data = df %>% filter(year < first_treat),
cluster = ~ worker_id)
)
modelsummary(rob,
stars = c('*' = .1, '**' = .05, '***' = .01),
output = "tables/table5_robustness.tex")
# ============================================================
# 8f. ★ FIGURE 3 — Coefficient plot across M1→M6
# ============================================================
modelplot(list("(1)"=m1, "(2)"=m2, "(3)"=m3, "(4)"=m4, "(5)"=m5, "(6)"=m6),
coef_map = c("training" = "Training"),
conf_level = 0.95) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
labs(x = "Coefficient on training (95% CI)", y = "Specification",
title = "Effect of training across specifications") +
theme_classic(base_size = 11)
ggsave("figures/fig3_coefplot.pdf", width = 6, height = 4)
ggsave("figures/fig3_coefplot.png", width = 6, height = 4, dpi = 300)
# ============================================================
# 8g. FIGURE 2 — Event-study plot (dynamic DID, base period = -1)
# ============================================================
pdf("figures/fig2_event_study.pdf", width = 7, height = 4)
iplot(es,
xlab = "Years relative to treatment",
ylab = "Coefficient (ATT, 95% CI)",
main = "Event study: dynamic effect of training",
ref.line = -0.5)
dev.off()
png("figures/fig2_event_study.png", width = 2100, height = 1200, res = 300)
iplot(es,
xlab = "Years relative to treatment",
ylab = "Coefficient (ATT, 95% CI)",
main = "Event study: dynamic effect of training",
ref.line = -0.5)
dev.off()
# ============================================================
# 8h. FIGURE 4 — Sensitivity / robustness curve
# (HonestDiD / spec curve / forest of robustness battery)
# ============================================================
# HonestDiD example (after the event study with stored b/V):
library(HonestDiD)
honest_out <- createSensitivityResults(betahat = es$coefficients,
sigma = vcov(es),
numPrePeriods = 5, numPostPeriods = 5,
Mbarvec = seq(0, 0.5, by = 0.05))
sens_plot <- createSensitivityPlot(honest_out, originalResults = honest_out$mainResult)
ggsave("figures/fig4_sensitivity.pdf", plot = sens_plot, width = 7, height = 4)
ggsave("figures/fig4_sensitivity.png", plot = sens_plot, width = 7, height = 4, dpi = 300)
# Alternative — robustness forest plot:
# rob_summary <- imap_dfr(rob, ~ tibble(
# group = .y,
# est = coef(.x)[1],
# se = se(.x)[1]
# ))
# ggplot(rob_summary, aes(est, fct_rev(factor(group)))) +
# geom_point(size = 3, color = "navy") +
# geom_errorbarh(aes(xmin = est - 1.96*se, xmax = est + 1.96*se),
# height = 0.2, color = "navy") +
# geom_vline(xintercept = 0, linetype = "dashed") +
# labs(x = "Coefficient on training (95% CI)", y = NULL,
# title = "Robustness forest plot")
# ggsave("figures/fig4_sensitivity.pdf", width = 7, height = 4)
# ============================================================
# 8i. FIGURE 1 — Trend / motivation (treated vs control over time)
# ============================================================
df %>%
group_by(year, training) %>%
summarise(mean_log_wage = mean(log_wage, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(year, mean_log_wage, color = factor(training))) +
geom_line(linewidth = 1) + geom_point(size = 2) +
geom_vline(xintercept = policy_year, linetype = "dashed", color = "gray40") +
scale_color_manual(values = c("0" = "darkred", "1" = "navy"),
labels = c("Control", "Treated"), name = "") +
labs(x = "Year", y = "Mean log wage",
title = "Treated vs control trend") +
theme_classic(base_size = 11) +
theme(legend.position = "bottom")
ggsave("figures/fig1_trend.pdf", width = 7, height = 4)
ggsave("figures/fig1_trend.png", width = 7, height = 4, dpi = 300)
# ============================================================
# 8j. Auxiliary plots (optional — produce when relevant)
# ============================================================
library(binsreg)
binsreg(y = df$log_wage, x = df$tenure, w = df %>% select(age, edu, female))
ggsave("figures/figA_binscatter.pdf", width = 6, height = 4)
# RD plot (only when running_var exists)
# rdplot(y = df$outcome, x = df$running_var, c = 0,
# title = "RD plot", x.label = "Running variable", y.label = "Outcome")
# ============================================================
# 8k. Multi-panel combined (optional, for slides / appendix)
# ============================================================
library(cowplot)
# plot_grid(p_trend, p_event, p_coef, p_sens, ncol = 2, labels = "AUTO") %>%
# ggsave("figures/combined.pdf", plot = ., width = 10, height = 8)
# ============================================================
# 8l. Theme — set once at top of script for consistency
# ============================================================
theme_set(theme_classic(base_size = 11) +
theme(legend.position = "bottom",
plot.title = element_text(face = "bold")))
Deliverables checklist (verify before declaring the run complete):
[ ] tables/table1_balance.tex [ ] figures/fig1_trend.pdf
[ ] tables/table2_main.tex ★ [ ] figures/fig2_event_study.pdf
[ ] tables/table3_mechanism.tex [ ] figures/fig3_coefplot.pdf
[ ] tables/table4_heterogeneity.tex
[ ] tables/table5_robustness.tex [ ] figures/fig4_sensitivity.pdf
[ ] tables/tableA1_robustness.tex [ ] figures/fig5_spec_curve.pdf
[ ] artifacts/sample_construction.json (footnote 4)
[ ] artifacts/data_contract.json
[ ] artifacts/result.json (reproducibility stamp — see 8m)
The single artifact a journal's replication office (or a future co-author) needs to reproduce the headline number. Persist R version, seed, dataset hash, baseline coefficient + CI, and pointers to the protocol/contract:
library(jsonlite); library(digest)
# Get baseline result (assumes `base` is the headline feols object)
b_hat <- coef(base)["training"]
se_b <- se(base)["training"]
ci <- c(b_hat - 1.96 * se_b, b_hat + 1.96 * se_b)
stamp <- list(
R_version = R.version.string,
fixest_version = as.character(packageVersion("fixest")),
modelsummary_version = as.character(packageVersion("modelsummary")),
seed = 42,
dataset_sha256 = substr(digest::digest(df, algo = "sha256"), 1, 16),
n_obs = base$nobs,
estimand = "ATT",
estimator = "fixest::feols",
estimate = unname(b_hat),
se_cluster = unname(se_b),
ci95 = unname(ci),
pre_registration = "artifacts/strategy.md",
data_contract = "artifacts/data_contract.json",
sample_log = "artifacts/sample_construction.json",
paper_bundle = "tables/table2_main.tex"
)
write_json(stamp, "artifacts/result.json", pretty = TRUE, auto_unbox = TRUE)
Commit artifacts/result.json alongside the paper PDF. A referee should be able to run Rscript master.R and bit-identically reproduce this JSON.
When the user's wording flags Mode A (target-trial emulation / IPTW / TMLE / MR / STROBE / 流行病学 / 公共健康 / RWE / cohort), the 8 steps still apply — but Step 5 swaps the OLS-and-FE stack for the doubly-robust + survival + MR triplet, and the deliverables follow STROBE / TRIPOD-AI conventions. Steps 1–4 (cleaning, construction, Table 1, diagnostics) and Step 8 (tables/figures export) are identical to the Default mode.
Package footprint (install on top of the Default stack):
install.packages(c(
"WeightIt", "PSweight", "cobalt", # IPTW / propensity weighting + balance
"gfoRmula", # parametric g-formula (time-varying)
"tmle", "ltmle", # TMLE / longitudinal TMLE
"survival", "survminer", "flexsurv", # KM / Cox / AFT / RMST
"MendelianRandomization", "TwoSampleMR", # IVW, Egger, weighted-median MR
"MRPRESSO", # outlier-robust MR
"EValue" # E-value sensitivity (VanderWeele)
))
Write the protocol before touching the data. Save it as protocol.yml and quote it in the paper.
# protocol.yml — target-trial emulation skeleton
# eligibility: age 40-75, no_prior_event, ascertained_at t0
# treatment: A=1 statin initiation; A=0 no initiation
# assignment: emulated random at t0 via IPTW on baseline covariates
# outcome: incident MI within 5 years
# estimand: ITT ATE on risk difference + hazard ratio
library(dplyr)
cohort <- df |>
filter(age >= 40, age <= 75, prior_MI == 0) |>
mutate(
t0 = coalesce(statin_initiation_date, enrollment_date),
event_5y = as.integer((MI_date - t0) <= 365 * 5 & !is.na(MI_date)),
time_at_risk = pmin(as.numeric(censor_date - t0), 365 * 5)
)
Use the same gtsummary::tbl_summary from Step 3, just by = A. E-values for unmeasured confounding go in the footer.
library(gtsummary)
cohort |>
select(A, age, edu, smoke, bmi, ldl, sbp) |>
tbl_summary(by = A, missing = "ifany") |>
add_difference() |>
add_p() |>
bold_labels()
library(WeightIt); library(cobalt)
# Estimate PS + IPTW weights
w_out <- weightit(A ~ age + edu + smoke + bmi + ldl + sbp,
data = cohort, method = "glm", estimand = "ATE")
# Overlap density (positivity)
bal.plot(w_out, var.name = "prop.score", which = "both")
ggsave("figures/figA2_ps_overlap.pdf")
# Love plot (SMDs before vs after IPTW)
love.plot(w_out, threshold = 0.1, abs = TRUE)
ggsave("figures/figA2_love.pdf")
The "AER Table 2" of epi: a 3-column table where each column is one of {IPTW-MSM, g-formula, TMLE}, so the reader can confirm doubly-robust agreement.
# IPTW marginal structural model
library(survey)
des <- svydesign(ids = ~1, data = cohort, weights = w_out$weights)
msm <- svyglm(event_5y ~ A, design = des, family = quasibinomial())
RD_iptw <- coef(msm)["A"]; CI_iptw <- confint(msm)["A", ]
# g-formula (parametric, time-fixed)
library(gfoRmula)
gf <- gformula_binary_eof(
obs_data = cohort,
id = "subject_id", time_name = "t", outcome_name = "event_5y",
covnames = c("age","edu","smoke","bmi","ldl","sbp"),
intvars = list("A"), interventions = list(list(c(static, 1)), list(c(static, 0))),
ref_int = 1, time_points = 1, basecovs = c("age","edu","smoke","bmi","ldl","sbp")
)
# TMLE (doubly robust)
library(tmle)
fit_tmle <- tmle(
Y = cohort$event_5y, A = cohort$A,
W = cohort[, c("age","edu","smoke","bmi","ldl","sbp")],
family = "binomial",
Q.SL.library = c("SL.glm","SL.glmnet","SL.ranger"),
g.SL.library = c("SL.glm","SL.glmnet","SL.ranger")
)
RD_tmle <- fit_tmle$estimates$ATE$psi
CI_tmle <- fit_tmle$estimates$ATE$CI
# Stack the triplet into one paper table
library(modelsummary)
tableA3 <- tibble::tribble(
~Estimator, ~RD, ~`95% CI`,
"IPTW-MSM", RD_iptw, sprintf("[%.3f, %.3f]", CI_iptw[1], CI_iptw[2]),
"g-formula", gf$result[2,"mean"] - gf$result[1,"mean"], "—",
"TMLE", RD_tmle, sprintf("[%.3f, %.3f]", CI_tmle[1], CI_tmle[2])
)
modelsummary::datasummary_df(tableA3, output = "tables/tableA3_dr_triplet.tex")
library(survival); library(survminer); library(flexsurv)
# KM by treatment
fit_km <- survfit(Surv(time_at_risk, event_5y) ~ A, data = cohort)
ggsurvplot(fit_km, conf.int = TRUE, pval = TRUE, risk.table = TRUE)
ggsave("figures/figA4_km.pdf")
# Cox HR (covariate-adjusted)
fit_cox <- coxph(Surv(time_at_risk, event_5y) ~ A + age + edu + smoke + bmi + ldl + sbp,
data = cohort, weights = w_out$weights)
HR <- exp(coef(fit_cox)["A"]); HR_CI <- exp(confint(fit_cox)["A", ])
# AFT (Weibull) for time-ratio interpretation
fit_aft <- flexsurvreg(Surv(time_at_risk, event_5y) ~ A + age + edu + smoke + bmi + ldl + sbp,
data = cohort, dist = "weibull")
# RMST contrast at t = 5 years
library(survRM2)
rmst <- rmst2(cohort$time_at_risk, cohort$event_5y, cohort$A, tau = 365 * 5)
library(MendelianRandomization)
mri <- mr_input(bx = BX, bxse = BXSE, by = BY, byse = BYSE,
exposure = "Statin use", outcome = "MI")
ivw <- mr_ivw(mri)
egger <- mr_egger(mri) # pleiotropy intercept test
wmedian<- mr_median(mri, weighting = "weighted")
# Or harmonized two-sample workflow
# library(TwoSampleMR); harmonised <- harmonise_data(exposure_dat, outcome_dat)
# res <- mr(harmonised, method_list = c("mr_ivw", "mr_egger_regression", "mr_weighted_median"))
# Sensitivity to outliers
library(MRPRESSO)
mr_presso(BetaOutcome = "by", BetaExposure = "bx", SdOutcome = "byse", SdExposure = "bxse",
OUTLIERtest = TRUE, DISTORTIONtest = TRUE, data = data.frame(bx, by, bxse, byse), NbDistribution = 1000)
library(EValue)
ev <- evalue(RR(1.45), lo = 1.10, hi = 1.91) # required strength of unmeasured confounding
print(ev)
Save as replication/strobe_checklist.md and tick before submission:
[ ] Eligibility criteria + dates (target-trial protocol)
[ ] Adjustment set with DAG justification (A.2)
[ ] Positivity / overlap diagnostic (A.2)
[ ] Doubly-robust triplet (IPTW + g-formula + TMLE) (A.3)
[ ] Risk difference + hazard ratio + RMST (A.3, A.4)
[ ] E-value for unmeasured confounding (A.6)
[ ] Loss-to-follow-up rate + censoring assumption (A.0)
[ ] Pre-registered protocol or analysis plan (A.0)
When the user's wording flags Mode B (DML / meta-learner / causal forest / BCF / CATE / policy learning / conformal causal / fairness / 因果机器学习), the pipeline keeps Steps 1–4 and Step 8 from the Default mode, swaps Step 5 for the ML estimator stack, and adds a CATE-distribution + policy-value layer between Step 7 and Step 8.
Package footprint (install on top of the Default stack):
install.packages(c(
"DoubleML", "mlr3", "mlr3learners", # DML + ML nuisance learners
"grf", # causal forest, GRF, instrumental forest
"causalweight", # IPW / DR / sensitivity for CATE
"bartCause", "bcf", # BART / Bayesian causal forest
"policytree", # honest policy trees
"conformalInference", # conformal prediction (general)
# cfcausal — install via devtools::install_github("lihualei71/cfcausal")
"fairmodels", # fairness audit
"pcalg", "bnlearn" # causal discovery (PC / GES / Bayesian net)
))
library(mlr3); library(mlr3learners); library(DoubleML)
set.seed(42)
idx <- sample(seq_len(nrow(df)), size = 0.7 * nrow(df))
train <- df[idx, ]; holdout <- df[-idx, ]
# Standard nuisance pair: outcome regression Q(X,A) and propensity g(A|X)
ml_g <- lrn("regr.ranger", num.trees = 500, mtry = 5) # outcome
ml_m <- lrn("classif.ranger", num.trees = 500, mtry = 5) # propensity
library(pcalg)
# PC algorithm — constraint-based DAG discovery
suffStat <- list(C = cor(df[, c("A","Y","X1","X2","X3","X4")]), n = nrow(df))
pc.fit <- pc(suffStat, indepTest = gaussCItest,
alpha = 0.01, labels = c("A","Y","X1","X2","X3","X4"))
plot(pc.fit, main = "PC-recovered DAG")
# OR: bnlearn for hill-climbing GES
# library(bnlearn); hc.fit <- hc(df[, c("A","Y","X1","X2","X3","X4")]); plot(hc.fit)
The "AER Table 2" of ML causal: a horse-race table where each column is one estimator family on the same (Y, A, X) data — readers want to see DML, T-learner, causal forest, and BCF all agree (or disagree) on the ATE.
# DML — partially linear or interactive regression model
dml_data <- DoubleMLData$new(train, y_col = "Y", d_cols = "A",
x_cols = c("X1","X2","X3","X4"))
dml_plr <- DoubleMLPLR$new(dml_data, ml_g = ml_g, ml_m = ml_m, n_folds = 5)
dml_plr$fit()
ate_dml <- dml_plr$coef; ci_dml <- dml_plr$confint()
# Causal forest (GRF) — non-parametric CATE
library(grf)
cf <- causal_forest(X = as.matrix(train[, c("X1","X2","X3","X4")]),
Y = train$Y, W = train$A, num.trees = 2000)
ate_cf <- average_treatment_effect(cf, target.sample = "all")
cate_cf <- predict(cf, newdata = as.matrix(holdout[, c("X1","X2","X3","X4")]))$predictions
# T-learner / DR-learner (use causalweight or hand-rolled with grf::*)
library(causalweight)
dr <- treatDML(y = train$Y, d = train$A, x = as.matrix(train[, c("X1","X2","X3","X4")]),
MLmethod = "lasso")$effect
ate_DR <- mean(dr)
# Bayesian Causal Forest — separate prognostic + treatment functions
library(bcf)
bcf_fit <- bcf(y = train$Y, z = train$A,
x_control = as.matrix(train[, c("X1","X2","X3","X4")]),
x_moderate = as.matrix(train[, c("X1","X2","X3","X4")]),
pihat = predict(glm(A ~ ., data = train[, c("A","X1","X2","X3","X4")], family = binomial), type = "response"),
nburn = 1000, nsim = 1000)
ate_bcf <- mean(bcf_fit$tau)
# Stack the horse-race
library(modelsummary)
tableB2 <- tibble::tribble(
~Estimator, ~ATE,
"DML (PLR)", ate_dml[1],
"Causal Forest", ate_cf[1],
"DR-learner", ate_DR,
"Bayesian Causal Forest", ate_bcf
)
modelsummary::datasummary_df(tableB2, fmt = 4, output = "tables/tableB2_ml_horserace.tex")
library(ggplot2)
# CATE histogram
data.frame(cate = cate_cf) |>
ggplot(aes(x = cate)) +
geom_histogram(bins = 30, fill = "grey70", colour = "black") +
geom_vline(xintercept = 0, lty = 2) +
labs(x = "CATE", y = "Count")
ggsave("figures/figB3_cate_hist.pdf")
# CATE by quartile of a covariate
holdout |>
mutate(cate = cate_cf, age_q = ntile(X1, 4)) |>
group_by(age_q) |>
summarise(mean_cate = mean(cate)) |>
ggplot(aes(age_q, mean_cate)) + geom_col() + labs(y = "Mean CATE")
ggsave("figures/figB3_cate_by_age_q.pdf")
library(policytree)
# Honest discrete policy tree on doubly-robust scores from causal forest
dr_scores <- double_robust_scores(cf)
ptree <- policy_tree(X = as.matrix(train[, c("X1","X2","X3","X4")]),
Gamma = dr_scores, depth = 3)
print(ptree) # human-readable tree of "treat if X1<a and X2>b"
plot(ptree)
ggsave("figures/figB4_policy_tree.pdf")
# Off-policy evaluation — DR policy value on holdout
holdout_X <- as.matrix(holdout[, c("X1","X2","X3","X4")])
pred_pol <- predict(ptree, holdout_X)
DR_holdout <- double_robust_scores(cf, newdata = holdout_X)
policy_value_DR <- mean(DR_holdout[cbind(seq_len(nrow(DR_holdout)), pred_pol)])
cat(sprintf("DR policy value (holdout): %.3f\n", policy_value_DR))
# Conformal prediction interval around CATE (split conformal via cfcausal)
# devtools::install_github("lihualei71/cfcausal")
library(cfcausal)
ci90 <- conformalIte(X = as.matrix(train[, c("X1","X2","X3","X4")]),
Y = train$Y, T = train$A,
alpha = 0.1,
algo = "nest",
type = "CQR",
X.test = as.matrix(holdout[, c("X1","X2","X3","X4")]))
# Fairness audit — disparate impact / equalised odds
library(fairmodels)
fobject <- fairness_check(model_treated = predict(ptree, holdout_X),
data = holdout, protected = holdout$sensitive_attr,
privileged = "majority")
plot(fobject)
ggsave("figures/figB5_fairness.pdf")
Save as replication/ml_causal_checklist.md:
[ ] Nuisance learners listed (Q model, g model, hyperparameters, CV folds)
[ ] Cross-fitting / sample-splitting documented (DML K-fold)
[ ] Overlap / propensity diagnostics (B.0 + A.2-style overlap plot)
[ ] CATE summary (mean, SD, quartiles) + heterogeneity p-value (grf::test_calibration)
[ ] Policy value with confidence interval (B.4)
[ ] Conformal coverage rate on holdout (B.5)
[ ] Fairness gaps across sensitive attributes (B.5)
[ ] DAG / adjustment set + sensitivity to unmeasured confounding (E-value or Manski bounds)
| Step | Task | Go-to package | Fallback |
|------|------|---------------|----------|
| 1 | Read data | haven / readr / readxl / data.table::fread | arrow for Parquet |
| 1 | Clean names | janitor::clean_names | manual |
| 1 | Missing | naniar / mice | Hmisc |
| 2 | Winsorize | DescTools::Winsorize | manual pmin/pmax |
| 2 | Lag in panel | dplyr::lag (with arrange+group_by) | data.table::shift |
| 3 | Table 1 | gtsummary / modelsummary::datasummary_balance | tableone |
| 3 | Correlation | psych::corr.test + corrplot | Hmisc::rcorr |
| 4 | Hetero / autocorr | lmtest::bptest / dwtest / bgtest | car |
| 4 | Panel tests | plm::pbgtest / pcdtest / phtest | — |
| 4 | Stationarity | tseries::adf.test / tseries::kpss.test | urca |
| 5 | OLS / panel FE | fixest::feols | lfe::felm (older) |
| 5 | IV | fixest::feols(\| ~ ) | AER::ivreg / ivreg::ivreg |
| 5 | DID — 2×2 | feols with i(treated, post) | — |
| 5 | DID — CS | did::att_gt | — |
| 5 | DID — SA | fixest::sunab | — |
| 5 | DID — BJS | didimputation::did_imputation | — |
| 5 | DID — SDID | synthdid | — |
| 5 | RD | rdrobust / rddensity / rdmulti | — |
| 5 | SC | Synth / gsynth / tidysynth | — |
| 5 | PSM | MatchIt::matchit | — |
| 5 | IPW | WeightIt::weightit | — |
| 5 | Entropy balance | ebal | — |
| 5 | DML | DoubleML | — |
| 5 | CATE (causal forest) | grf::causal_forest | — |
| 5 | Mediation | mediation::mediate | lavaan |
| 6 | Wild cluster boot | fwildclusterboot::boottest | clubSandwich |
| 6 | Random. inference | ri2::conduct_ri | manual boot |
| 6 | Multiple testing | multcomp / hand-roll Romano-Wolf | — |
| 6 | TWFE diagnosis | bacondecomp::bacon | — |
| 6 | PT sensitivity | HonestDiD | — |
| 6 | Oster δ* | robomit::o_test / o_beta | — |
| 7 | Margins / slopes | marginaleffects::avg_slopes / plot_slopes | — |
| 7 | Mediation w/ sensitivity | mediation::mediate + medsens | — |
| 7 | SEM | lavaan::sem | — |
| 8 | Reg table (any format) | modelsummary | texreg / stargazer |
| 8 | Word table | flextable / gt::gtsave | officer |
| 8 | LaTeX table styling | kableExtra | — |
| 8 | Coefplot / event study | modelplot / fixest::iplot | ggplot2 manual |
| 8 | Binscatter | binsreg | — |
| 8 | Multi-panel | cowplot::plot_grid / patchwork | gridExtra |
| Mistake | Correct approach |
|---------|------------------|
| lm(y ~ x + factor(unit) + factor(year)) on big panels | feols(y ~ x | unit + year, data) |
| Default iid SEs on clustered data | feols(..., cluster = ~ id); boottest if clusters < 50 |
| TWFE on staggered adoption | did::att_gt / fixest::sunab / didimputation::did_imputation |
| Using lag(x) without arrange() + group_by() | always arrange(id, time) %>% group_by(id) %>% mutate(x_l1 = lag(x)) |
| Joining without checking row count | use relationship arg in dplyr::*_join, then stopifnot(nrow(df) == n_before) |
| Interpreting logit coefficients directly | marginaleffects::avg_slopes(model) for AME |
| Reporting only point estimates | always plot — modelplot, iplot, plot_slopes |
| Manually formatting reg tables | modelsummary writes LaTeX/Word/HTML in one call |
| Reporting only the headline coefficient (no Table 2) | Always ship the multi-column M1→M6 main table — that is the centerpiece of an economics paper, not the abstract sentence |
| Coefficient table without any figures | An economics result needs at least F1 trend + F2 event study + F3 coefplot + F4 sensitivity — see the Default Output Spec |
| Saving plots as .png only | also .pdf for LaTeX submissions |
| Hard-coding dataset paths in scripts | use here::here() and renv::init() |
| Running tests manually each time | wrap into targets::tar_make() or Quarto |
project/
├── R/
│ ├── 01_clean.R # produces data/analysis.rds
│ ├── 02_transform.R
│ ├── 03_describe.R
│ ├── 04_diagnose.R
│ ├── 05_model.R # saves models to estimates/
│ ├── 06_robust.R
│ ├── 07_further.R
│ └── 08_tables_figures.R
├── data/
│ ├── raw/
│ └── analysis.rds
├── tables/
├── figures/
├── estimates/ # saved fixest objects via saveRDS
├── logs/
├── renv.lock # package versions locked
├── _targets.R # or main.qmd / main.R
└── README.md
_targets.R (using targets package) or main.qmd (Quarto) at the top makes the whole pipeline reproducible:
# main.R — minimal driver
source("R/01_clean.R")
source("R/02_transform.R")
source("R/03_describe.R")
source("R/04_diagnose.R")
source("R/05_model.R")
source("R/06_robust.R")
source("R/07_further.R")
source("R/08_tables_figures.R")
For Quarto authoring (combined narrative + code + tables/figures, render to PDF/HTML/Word), see references/08-tables-plots.md §12.
modelsummary(...) and fixest::etable(...) are the two primitives behind every multi-regression table. The eight patterns above map to:
| Pattern | What varies across columns | Step |
|---|---|---|
| A. Progressive controls | covariate set / FE depth | 5.A — Table 2 |
| B. Design horse race | identification strategy (OLS / IV / DID / DML / PSM) | 5.B — Table 2-bis |
| C. Multi-outcome | dependent variable Y | 5.C — Table 2-ter |
| D. Stacked Panel A / B | horizon / sample (panel rows × spec columns) | 5.D — Table 2-quater |
| E. IV reporting triplet | first stage / reduced form / 2SLS | 5.E — Table 2-quinto |
| F. Causal-orchestrator | 1 column, full diagnostics (att_gt / synthdid / causal_forest) | 5.F |
| G. Subgroup table | subsample (full / female / male / Q1…Q4) | 7 — Table 3 |
| H. Robustness master | every robustness check stacked | 6.j — Table A1 |
Default modelsummary settings for AER house style:
modelsummary(
list("(1)" = m1, ..., "(N)" = mN),
output = "tables/tableN.tex", # or .docx / .html
stars = c("*" = 0.1, "**" = 0.05, "***" = 0.01), # AER stars
gof_omit = "BIC|AIC|F|Log|Adj",
coef_map = c("training" = "Training"), # pretty names
notes = c("Cluster-robust SE in parentheses.",
"* p<0.10, ** p<0.05, *** p<0.01.")
)
# For multi-panel paper bundles, use gt::gt_group(modelsummary(...), modelsummary(...))
# or render via Quarto for a single .pdf / .docx / .html target.
| # | Figure | R commands | Section |
|---|---|---|---|
| 1a | Raw trends (DID Figure 1) | df %>% group_by(year, treat) %>% summarise(mean(y)) %>% ggplot() | §1 |
| 1b | Treatment rollout heatmap | panelView::panelview(...) · ggplot + geom_tile | §1 |
| 2a | Event-study coefficients | fixest::iplot(feols(y ~ sunab(G, t) | i + t)) | §3 (Step 3.5.1) |
| 2a' | Bacon weights | bacondecomp::bacon + ggplot | §3 |
| 2a'' | CS-DID dynamic effects | did::ggdid(aggte(cs, type="dynamic")) | §3 |
| 2b | First-stage scatter | binsreg::binsreg(y=D, x=Z, w=X) | §3 (Step 3.5.2) |
| 2c | RD canonical plot | rdrobust::rdplot(y, x, c=0) | §3 (Step 3.5.3) |
| 2c' | McCrary density | rddensity::rdplotdensity(rdd, X) | §3 |
| 2d | Matching love plot | cobalt::love.plot(MatchIt::matchit(...)) | §3 (Step 3.5.4) |
| 2e | SCM trajectory | tidysynth::plot_trends · synthdid::plot · Synth::path.plot | §3 (Step 3.5.5) |
| 3 | Coefficient plot of main specs | modelsummary::modelplot(list(m1,...,m6), coefs="training") | §4 |
| 4a | Dose-response | marginaleffects::plot_predictions(model, condition="dose") | §5 |
| 4b | CATE distribution | grf::causal_forest(...) + ggplot::geom_histogram(predict(cf)$predictions) | §5 |
| 5 | Specification curve | specr::plot(specr(...)) (see 6.k) | §7 |
| 6 | Sensitivity dashboard | HonestDiD::createSensitivityPlot · EValue::evalue | §7 (Step 6.l) |
| 7 | Final main figure | estimator-specific (rdplot, iplot, Synth::path.plot) | §8 |
Every figure is exported via
ggsave()as both.png ≥ 300 dpi(for slides / web). Settheme_set(theme_classic(base_size = 11))once at the top ofmaster.Rfor consistent styling.
library(fixest); library(plm); library(sandwich); library(lmtest)
feols(y ~ X, data = df, cluster = ~ i) # OLS (modern primary)
feols(y ~ X | fe1, data = df, cluster = ~ i) # OLS + 1 FE
feols(y ~ X | fe1 + fe2, data = df, cluster = ~ i) # HD FE workhorse
feols(y ~ X | fe1 + fe2, data = df, cluster = ~ fe1 + fe2) # 2-way cluster
fepois(count ~ X | fe1 + fe2, data = df, cluster = ~ i) # Poisson + FE
feglm (y ~ X | fe1, data = df, family = binomial(link = "logit"),
cluster = ~ i) # Logit + FE
plm (y ~ X, data = df, model = "within", index = c("i","t")) # panel FE
plm (y ~ X, data = df, model = "random", index = c("i","t")) # RE (Hausman: phtest)
library(fixest); library(did); library(didimputation); library(synthdid); library(bacondecomp); library(HonestDiD); library(DIDmultiplegtDYN)
feols(y ~ i(treated, post, ref = 0) | i + t, df, cluster = ~ i) # 2×2
feols(y ~ sunab(first_treat, year) | i + year, df, cluster = ~ i) # SA event study
att_gt(yname="y", tname="t", idname="i", gname="G", data=df,
control_group="nevertreated", est_method="dr", clustervars="i") # CS-DID
did_imputation(data=df, yname="y", gname="G", tname="t", idname="i",
horizon=0:5, pretrends=-5:-1, cluster_var="i") # BJS imputation
DIDmultiplegtDYN(df, "y", "i", "t", "training", effects=5, placebo=3) # de Chaisemartin
synthdid_estimate(panel.matrices(df,"i","t","y","training"), ...) # synthetic DID
bacon(y ~ training, data=df, id_var="i", time_var="t") # TWFE diagnostic
HonestDiD::createSensitivityResults(...) # PT sensitivity
library(fixest); library(AER); library(ivreg)
feols(y ~ X | D ~ Z, df, cluster = ~ firm_id) # workhorse w/ HD FE
fitstat(iv, ~ ivf + ivwald + sargan + cd) # CD/KP/Sargan/F
AER::ivreg(y ~ D + X | Z + X, data = df) # classic API
summary(iv, vcov. = sandwich, diagnostics = TRUE) # with diagnostics
library(rdrobust); library(rddensity); library(rdmulti)
rdrobust(y, x, c = 0, kernel = "triangular", bwselect = "mserd") # Sharp RD
rdrobust(y, x, c = 0, fuzzy = D) # Fuzzy RD
rddensity(X = x, c = 0) # McCrary density
rdplot(y, x, c = 0)
rdmc(y, x, cutoffs = c(0, 5, 10)) # multi-cutoff
library(MatchIt); library(WeightIt); library(cobalt)
matchit (D ~ X1 + X2, data = df, method = "nearest", ratio = 1) # PSM
matchit (D ~ X1 + X2, data = df, method = "cem") # Coarsened EM
weightit(D ~ X1 + X2, data = df, method = "ebal") # entropy balancing
weightit(D ~ X1 + X2, data = df, method = "ps", estimand = "ATE") # IPW
love.plot(matchit_obj, threshold = 0.10) # SMD diagnostic
library(Synth); library(gsynth); library(tidysynth); library(synthdid)
Synth::synth(...) # ADH SCM
gsynth(y ~ training, data = df, index = c("i","t"), force = "two-way") # generalized SC
synthdid_estimate(panel.matrices(...)) # synthetic DID
tidysynth::synthetic_control(df, ...) %>% generate_predictor(...) %>%
generate_weights() %>% generate_control()
library(grf); library(DoubleML); library(mlr3); library(causalDML)
causal_forest(X, Y, W, num.trees = 4000, honesty = TRUE) # GRF causal forest
DoubleML::DoubleMLPLR$new(data, ml_l = lrn("regr.ranger"),
ml_m = lrn("regr.ranger")) # DML PLR
DoubleML::DoubleMLIRM$new(data, ...) # DML interactive
predict(cf)$predictions # CATE per row
average_treatment_effect(cf, target.sample = "treated")
test_calibration(cf); variable_importance(cf)
policytree::policy_tree(X, gamma, depth = 3) # policy tree
library(fwildclusterboot); library(ri2); library(multcomp); library(robomit); library(EValue)
boottest(model, param = "training", clustid = "state", B = 9999) # wild cluster bootstrap
ri2::conduct_ri(...) # randomization inference
robomit::o_test(...) # Oster δ
EValue::evalue(RR(1.45), lo = 1.10, hi = 1.91) # E-value
fwildclusterboot::boottest(..., type = "rademacher") # alt bootstrap dist
library(survival); library(survminer); library(survRM2); library(ipw); library(tmle); library(zelig)
survfit(Surv(time, event) ~ A, data = df) # KM
coxph (Surv(time, event) ~ A + X, data = df) # Cox
survreg(Surv(time, event) ~ A + X, data = df, dist = "weibull") # AFT
rmst2 (time, status, arm, tau = 1825) # RMST contrast
ipw::ipwpoint(...) # IPTW
tmle (Y, A, W = X, ...) # TMLE
gfoRmula::gformula_survival(...) # parametric g-formula
TwoSampleMR::mr(...) # Mendelian randomization
import statspai as sp) → 00-StatsPAI_skill.00.1-Full-empirical-analysis-skill..do pipeline → 00.2-Full-empirical-analysis-skill_Stata.10-Jill0099-causal-inference-mixtape.brms/rstan/cmdstanr) → 23-Learning-Bayesian-Statistics-baygent-skills.This skill ends at Step 8 — .tex / .docx tables and .pdf figures. Paper drafting is out of scope.
development
Conduct rigorous thematic analysis (TA) of qualitative data following Braun and Clarke's (2006) six-phase framework. Use whenever the user mentions 'thematic analysis', 'TA', 'Braun and Clarke', 'qualitative coding', 'identifying themes', or asks for help analysing interviews, focus groups, open-ended survey responses, or transcripts to identify patterns. Also trigger for questions about inductive vs theoretical coding, semantic vs latent themes, essentialist vs constructionist epistemology, building a thematic map, or writing up a qualitative findings section. Covers all six phases, the four upfront analytic decisions, the 15-point quality checklist, and the five common pitfalls. Produces a Word document write-up and an annotated thematic map. Does NOT cover IPA, grounded theory, discourse analysis, conversation analysis, or narrative analysis — use a different method for those.
development
Guide users through writing a systematic literature review (SLR) following the PRISMA 2020 framework. Use this skill whenever the user mentions 'systematic review', 'systematic literature review', 'SLR', 'PRISMA', 'PRISMA 2020', 'PRISMA flow diagram', 'PRISMA checklist', or asks for help writing, structuring, or auditing a literature review that follows reporting guidelines. Also trigger when the user asks about inclusion/exclusion criteria for a review, search strategies for databases like Scopus/WoS/PubMed, study selection processes, risk of bias assessment, or narrative synthesis for a review paper. This skill covers the full PRISMA 2020 checklist (27 items), produces a Word document manuscript in strict journal article format, generates an annotated PRISMA flow diagram, and enforces APA 7th Edition referencing throughout. It does NOT cover meta-analysis or statistical pooling. By Chuah Kee Man.
testing
Performs placebo-in-time sensitivity analysis with hierarchical null model and optional Bayesian assurance. Use when checking model robustness, verifying lack of pre-intervention effects, or estimating study power.
data-ai
Fit, summarize, plot, and interpret a chosen CausalPy experiment. Use after the causal method has been selected, including when configuring PyMC/sklearn models and scale-aware custom priors.