AER Identification

Overview

In AER-track empirical economics, identification is the paper. This skill routes among canonical designs, modern defaults, and referee-facing diagnostics.

If the design is fragile, return to aer-topic-selection; writing cannot save it.

When to Use

Designing the empirical strategy for a new project
The current strategy is TWFE / first-stage F / naive RDD and the referee will flag it
A prior submission was rejected on identification grounds and the design needs rebuilding
Choosing between two candidate identification strategies for the same question

Master Decision Tree

Is treatment assignment plausibly random conditional on observables?
├── Yes, by design (RCT, lottery) → run the RCT analysis; register PAP via AEA RCT Registry
└── No → identification must come from variation
    ├── Sharp threshold in a running variable → RDD (sharp or fuzzy)
    ├── Discrete policy change in some units, not others, over time → DiD
    │     ├── Single treatment date → canonical 2×2 DiD
    │     └── Staggered adoption → Callaway-Sant'Anna or Borusyak-Jaravel-Spiess
    ├── Endogenous regressor + plausibly exogenous shifter → IV
    │     ├── Shifter × pre-existing exposure shares → shift-share / Bartik
    │     └── Single instrument → weak-IV-robust inference if F < 50
    ├── One treated unit / aggregate intervention → synthetic control
    └── None of the above → reconsider the question
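
The tree above can be read as an ordered sequence of checks. A minimal sketch in Python, assuming a hypothetical `DesignFeatures` record (the field names are illustrative, not part of any tool), that walks the branches in the same order:

```python
from dataclasses import dataclass


@dataclass
class DesignFeatures:
    """Hypothetical summary of where a project's identifying variation comes from."""
    randomized_by_design: bool = False
    running_variable_threshold: bool = False
    policy_change_over_time: bool = False
    staggered_adoption: bool = False
    exogenous_shifter: bool = False
    exposure_shares: bool = False
    single_aggregate_treated_unit: bool = False


def route_design(f: DesignFeatures) -> str:
    """Check the branches in the same order as the decision tree."""
    if f.randomized_by_design:
        return "RCT analysis (register the PAP via the AEA RCT Registry)"
    if f.running_variable_threshold:
        return "RDD (sharp or fuzzy)"
    if f.policy_change_over_time:
        if f.staggered_adoption:
            return "Staggered DiD (Callaway-Sant'Anna or Borusyak-Jaravel-Spiess)"
        return "Canonical 2x2 DiD"
    if f.exogenous_shifter:
        if f.exposure_shares:
            return "Shift-share / Bartik IV"
        return "IV (weak-IV-robust inference if F < 50)"
    if f.single_aggregate_treated_unit:
        return "Synthetic control"
    return "Reconsider the question"


example = route_design(
    DesignFeatures(policy_change_over_time=True, staggered_adoption=True)
)
```

The ordering encodes a priority among designs when several features are present; that priority is an assumption of this sketch, not a rule stated in the tree.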

Difference-in-Differences

Canonical 2×2 (single treatment date, two groups)

Use TWFE if and only if:

Treatment timing is simultaneous for all treated units
The control group is never treated
Treatment-effect heterogeneity is implausible

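Otherwise, TWFE can be biased, and under staggered timing with heterogeneous effects the estimate can take the wrong sign because already-treated units enter as controls.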

Staggered Adoption (most modern applications)

Do not use TWFE. Use one of:

Callaway and Sant'Anna (2021): csdid (Stata), did (R). Identifies group-time average treatment effects (ATT(g,t)); the default estimator is doubly robust; supports event-study aggregation.
Borusyak, Jaravel, and Spiess (2024): imputation estimator; did_imputation (Stata).
de Chaisemartin and D'Haultfœuille (2020): did_multiplegt.
Sun and Abraham (2021): interaction-weighted estimator for event studies.

Required diagnostics:

Goodman-Bacon decomposition to show the share of weight from "forbidden" comparisons under TWFE
Event-study plot with the imputation or Callaway-Sant'Anna estimator
Pre-trends test reported as the joint test, not just the visual
Heterogeneity by treatment cohort
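
To see why the "forbidden" comparisons matter, consider a stylized two-cohort case with no noise. The numbers below are hypothetical: untreated outcomes are zero for everyone (so parallel trends holds exactly) and the treatment effect grows with time since adoption. The 2×2 comparison that uses the already-treated early cohort as the control for the late cohort then has the wrong sign:

```python
def effect(event_time: int) -> float:
    """Hypothetical dynamic effect: zero before adoption, positive and growing after."""
    return 1.0 + 3.0 * event_time if event_time >= 0 else 0.0


def outcome(adoption_period: int, t: int) -> float:
    """Untreated outcome is 0 for every unit, so parallel trends holds exactly."""
    return effect(t - adoption_period)


EARLY, LATE = 1, 3  # adoption periods of the two cohorts
PRE, POST = 2, 3    # window around the late cohort's adoption

# "Forbidden" 2x2: the late cohort is treated, the already-treated early cohort is the control.
forbidden_did = (outcome(LATE, POST) - outcome(LATE, PRE)) - (
    outcome(EARLY, POST) - outcome(EARLY, PRE)
)

# The effect the late cohort actually experiences in its first treated period.
true_att_late = effect(POST - LATE)
```

Here `forbidden_did` is -2.0 while `true_att_late` is +1.0: the early cohort's effect is still growing, and the comparison subtracts that growth as if it were a counterfactual trend.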

Pre-Trends

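A flat pre-trend is necessary but not sufficient: the test can be underpowered, and parallel trends is a claim about post-period counterfactuals that pre-period data cannot verify. Report: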

Visual event-study plot with 95% confidence intervals
Formal joint test of pre-period coefficients (p-value)
Honest DiD (Rambachan-Roth 2023) sensitivity bounds for the post-period
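
For reference, the joint test in the list above is usually a Wald test on the pre-period event-study coefficients. A sketch of the statistic, assuming $K$ pre-period coefficients with event time $-1$ as the omitted reference period and $\hat{V}_{\text{pre}}$ the (typically cluster-robust) covariance block for those coefficients:

```latex
W \;=\; \hat{\beta}_{\text{pre}}^{\top} \, \hat{V}_{\text{pre}}^{-1} \, \hat{\beta}_{\text{pre}}
\;\xrightarrow{d}\; \chi^2_{K}
\quad \text{under } H_0 : \beta_{-(K+1)} = \dots = \beta_{-2} = 0,
\qquad
\hat{\beta}_{\text{pre}} = \bigl(\hat{\beta}_{-(K+1)}, \dots, \hat{\beta}_{-2}\bigr)^{\top}.
```

The reported p-value is $\Pr(\chi^2_K \ge W)$. The indexing of the pre-periods is an assumption of this sketch; match it to the event-study specification actually estimated.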

Instrumental Variables

Weak Instruments

The first-stage F > 10 rule is obsolete. Modern conventions:

Just-identified models: report Anderson-Rubin (AR) confidence sets as primary inference; AR keeps size under weak instruments.
For F < 50: 2SLS confidence intervals are unreliable; AR is required, not optional.
Stock-Yogo TSLS-bias critical values assume homoskedasticity and rarely fit clustered settings.

Report the Montiel Olea-Pflueger effective F statistic, computed with weakivtest (Stata) or ivDiag (R).
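
An Anderson-Rubin confidence set is built by test inversion: for each candidate value of the coefficient, test whether the instrument still predicts the implied residual. A minimal sketch for one endogenous regressor, one instrument, and no covariates, assuming homoskedastic errors (clustered or heteroskedastic data need a robust variance in place of the one used here); the data are simulated and all names are illustrative:

```python
import random
from statistics import fmean


def iv_estimate(y, x, z):
    """Just-identified IV estimate: Cov(z, y) / Cov(z, x)."""
    z_bar = fmean(z)
    szy = sum((zi - z_bar) * yi for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * xi for zi, xi in zip(z, x))
    return szy / szx


def ar_statistic(y, x, z, beta0):
    """Squared t-statistic on z from regressing (y - beta0 * x) on a constant and z."""
    n = len(y)
    e = [yi - beta0 * xi for yi, xi in zip(y, x)]
    e_bar, z_bar = fmean(e), fmean(z)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sze = sum((zi - z_bar) * (ei - e_bar) for zi, ei in zip(z, e))
    slope = sze / szz
    rss = sum((ei - e_bar - slope * (zi - z_bar)) ** 2 for zi, ei in zip(z, e))
    var_slope = (rss / (n - 2)) / szz  # homoskedastic variance of the slope
    return slope ** 2 / var_slope


def ar_confidence_set(y, x, z, grid, critical_value=3.84):
    """Keep every grid value the AR test does not reject (3.84 ~ chi2(1) at 5%)."""
    return [b for b in grid if ar_statistic(y, x, z, b) <= critical_value]


# Simulated data (hypothetical): true beta = 1, x is endogenous through v.
random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
x = [zi + vi for zi, vi in zip(z, v)]
y = [xi + 0.8 * vi + random.gauss(0, 1) for xi, vi in zip(x, v)]

beta_iv = iv_estimate(y, x, z)
grid = [i / 100 for i in range(-100, 301)]
ci = ar_confidence_set(y, x, z, grid)
ar_bounds = (min(ci), max(ci))
```

In the just-identified case the AR statistic is zero at the 2SLS point estimate, so the estimate always lies inside the set. With a weak instrument the retained grid values can run to the edge of the grid, which signals an unbounded set rather than a bug; widen the grid before reporting bounds.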

Exclusion Restriction

The IV's credibility depends on a story, not a test. State the exclusion restriction in one sentence in the introduction and defend it with:

Institutional narrative (one paragraph)
A placebo regression showing that the instrument does not predict an outcome it should not affect
Sensitivity analysis: how much exclusion-restriction violation would overturn the result (Conley et al. 2012)
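
The logic of the sensitivity analysis can be written out for the simplest case. A sketch assuming one instrument, one endogenous regressor, and no covariates (or covariates already partialled out), where $\gamma$ is a hypothetical direct effect of the instrument on the outcome:

```latex
\begin{aligned}
y &= \beta x + \gamma z + \varepsilon, \qquad x = \pi z + v, \qquad \operatorname{Cov}(z, \varepsilon) = 0, \\
\operatorname{plim} \hat{\beta}_{IV}
  &= \frac{\operatorname{Cov}(z, y)}{\operatorname{Cov}(z, x)}
   = \beta + \frac{\gamma}{\pi}, \\
\hat{\beta}(\gamma_0) &= \hat{\beta}_{IV} - \frac{\gamma_0}{\hat{\pi}},
\qquad
\gamma^{*} = \hat{\pi}\, \hat{\beta}_{IV}.
\end{aligned}
```

Here $\hat{\beta}(\gamma_0)$ is the IV estimate after imposing a direct effect $\gamma_0$, and $\gamma^{*}$ is the direct effect that would drive the point estimate to zero. A weak first stage (small $\hat{\pi}$) makes the result fragile to even small violations.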

Shift-Share / Bartik

Two valid sources of identification, with very different implications:

Exogenous shares (Goldsmith-Pinkham, Sorkin, Swift 2020) — argue that pre-existing exposure shares are conditionally exogenous; report the Rotemberg weights and inspect the top-5 industries driving identification.
Exogenous shocks (Borusyak, Hull, Jaravel 2022; Adão, Kolesár, Morales 2019) — argue that aggregate shocks are as-good-as-random; report shock-level inference.

Pick one explicitly. Do not hand-wave between the two.
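
Under either argument the instrument itself is the same inner product of shares and shocks. A minimal sketch, with hypothetical locations, industries, and shock values (in percent):

```python
def bartik_instrument(shares, shocks):
    """B_l = sum over industries k of share_lk * shock_k, for each location l.

    shares: {location: {industry: pre-period exposure share}}
    shocks: {industry: aggregate shock}
    """
    return {
        loc: sum(s * shocks[ind] for ind, s in loc_shares.items())
        for loc, loc_shares in shares.items()
    }


# Hypothetical inputs: shares fixed in a pre-period, shocks measured nationally.
shares = {
    "A": {"manufacturing": 0.5, "services": 0.5},
    "B": {"manufacturing": 0.25, "services": 0.75},
}
shocks = {"manufacturing": -4.0, "services": 2.0}

instrument = bartik_instrument(shares, shocks)
```

The construction does not settle which identification argument applies. That choice determines which diagnostics to report (Rotemberg weights versus shock-level inference), as described above.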

Regression Discontinuity

Modern Defaults

Local linear regression with a triangular kernel. High-order global polynomials are discouraged (Gelman-Imbens 2019); use local linear or, at most, local quadratic fits.
MSE-optimal bandwidth (Calonico-Cattaneo-Titiunik 2014) with the robust bias-corrected confidence interval. Use rdrobust.
Donut RDD if bunching near the cutoff is a concern.
Covariate adjustment for efficiency; main result must hold without it.
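
The core of the local linear default can be sketched in a few lines. This is an illustration only: the bandwidth `h` is picked by hand, and there is no bias correction or robust inference, so it is not a substitute for rdrobust. The data are hypothetical and noiseless, with a true jump of 3 at the cutoff:

```python
def local_linear_at_cutoff(xs, ys, cutoff, h):
    """Triangular-kernel weighted line on one side's data; returns the fit at the cutoff."""
    pts = [
        (x - cutoff, y, 1.0 - abs(x - cutoff) / h)  # (distance, outcome, kernel weight)
        for x, y in zip(xs, ys)
        if abs(x - cutoff) < h
    ]
    sw = sum(w for _, _, w in pts)
    d_bar = sum(w * d for d, _, w in pts) / sw
    y_bar = sum(w * y for _, y, w in pts) / sw
    sdd = sum(w * (d - d_bar) ** 2 for d, _, w in pts)
    sdy = sum(w * (d - d_bar) * (y - y_bar) for d, y, w in pts)
    return y_bar - (sdy / sdd) * d_bar  # intercept = fitted value at distance 0


def sharp_rd_estimate(xs, ys, cutoff, h):
    """Difference between the right and left local linear fits at the cutoff."""
    left = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    right = [(x, y) for x, y in zip(xs, ys) if x >= cutoff]
    mu_left = local_linear_at_cutoff(*zip(*left), cutoff, h)
    mu_right = local_linear_at_cutoff(*zip(*right), cutoff, h)
    return mu_right - mu_left


# Hypothetical noiseless data: y = 1 + 2x, plus a jump of 3 at x >= 0.
xs = [i / 100 for i in range(-100, 101)]
ys = [1.0 + 2.0 * x + (3.0 if x >= 0 else 0.0) for x in xs]
rd_jump = sharp_rd_estimate(xs, ys, cutoff=0.0, h=0.5)
```

Fitting each side separately is what lets the slope differ across the cutoff; the triangular kernel puts the most weight on observations closest to the threshold.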

Required Diagnostics

McCrary (2008) / Cattaneo-Jansson-Ma (2020) density test for manipulation of the running variable
Balance tests on predetermined covariates at the cutoff
Placebo cutoffs away from the true threshold
Bandwidth sensitivity — show the estimate across at least three bandwidths
Visual RD plot using rdplot with the binning method explicitly stated

Synthetic Control

When Appropriate

One (or few) treated units
Long pre-treatment outcome series (≥ 10 periods)
A large donor pool of plausibly comparable untreated units
Aggregate intervention (policy at the country, state, city level)

Modern Extensions

Generalized synthetic control (Xu 2017) for multiple treated units
Augmented synthetic control (Ben-Michael, Feller, Rothstein 2021) for bias correction
Synthetic DiD (Arkhangelsky et al. 2021) combining SCM and DiD weighting

Required Diagnostics

Placebo (in-time): apply SCM to pre-treatment fake intervention dates
Placebo (in-space): apply SCM to every donor as if it were treated; report the distribution of placebo effects
Permutation inference / Fisher exact p-value
Weight vector reported in the appendix; donors with > 10% weight discussed
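
The in-space placebo and permutation p-value above can be sketched once the gaps (actual minus synthetic outcome) are in hand for the treated unit and for each donor treated as a placebo. The ranking statistic here is the post/pre RMSPE ratio, a common choice for this purpose; the gap values and unit names are hypothetical, and fitting the synthetic controls themselves is outside the sketch:

```python
from math import sqrt


def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps."""
    return sqrt(sum(g * g for g in gaps) / len(gaps))


def placebo_p_value(gaps_by_unit, treated, n_pre):
    """Share of units whose post/pre RMSPE ratio is at least the treated unit's.

    gaps_by_unit: {unit: [actual - synthetic, per period]}
    n_pre: number of leading periods that are pre-treatment
    """
    ratios = {
        unit: rmspe(gaps[n_pre:]) / rmspe(gaps[:n_pre])
        for unit, gaps in gaps_by_unit.items()
    }
    n_at_least = sum(1 for r in ratios.values() if r >= ratios[treated])
    return n_at_least / len(ratios)


# Hypothetical gaps: four pre-treatment periods, two post-treatment periods.
gaps = {
    "treated": [0.1, -0.1, 0.1, -0.1, 2.0, 2.0],
    "donor_1": [0.2, -0.2, 0.2, -0.2, 0.3, -0.3],
    "donor_2": [0.1, 0.1, -0.1, -0.1, 0.1, -0.2],
    "donor_3": [0.3, -0.3, 0.3, -0.3, 0.3, 0.3],
}
p_value = placebo_p_value(gaps, treated="treated", n_pre=4)
```

With J donors the smallest attainable p-value is 1/(J+1), which is one reason a large donor pool matters for inference as well as for fit.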

Field Experiments and RCTs

If the paper uses a field experiment:

Register with the AEA RCT Registry before the intervention begins. AEA journals require this prior to submission.
Pre-analysis plan (PAP) posted before unblinding. Per Olken and others, keep the PAP moderate in scope: pre-specify primary outcomes and the analysis specification, and label exploratory work clearly as such.
Power calculations in the manuscript or appendix.
Multiple-hypothesis correction if more than one primary outcome.
Attrition documented and tested for differential attrition by treatment arm.
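
For the multiple-hypothesis item above, one simple option is the Holm step-down adjustment, which controls the family-wise error rate. It is used here as an example, not as the method the list prescribes, and the p-values are hypothetical:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three primary outcomes.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

In this example the adjusted values are approximately 0.03, 0.06, and 0.06, so only the first outcome remains significant at the 5% level.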

Mechanism vs. Identification

A common confusion: identification answers whether X causes Y; mechanism answers why. Mechanism evidence should not weaken the identification of the main effect. Run:

Subgroup heterogeneity (does the effect concentrate where theory predicts?)
Mediation analysis only if the mediator is itself plausibly exogenous (rare)
Auxiliary outcomes consistent with the proposed channel

Red Flags for Referees

TWFE on staggered data with no Goodman-Bacon decomposition
First-stage F = 12 cited as evidence of instrument strength
RDD with a polynomial of order 4
Synthetic control with no placebo inference
DiD with a "control group" of eventually-treated units
IV exclusion restriction defended only by "we control for X"
Quoting an Angrist-Pischke citation as a substitute for showing the diagnostic

StatsPAI Tool Bindings

When a StatsPAI MCP server is connected, select the validated tool, then let it run the estimator — do not hand-roll the design. The methodological choice still comes from the decision tree above; this table is only the execution surface. The full registry and chained workflow live in skills/aer-statspai/SKILL.md.

| Design / diagnostic | Call (StatsPAI) | Do not hand-roll |
|---|---|---|
| Staggered DiD ATT(g,t) | callaway_santanna then aggte | a pooled two-way fixed-effects regression |
| Imputation / dCDH variants | did_imputation, sun_abraham, did_multiplegt | event-study leads and lags assembled by hand on staggered data |
| Forbidden-comparison weight | bacon_decomposition | eyeballing whether TWFE is "probably fine" |
| Pre-trends (joint, not visual) | event_study, pretrends_test | declaring parallel trends from a plot alone |
| IV under weak instruments | ivreg with anderson_rubin_ci, effective_f_test | a first-stage-F-only 2SLS table |
| Shift-share / Bartik | bartik | a Bartik IV with no Rotemberg-weight report |
| RDD (local-linear, RBC) | rdrobust, rdbwselect, rdplot, rddensity | a high-order global-polynomial RDD |
| Synthetic control | synth, augsynth, gsynth, sdid, synth_time_placebo, synth_loo | SCM with no placebo or leave-one-out inference |
| Design / estimator selection | detect_design, preflight, recommend | guessing the estimator before profiling the data |

Repository Resources

Bundled with the installed skill (no repository checkout needed); read it before the repo resources below:

references/estimator-playbook.md: per-design estimands, modern defaults, diagnostics, and referee-objection response scripts

When working from the repo or plugin bundle, load only the relevant resource:

Estimator defaults, package calls, diagnostics, and citations: docs/methods-reference.md
Staggered DiD implementation: templates/stata/03_main_did.do, templates/r/03_main_did.R, or templates/python/main_did.py
Worked empirical examples: examples/aer-exemplars.md and examples/modern-aer-exemplars.md

Use the methods reference before prose: it fixes the estimand, diagnostic, inference method, and citation that the manuscript must report.

Identification Gate

Do not advance to robustness or writing until, for the chosen design, all are true:

[ ] A modern estimator is used — no TWFE on staggered data, no first-stage-F-only IV, no high-order-polynomial RDD
[ ] Every required diagnostic for the design (see the per-design lists above) is run and reported
[ ] Inference matches the design — cluster-robust / AR / wild bootstrap / permutation, not default OLS SEs by reflex
[ ] The identifying assumption is stated in one sentence, ready to drop into the introduction
[ ] No item in "Red Flags for Referees" is present

Gate Record Mini-Example

Write the gate decision before routing onward:

STRATEGY: IV
FIRST STAGE: effective F = 7.8; 2SLS CI is not primary
ROBUST INFERENCE: AR 95% CI = [-0.14, 0.52]
PLACEBO: beta = 0.003 (p = 0.71)
DECISION: advance with directional headline only

Handoff

STRATEGY: <DiD | IV | RDD | SCM | shift-share | RCT>
MODERN ESTIMATOR USED: <yes / no / which>
REQUIRED DIAGNOSTICS REPORTED: <list>
INFERENCE METHOD: <robust / cluster-robust / AR / wild bootstrap / permutation>
WEAK-IV / TWFE / POLY-ORDER RED FLAGS: <list or "none">
NEXT SKILL: aer-robustness

Anti-Patterns

Defending an old design ("the prior literature used TWFE") when modern estimators exist
Reporting OLS-with-controls as the main specification and IV/RD as "robustness"
Using more than one identification strategy as if they were independent confirmations when they share identifying variation
Footnoting the identifying assumption instead of stating it in the introduction
