skills/11-James-Traina-compound-science/skills/causal-ml/SKILL.md
This skill covers causal machine learning methods in applied economics and quantitative social science. Use when implementing or choosing between modern ML-based causal estimators — including double machine learning, DML, partially linear models, interactive regression models, cross-fitting, Neyman orthogonality, debiased ML, causal forests, generalized random forest, GRF, honest causal trees, AIPW with machine learning, doubly robust with machine learning, DR-Learner, T-Learner, S-Learner, X-Learner, meta-learners, heterogeneous treatment effects, conditional average treatment effect, CATE, HTE, high-dimensional controls, LASSO controls, post-LASSO, post-double selection, Belloni-Chernozhukov-Hansen, Riesz representer, Chernozhukov, sample splitting, econml, DoubleML package, or any combination of machine learning and causal inference.
npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research causal-ml
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Reference for semiparametric ML estimators: DML with cross-fitting, generalized random forests, debiased regularization, and nuisance function approximation. Covers Neyman-orthogonal moment conditions, sample splitting, plug-in bias correction, and heterogeneous treatment effects.
Use when the user is:
- … econml, DoubleML, or grf packages

Skip when:

- … (causal-inference skill)
- … (structural-modeling skill)
- … (identification-proofs skill)

- references/dml.md
- references/grf-meta-learners.md
- references/high-dim-cross-fitting.md
- references/hte-inference.md
- references/connections-traditional.md

| Dimension | Traditional (IV, DiD, RDD) | Causal ML |
|-----------|--------------------------|-----------|
| Functional form | Parametric | Nonparametric / semi-parametric |
| High-dimensional controls | Problematic | Native support |
| Heterogeneous effects | Secondary (subgroup analysis) | Primary estimand (CATE) |
| Sample requirements | Moderate N | ML nuisance needs large N |
| Identification | Explicit (IV, DiD, RCT) | Same assumptions; ML is estimation, not identification |
Critical point: Causal ML does not relax identification assumptions. If you need a valid instrument, parallel trends, or no unmeasured confounding, those must still hold.
DML (Chernozhukov et al. 2018) corrects the regularization bias that arises when ML predictions are plugged naively into a regression. Partial out the controls X from both Y and D using separate ML nuisance models, then regress the Y residuals on the D residuals. Two properties make this work: Neyman orthogonality (the moment condition is locally insensitive to nuisance estimation error) and cross-fitting (which prevents overfitting bias).
PLR (Partially Linear Regression): $Y = \theta D + g(X) + \varepsilon$. Workhorse for continuous or binary D with ATE under selection on observables. IRM (Interactive Regression Model): relaxes additive separability for binary D with heterogeneous effects.
Full implementation (Python/R code, cross-fitting from scratch, diagnostics) in references/dml.md.
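As a rough illustration of the partialling-out recipe for PLR, here is a minimal from-scratch sketch using only NumPy. The nuisance learner (ridge regression on a polynomial basis), the helper names `poly_basis`, `ridge_predict`, and `dml_plr`, and the synthetic data are all assumptions made for this example; they are not taken from references/dml.md, and in practice any ML learner can take the place of the ridge step.

```python
import numpy as np

def poly_basis(X, degree=4):
    """Intercept plus elementwise powers of each column (no interaction terms)."""
    return np.column_stack([np.ones(len(X))] + [X ** k for k in range(1, degree + 1)])

def ridge_predict(X_train, y_train, X_test, alpha=1.0):
    """Ridge on the polynomial basis; a stand-in for any ML nuisance learner."""
    A, B = poly_basis(X_train), poly_basis(X_test)
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y_train)
    return B @ beta

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res, d_res = np.empty(n), np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Nuisance models are fit on the other folds only (cross-fitting).
        y_res[test] = y[test] - ridge_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ridge_predict(X[train], d[train], X[test])
    # Residual-on-residual regression solves the orthogonal moment condition.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / n) / np.mean(d_res ** 2)
    return theta, se

# Synthetic data with a known effect of 0.5 (illustrative only).
rng = np.random.default_rng(42)
n = 4000
X = rng.uniform(-1, 1, size=(n, 3))
d = 0.5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)
y = 0.5 * d + X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f} (se {se_hat:.3f}); true value is 0.5")
```

The standard error comes from the sample variance of the orthogonal score, which is heteroskedasticity-robust. The sketch uses a single random partition; repeating over several partitions and aggregating is a common refinement that is left out here.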
Causal forests (Wager-Athey 2018; Athey-Tibshirani-Wager 2019) estimate CATE $\tau(x) = E[Y(1)-Y(0)|X=x]$ using honest forests (structure learned on one subsample, effects estimated on another). Use when CATE is the primary estimand and n $\geq$ 2,000. Always run the calibration test before reporting heterogeneity.
R (grf) and Python (econml) implementations, ATE/ATT extraction, BLP projections in references/grf-meta-learners.md.
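The honesty idea (learn the partition on one subsample, estimate effects on another) can be sketched with a single depth-one tree. This is a toy illustration under strong assumptions: treatment is randomized with probability 0.5, there is one split only, and the splitting score is a simple size-weighted squared gap in difference-in-means effects. It is not the grf algorithm, which adds subsampling, orthogonalization, and many trees; the function name `fit_honest_stump` is hypothetical.

```python
import numpy as np

def diff_in_means(y, w):
    """Treated-minus-control mean outcome (valid under randomized treatment)."""
    return y[w == 1].mean() - y[w == 0].mean()

def fit_honest_stump(y, w, X, seed=0):
    """Choose one split on half the data, estimate leaf effects on the other half."""
    n, p = X.shape
    idx = np.random.default_rng(seed).permutation(n)
    split_idx, est_idx = idx[: n // 2], idx[n // 2:]
    Xs, ys, ws = X[split_idx], y[split_idx], w[split_idx]

    best = (-np.inf, None, None)
    for j in range(p):
        for c in np.quantile(Xs[:, j], np.linspace(0.1, 0.9, 9)):
            left = Xs[:, j] <= c
            gap = diff_in_means(ys[left], ws[left]) - diff_in_means(ys[~left], ws[~left])
            # Reward splits that separate effects and keep both leaves large.
            score = left.mean() * (1 - left.mean()) * gap ** 2
            if score > best[0]:
                best = (score, j, c)
    _, j, c = best

    # Honest step: the effects are estimated on data not used to pick the split.
    Xe, ye, we = X[est_idx], y[est_idx], w[est_idx]
    left = Xe[:, j] <= c
    return {
        "feature": j,
        "threshold": float(c),
        "tau_left": diff_in_means(ye[left], we[left]),
        "tau_right": diff_in_means(ye[~left], we[~left]),
    }

# Synthetic data: the effect is 2 when X0 > 0 and 0 otherwise (illustrative only).
rng = np.random.default_rng(7)
n = 4000
X = rng.uniform(-1, 1, size=(n, 3))
w = rng.binomial(1, 0.5, size=n)
tau = np.where(X[:, 0] > 0, 2.0, 0.0)
y = 0.5 * X[:, 1] + w * tau + 0.5 * rng.normal(size=n)

stump = fit_honest_stump(y, w, X)
print(stump)
```

Because the estimation half played no role in choosing the split, the leaf estimates are not biased upward by the search over split points, which is the property honest forests rely on for valid inference.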
Meta-learners decompose CATE estimation into supervised learning sub-problems. DR-Learner (Kennedy 2023): best properties when both nuisance models are well-specified. T-Learner: simplest baseline. X-Learner: designed for imbalanced treatment. For applied work: DR-Learner primary, T-Learner benchmark. Large disagreement between the two signals nuisance model problems.
All implementations in references/grf-meta-learners.md.
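To make the T-Learner versus DR-Learner comparison concrete, the following NumPy-only sketch uses linear outcome models and a logistic propensity model as deliberately simple stand-ins for ML learners. The helper names, the two-fold cross-fitting, the 0.05 propensity clipping, and the synthetic data are assumptions for illustration; the DR pseudo-outcome is the usual AIPW score regressed on X.

```python
import numpy as np

def add_const(X):
    return np.column_stack([np.ones(len(X)), X])

def ols_fit(X, y):
    return np.linalg.lstsq(add_const(X), y, rcond=None)[0]

def logit_fit(X, w, n_iter=25):
    """Logistic regression by Newton's method (propensity model)."""
    A = add_const(X)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        hess = (A * (p * (1 - p))[:, None]).T @ A
        beta += np.linalg.solve(hess, A.T @ (w - p))
    return beta

def t_learner(y, w, X):
    """Fit separate outcome models by arm and difference the predictions."""
    mu1 = add_const(X) @ ols_fit(X[w == 1], y[w == 1])
    mu0 = add_const(X) @ ols_fit(X[w == 0], y[w == 0])
    return mu1 - mu0

def dr_learner(y, w, X, seed=0, clip=0.05):
    """Cross-fitted AIPW pseudo-outcome, then a final regression on X."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), 2)
    phi = np.empty(n)
    for k in (0, 1):
        test, train = folds[k], folds[1 - k]
        Xtr, ytr, wtr = X[train], y[train], w[train]
        Xte = add_const(X[test])
        mu1 = Xte @ ols_fit(Xtr[wtr == 1], ytr[wtr == 1])
        mu0 = Xte @ ols_fit(Xtr[wtr == 0], ytr[wtr == 0])
        e = np.clip(1.0 / (1.0 + np.exp(-Xte @ logit_fit(Xtr, wtr))), clip, 1 - clip)
        phi[test] = (mu1 - mu0
                     + w[test] * (y[test] - mu1) / e
                     - (1 - w[test]) * (y[test] - mu0) / (1 - e))
    return add_const(X) @ ols_fit(X, phi)

# Synthetic data with tau(x) = 1 + 2*x0 and confounding through x1 (illustrative only).
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
e_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 1]))
w = rng.binomial(1, e_true)
tau = 1 + 2 * X[:, 0]
y = X[:, 0] + X[:, 1] + w * tau + rng.normal(size=n)

tau_t = t_learner(y, w, X)
tau_dr = dr_learner(y, w, X)
print("corr(T-Learner, truth): ", np.corrcoef(tau_t, tau)[0, 1])
print("corr(DR-Learner, truth):", np.corrcoef(tau_dr, tau)[0, 1])
print("mean |T - DR|:          ", np.mean(np.abs(tau_t - tau_dr)))
```

Here both learners are correctly specified, so they agree closely. The last printed quantity is the kind of disagreement check suggested above: a large gap on real data would point at the outcome or propensity models.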
PDS-LASSO (Belloni-Chernozhukov-Hansen 2014): run separate LASSO regressions of Y on X and of D on X, take the union of the selected variables, then run OLS of Y on D and that union. Works at moderate n (~200 with sparse confounders). See references/high-dim-cross-fitting.md.
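The three steps of post-double selection can be sketched in NumPy as follows. Two simplifications are assumptions of this example and differ from the published procedure: the LASSO is a plain coordinate-descent solver written inline, and the penalty is a simple `c * sqrt(2 log p / n)` rule on standardized data rather than the data-driven, heteroskedasticity-robust penalty loadings of Belloni, Chernozhukov, and Hansen. Standard errors are omitted.

```python
import numpy as np

def standardize(A):
    return (A - A.mean(axis=0)) / A.std(axis=0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.copy()
    col_ms = (X ** 2).mean(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]            # partial residual without feature j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ms[j]
            resid -= X[:, j] * b[j]
    return b

def pds_lasso(y, d, X, c=1.1):
    """LASSO of Y on X and of D on X, union of supports, then OLS of Y on D and the union."""
    n, p = X.shape
    Xs = standardize(X)
    lam = c * np.sqrt(2 * np.log(p) / n)       # simplified penalty (assumption)
    sel_y = np.flatnonzero(lasso_cd(Xs, standardize(y), lam))
    sel_d = np.flatnonzero(lasso_cd(Xs, standardize(d), lam))
    union = np.union1d(sel_y, sel_d)
    Z = np.column_stack([np.ones(n), d, X[:, union]])
    beta = np.linalg.lstsq(Z, y, rcond=None)[0]
    return beta[1], union

# Synthetic data: 50 controls, 3 of which matter; true effect is 1.0 (illustrative only).
rng = np.random.default_rng(5)
n, p = 500, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)
y = 1.0 * d + 2.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=n)

theta_hat, selected = pds_lasso(y, d, X)
print("theta_hat:", round(theta_hat, 3), "selected controls:", selected)
```

Taking the union matters: a control that predicts D strongly but Y only weakly can be dropped by the outcome LASSO alone, which would leave omitted-variable bias in the final OLS step.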
Before reporting CATE, test for genuine heterogeneity using the BLP calibration test. Do not report heterogeneous effects if the calibration test fails (p > 0.10). See references/hte-inference.md.
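One common form of the calibration test regresses the outcome residual on two constructed regressors: the treatment residual times the mean CATE prediction, and the treatment residual times the demeaned CATE prediction. A coefficient near 1 on the second term, with a small one-sided p-value, suggests the predictions track real heterogeneity. The sketch below is written in that spirit; the function name `calibration_test` is hypothetical, p-values use a normal approximation, and the example feeds in the true nuisance functions for simplicity, whereas in practice `m_hat`, `e_hat`, and `tau_hat` should be out-of-fold or out-of-bag predictions.

```python
import math
import numpy as np

def calibration_test(y, w, m_hat, e_hat, tau_hat):
    """Best-linear-predictor regression with heteroskedasticity-robust SEs."""
    tau_bar = tau_hat.mean()
    resid_y = y - m_hat
    resid_w = w - e_hat
    Z = np.column_stack([resid_w * tau_bar, resid_w * (tau_hat - tau_bar)])
    coef = np.linalg.lstsq(Z, resid_y, rcond=None)[0]
    u = resid_y - Z @ coef
    bread = np.linalg.inv(Z.T @ Z)
    meat = (Z * u[:, None] ** 2).T @ Z
    se = np.sqrt(np.diag(bread @ meat @ bread))
    # One-sided p-values for H0: coefficient <= 0 (normal approximation).
    pvals = [0.5 * math.erfc(t / math.sqrt(2)) for t in coef / se]
    return coef, se, pvals

# Synthetic randomized experiment with tau(x) = 1 + x0 (illustrative only).
rng = np.random.default_rng(3)
n = 4000
X = rng.normal(size=(n, 2))
w = rng.binomial(1, 0.5, size=n)
tau = 1 + X[:, 0]
y = X[:, 1] + w * tau + rng.normal(size=n)

m_true = X[:, 1] + 0.5 * tau               # E[Y | X] when e(X) = 0.5
e_true = np.full(n, 0.5)
tau_hat = tau + 0.3 * rng.normal(size=n)   # a noisy but informative CATE predictor

coef, se, pvals = calibration_test(y, w, m_true, e_true, tau_hat)
print("mean-prediction coefficient:", round(coef[0], 3), "se", round(se[0], 3))
print("heterogeneity coefficient:  ", round(coef[1], 3), "se", round(se[1], 3), "p", pvals[1])
```

Under the rule stated above, heterogeneity would be reported only when the p-value on the second coefficient is at or below 0.10.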
1. n < 500? → Use standard methods (causal-inference skill)
2. High-dim controls (p > 20), want ATE? → PDS-LASSO or DML-PLR; binary D → DML-IRM
3. CATE is primary estimand? → Causal Forest (large n) or DR-Learner (doubly robust)
4. Endogenous treatment with instrument? → DML-PLIV
5. Treatment is rare/imbalanced? → X-Learner
6. Quick benchmark? → Always compute T-Learner as baseline
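The numbered rules above can be read as a small lookup function. The sketch below is one possible encoding, not a definitive one: it assumes the rules are cumulative after rule 1, uses the n ≥ 2,000 guidance for causal forests as the meaning of "large n", and ties the X-Learner and T-Learner rules to the case where CATE is the target. The function and argument names are hypothetical.

```python
def recommend_estimators(n, p, want_cate=False, binary_treatment=False,
                         has_instrument=False, imbalanced_treatment=False):
    """Return candidate estimators, applying the numbered rules in order."""
    if n < 500:                                   # rule 1
        return ["standard methods (causal-inference skill)"]
    picks = []
    if p > 20 and not want_cate:                  # rule 2
        picks.append("DML-IRM" if binary_treatment else "PDS-LASSO or DML-PLR")
    if want_cate:                                 # rule 3
        picks.append("Causal Forest" if n >= 2000 else "DR-Learner")
    if has_instrument:                            # rule 4
        picks.append("DML-PLIV")
    if want_cate and imbalanced_treatment:        # rule 5
        picks.append("X-Learner")
    if want_cate:                                 # rule 6
        picks.append("T-Learner (baseline)")
    return picks

print(recommend_estimators(n=300, p=50))
print(recommend_estimators(n=1000, p=40, binary_treatment=True))
print(recommend_estimators(n=5000, p=10, want_cate=True, imbalanced_treatment=True))
```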
| Method | Estimand | Python | R | Min n | Key diagnostic |
|--------|----------|--------|---|-------|----------------|
| DML-PLR | ATE | doubleml, econml | DoubleML | ~500 | Nuisance R², residual balance |
| DML-IRM | ATE (binary D) | doubleml, econml | DoubleML | ~500 | Propensity AUC, trim threshold |
| DML-PLIV | LATE | doubleml, econml | DoubleML | ~1,000 | Effective F-stat |
| Causal Forest | CATE(x) | econml | grf | ~2,000 | Calibration test, ATE match |
| DR-Learner | CATE(x) | econml.dr | manual/grf | ~1,000 | Propensity calibration |
| PDS-LASSO | ATE (high-dim X) | sklearn + manual | hdm | ~200 | Union size, penalty sensitivity |
| X-Learner | CATE (imbalanced D) | econml | manual | ~1,000 | Compare to DR-Learner |
Causal ML nests traditional estimators: DML with linear nuisance = OLS (Frisch-Waugh), DML + IV = PLIV, causal forests + instrument = heterogeneous LATE (grf::instrumental_forest), post-LASSO + many instruments = sparse instrument selection then 2SLS. Details in references/connections-traditional.md.
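The first equivalence (DML with linear nuisance models reduces to OLS) is the Frisch-Waugh-Lovell theorem, and it can be checked numerically. The sketch below uses synthetic data and in-sample residualization; the equality is exact only without cross-fitting, and holds approximately once nuisance models are fit out-of-fold. Variable names and the data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))
d = X @ np.array([0.5, -0.3, 0.0, 0.2]) + rng.normal(size=n)
y = 1.5 * d + X @ np.array([1.0, 0.0, -0.7, 0.4]) + rng.normal(size=n)

def residualize(target, X):
    """Residual from an OLS regression of target on X with an intercept."""
    Z = np.column_stack([np.ones(len(X)), X])
    return target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]

# Route 1: coefficient on d from the full regression of y on (1, d, X).
full = np.column_stack([np.ones(n), d, X])
theta_ols = np.linalg.lstsq(full, y, rcond=None)[0][1]

# Route 2: partial X out of both y and d, then regress residual on residual.
y_res, d_res = residualize(y, X), residualize(d, X)
theta_partial = (d_res @ y_res) / (d_res @ d_res)

print(theta_ols, theta_partial)
```

Both routes give the same number up to floating-point error, which is why DML can be read as OLS with the linear projections replaced by flexible ML predictions.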
Agents: econometric-reviewer (post-estimation review, table/code consistency), identification-critic (IV/PLIV assumptions), numerical-auditor (convergence, seeding, Monte Carlo validation).
Cross-references: empirical-playbook skill → sensitivity-analysis.md (specification curve over ML choices), empirical-playbook skill → diagnostic-battery.md (nuisance R², overlap, calibration), numerical-auditor agent (synthetic data with known CATE).
Relationship to causal-inference skill: Use causal-inference to establish identification; use causal-ml for implementation with high-dimensional controls or when heterogeneity is primary. Complements, not substitutes.
- references/dml.md - Full DML implementation: PLR, IRM, PLIV with econml/DoubleML, cross-fitting, diagnostics
- references/grf-meta-learners.md - Causal forests (grf/econml), DR/T/S/X-Learner, calibration tests
- references/high-dim-cross-fitting.md - PDS-LASSO, Belloni-Chernozhukov-Hansen, cross-fitting protocols
- references/hte-inference.md - Calibration tests, individual CATE CIs, BLP projections, subgroup analysis
- references/connections-traditional.md - DML-OLS equivalence, PLIV, instrumental forests, post-LASSO