
brycewang-stanford/causal-ml

Name: causal-ml
Author: brycewang-stanford

skills/11-James-Traina-compound-science/skills/causal-ml/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research causal-ml


Causal Machine Learning

Reference for semiparametric ML estimators: DML with cross-fitting, generalized random forests, debiased regularization, and nuisance function approximation. Covers Neyman-orthogonal moment conditions, sample splitting, plug-in bias correction, and heterogeneous treatment effects.

When to Use This Skill

Use when the user is:

Estimating treatment effects with high-dimensional controls (p large relative to n)
Interested in heterogeneous treatment effects (CATE) as a primary estimand
Applying ML for flexible nuisance function estimation within a causal framework
Implementing cross-fitting, sample splitting, or Neyman-orthogonal estimators
Using econml, DoubleML, or grf packages

Skip when:

Sample is small (n < 500 — ML nuisance models need data)
A well-specified parametric model is available and defensible
The task is standard IV/DiD/RDD without high-dimensional controls (use causal-inference skill)
Structural modeling is needed (use structural-modeling skill)
The task needs formal identification proof (use identification-proofs skill)

Where to Start

Choosing a method? Jump to Method Selection Guide
ATE with many controls? See references/dml.md
Heterogeneous treatment effects? See references/grf-meta-learners.md
Variable selection for controls? See references/high-dim-cross-fitting.md
Reporting HTE results? See references/hte-inference.md
Connecting to traditional methods? See references/connections-traditional.md

Causal ML vs Traditional Methods

| Dimension | Traditional (IV, DiD, RDD) | Causal ML |
|-----------|--------------------------|-----------|
| Functional form | Parametric | Nonparametric / semi-parametric |
| High-dimensional controls | Problematic | Native support |
| Heterogeneous effects | Secondary (subgroup analysis) | Primary estimand (CATE) |
| Sample requirements | Moderate N | ML nuisance needs large N |
| Identification | Explicit (IV, DiD, RCT) | Same assumptions; ML is estimation, not identification |

Critical point: Causal ML does not relax identification assumptions. If you need a valid instrument, parallel trends, or no unmeasured confounding, those must still hold.

Double Machine Learning (DML)

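DML (Chernozhukov et al. 2018) fixes the regularization bias that arises when ML predictions are plugged naively into a regression. It partials out the controls X from both Y and D using separate ML nuisance models, then regresses the Y residuals on the D residuals. Two properties make this work: Neyman orthogonality (the moment condition is locally insensitive to nuisance estimation error) and cross-fitting (each nuisance model is fit on folds that exclude the observations it predicts, which prevents overfitting bias).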

PLR (Partially Linear Regression): $Y = \theta D + g(X) + \varepsilon$. The workhorse specification for continuous or binary D when the target is the ATE under selection on observables.

IRM (Interactive Regression Model): relaxes the additive separability of PLR for binary D, allowing heterogeneous effects.
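For orientation, the partialling-out idea can be written as a moment condition. The symbols $\psi$, $\eta$, $\ell$, and $m$ below are notation introduced here for illustration, not taken from the skill file; this is one common way to write the PLR score, under the assumption that $\ell(X) = E[Y \mid X]$ and $m(X) = E[D \mid X]$ are the two nuisance functions.

$$
\psi(W; \theta, \eta) = \bigl(Y - \ell(X) - \theta\,(D - m(X))\bigr)\,\bigl(D - m(X)\bigr), \qquad \eta = (\ell, m)
$$

Setting $E[\psi(W; \theta, \eta)] = 0$ and solving for $\theta$ gives the residual-on-residual regression coefficient:

$$
\theta = \frac{E\bigl[(D - m(X))\,(Y - \ell(X))\bigr]}{E\bigl[(D - m(X))^2\bigr]}
$$

Under the PLR model above, $\ell(X) = \theta\, m(X) + g(X)$, so $Y - \ell(X) = \theta\,(D - m(X)) + \varepsilon$, which is why regressing one residual on the other recovers $\theta$.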

Full implementation (Python/R code, cross-fitting from scratch, diagnostics) in references/dml.md.
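As a condensed illustration of the partial-out-then-regress recipe with cross-fitting, here is a minimal numpy-only sketch. It is not the implementation from references/dml.md: the nuisance learner (`fit_predict`, a ridge regression on quadratic features), the function name `dml_plr`, and the simulated data are all hypothetical stand-ins. In practice the nuisance step would use a flexible ML learner via DoubleML or econml.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test, ridge=1e-3):
    """Stand-in nuisance learner: ridge regression on [1, X, X^2]."""
    def features(X):
        return np.column_stack([np.ones(len(X)), X, X**2])
    F_train, F_test = features(X_train), features(X_test)
    gram = F_train.T @ F_train + ridge * np.eye(F_train.shape[1])
    coef = np.linalg.solve(gram, F_train.T @ y_train)
    return F_test @ coef

def dml_plr(Y, D, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds      # balanced random fold labels
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance models are fit on the other folds only (cross-fitting).
        y_res[test] = Y[test] - fit_predict(X[train], Y[train], X[test])
        d_res[test] = D[test] - fit_predict(X[train], D[train], X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)  # residual-on-residual regression
    psi = d_res * (y_res - theta * d_res)      # estimated score at theta
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(d_res**2)
    return theta, se

# Simulated data with a known effect (theta = 1.5) and one confounder, X[:, 0].
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
D = 0.5 * X[:, 0] - 0.3 * X[:, 2] ** 2 + rng.normal(size=n)
Y = 1.5 * D + X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)

theta_hat, se_hat = dml_plr(Y, D, X)
print(f"theta_hat = {theta_hat:.3f} (se {se_hat:.3f}), true theta = 1.5")
```

The standard error here is the usual sandwich form for a single-parameter moment condition; it ignores refinements such as repeated sample splitting.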

Causal Forests

Causal forests (Wager-Athey 2018; Athey-Tibshirani-Wager 2019) estimate CATE $\tau(x) = E[Y(1)-Y(0)|X=x]$ using honest forests (structure learned on one subsample, effects estimated on another). Use when CATE is the primary estimand and n $\geq$ 2,000. Always run the calibration test before reporting heterogeneity.

R (grf) and Python (econml) implementations, ATE/ATT extraction, BLP projections in references/grf-meta-learners.md.

Meta-Learners

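Meta-learners decompose CATE estimation into supervised learning sub-problems. DR-Learner (Kennedy 2023): doubly robust; its error depends on the product of the outcome-model and propensity-model errors, so it performs best when both nuisance models are estimated well and degrades gracefully when one is weaker. T-Learner: the simplest baseline (separate outcome models for treated and control). X-Learner: designed for imbalanced treatment groups. For applied work, use the DR-Learner as the primary estimator and the T-Learner as a benchmark. Large disagreement between them signals nuisance model problems.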

All implementations in references/grf-meta-learners.md.
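To make the "DR-Learner primary, T-Learner benchmark" comparison concrete, the sketch below implements both with plain OLS in numpy. It is a simplified illustration, not the code in references/grf-meta-learners.md: the helper names are hypothetical, the outcome models are linear, and the propensity score is taken to be the treated share in the training fold, which assumes treatment is randomized. With observational data a fitted propensity model would take its place.

```python
import numpy as np

def ols_fit(X, y):
    F = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def t_learner(X, W, Y, X_new):
    """Separate outcome models for treated and control; CATE is their difference."""
    mu1 = ols_fit(X[W == 1], Y[W == 1])
    mu0 = ols_fit(X[W == 0], Y[W == 0])
    return ols_predict(mu1, X_new) - ols_predict(mu0, X_new)

def dr_learner(X, W, Y, X_new, seed=0):
    """Two-fold cross-fitted doubly robust pseudo-outcomes, then a final regression."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % 2
    phi = np.empty(n)
    for k in (0, 1):
        test, train = folds == k, folds != k
        Xtr, Wtr, Ytr = X[train], W[train], Y[train]
        mu1 = ols_predict(ols_fit(Xtr[Wtr == 1], Ytr[Wtr == 1]), X[test])
        mu0 = ols_predict(ols_fit(Xtr[Wtr == 0], Ytr[Wtr == 0]), X[test])
        e = Wtr.mean()  # assumption: randomized treatment, constant propensity
        phi[test] = (mu1 - mu0
                     + W[test] * (Y[test] - mu1) / e
                     - (1 - W[test]) * (Y[test] - mu0) / (1 - e))
    return ols_predict(ols_fit(X, phi), X_new)

# Simulated experiment with known CATE tau(x) = 1 + 2*x0.
rng = np.random.default_rng(7)
n = 4000
X = rng.normal(size=(n, 3))
W = rng.binomial(1, 0.5, size=n)
tau = 1.0 + 2.0 * X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(size=n)

X_new = X[:200]
tau_t = t_learner(X, W, Y, X_new)
tau_dr = dr_learner(X, W, Y, X_new)
err_t = np.mean(np.abs(tau_t - tau[:200]))
err_dr = np.mean(np.abs(tau_dr - tau[:200]))
gap = np.mean(np.abs(tau_t - tau_dr))
print(f"T-Learner MAE {err_t:.3f}, DR-Learner MAE {err_dr:.3f}, disagreement {gap:.3f}")
```

In this well-specified simulation the two learners should agree closely; the `gap` statistic is the kind of disagreement measure the text suggests monitoring.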

High-Dimensional Controls

PDS-LASSO (post-double-selection LASSO; Belloni-Chernozhukov-Hansen 2014): run separate LASSO regressions of Y on X and of D on X, take the union of the selected variables, then run OLS of Y on D and that union. Works at moderate n (~200 with sparse confounders). See references/high-dim-cross-fitting.md.
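The three steps (two selection regressions, union, OLS) can be sketched in numpy alone. Everything named here is illustrative: `lasso_cd` is a small hand-written coordinate-descent LASSO, and the penalty rule `c * std(target) * sqrt(log(p) / n)` is a simple heuristic chosen for this example, not the plug-in penalty from Belloni-Chernozhukov-Hansen and not the rule used by the `hdm` package.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - X b||^2 + alpha*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep resid = y - X @ beta
            beta[j] = new
    return beta

def pds_lasso(Y, D, X, c=2.0):
    """Post-double-selection: LASSO of Y on X and of D on X, union, then OLS."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    def selected(target):
        alpha = c * target.std() * np.sqrt(np.log(p) / n)  # illustrative penalty
        return np.flatnonzero(lasso_cd(Xs, target - target.mean(), alpha))

    union = np.union1d(selected(Y), selected(D))
    F = np.column_stack([np.ones(n), D, X[:, union]])
    coef, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return coef[1], union

# Simulated sparse design: only X[:, 0] confounds; true effect of D is 1.0.
rng = np.random.default_rng(3)
n, p = 500, 50
X = rng.normal(size=(n, p))
D = X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)
Y = 1.0 * D + 2.0 * X[:, 0] + X[:, 2] + rng.normal(size=n)

theta_hat, union = pds_lasso(Y, D, X)
print(f"theta_hat = {theta_hat:.3f}, selected controls = {union.tolist()}")
```

Taking the union matters because a control that predicts D strongly but Y only weakly can be dropped by the outcome LASSO alone, which would leave confounding in the final OLS. Checking how `theta_hat` moves as `c` changes is one way to run the penalty sensitivity diagnostic.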

HTE Inference

Before reporting CATE, test for genuine heterogeneity using the BLP calibration test. Do not report heterogeneous effects if the calibration test fails (p > 0.10). See references/hte-inference.md.
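The sketch below shows the mechanics of one common form of this test, in the spirit of the calibration regression behind `grf::test_calibration`; treating that as the test meant here is an assumption. The residualized outcome is regressed, without an intercept, on a "mean prediction" term and a "differential prediction" term, and a coefficient near 1 on the differential term with a small one-sided p-value indicates that the CATE estimates capture real heterogeneity. The function name, the HC0 standard errors, and the use of oracle nuisance values in the simulation are simplifications made for this example.

```python
import math
import numpy as np

def calibration_test(Y, W, y_hat, w_hat, tau_hat):
    """No-intercept regression of Y - y_hat on mean and differential predictions."""
    w_res = W - w_hat
    mean_pred = w_res * tau_hat.mean()
    diff_pred = w_res * (tau_hat - tau_hat.mean())
    Z = np.column_stack([mean_pred, diff_pred])
    y_res = Y - y_hat
    coef, *_ = np.linalg.lstsq(Z, y_res, rcond=None)
    u = y_res - Z @ coef
    bread = np.linalg.inv(Z.T @ Z)
    meat = (Z * u[:, None]).T @ (Z * u[:, None])
    se = np.sqrt(np.diag(bread @ meat @ bread))   # HC0 sandwich standard errors
    t = coef / se
    # One-sided p-value P(N(0,1) > t) for each coefficient.
    p = np.array([0.5 * (1.0 - math.erf(x / math.sqrt(2.0))) for x in t])
    return coef, se, p

# Simulated experiment with true CATE tau(x) = 1 + x0 and oracle nuisances.
rng = np.random.default_rng(1)
n = 4000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(size=n)
y_hat = X[:, 1] + 0.5 * tau        # E[Y | X], known here by construction
w_hat = np.full(n, 0.5)            # E[W | X], known here by construction

coef_good, se_good, p_good = calibration_test(Y, W, y_hat, w_hat, tau)
noise_tau = 1.0 + rng.normal(size=n)   # CATE "estimates" unrelated to the truth
coef_noise, se_noise, p_noise = calibration_test(Y, W, y_hat, w_hat, noise_tau)
print("informative tau_hat: differential coef", round(float(coef_good[1]), 3))
print("pure-noise tau_hat:  differential coef", round(float(coef_noise[1]), 3))
```

With real data, `y_hat`, `w_hat`, and `tau_hat` should be out-of-bag or cross-fitted predictions rather than in-sample fits.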

Method Selection Guide

Decision Heuristic

1. n < 500? → Use standard methods (causal-inference skill)
2. High-dim controls (p > 20), want ATE? → PDS-LASSO or DML-PLR; binary D → DML-IRM
3. CATE is primary estimand? → Causal Forest (large n) or DR-Learner (doubly robust)
4. Endogenous treatment with instrument? → DML-PLIV
5. Treatment is rare/imbalanced? → X-Learner
6. Quick benchmark? → Always compute T-Learner as baseline
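The heuristic above can be encoded as a small lookup function, which is handy for logging why a method was chosen. The function name, argument names, and the precedence among rules are assumptions made for this sketch; the source lists the rules but does not say how to combine them. The n >= 2,000 cutoff for "large n" is taken from the Causal Forests section.

```python
def choose_method(n, p, want_cate=False, has_instrument=False,
                  binary_treatment=False, imbalanced=False):
    """Return candidate estimators suggested by the decision heuristic."""
    if n < 500:                                   # rule 1
        return ["standard methods (causal-inference skill)"]
    picks = []
    if has_instrument:                            # rule 4
        picks.append("DML-PLIV")
    if want_cate:                                 # rule 3
        picks.append("Causal Forest" if n >= 2000 else "DR-Learner")
        if imbalanced:                            # rule 5
            picks.append("X-Learner")
        picks.append("T-Learner (baseline)")      # rule 6
    elif p > 20 and not has_instrument:           # rule 2
        picks.extend(["DML-IRM"] if binary_treatment else ["PDS-LASSO", "DML-PLR"])
    return picks                                  # empty list: heuristic is silent

print(choose_method(n=300, p=50))
print(choose_method(n=1000, p=50))
print(choose_method(n=1000, p=50, binary_treatment=True))
print(choose_method(n=5000, p=10, want_cate=True, imbalanced=True))
```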

Full Method Comparison

| Method | Estimand | Python | R | Min n | Key diagnostic |
|--------|----------|--------|---|-------|----------------|
| DML-PLR | ATE | doubleml, econml | DoubleML | ~500 | Nuisance R², residual balance |
| DML-IRM | ATE (binary D) | doubleml, econml | DoubleML | ~500 | Propensity AUC, trim threshold |
| DML-PLIV | LATE | doubleml, econml | DoubleML | ~1,000 | Effective F-stat |
| Causal Forest | CATE(x) | econml | grf | ~2,000 | Calibration test, ATE match |
| DR-Learner | CATE(x) | econml.dr | manual/grf | ~1,000 | Propensity calibration |
| PDS-LASSO | ATE (high-dim X) | sklearn + manual | hdm | ~200 | Union size, penalty sensitivity |
| X-Learner | CATE (imbalanced D) | econml | manual | ~1,000 | Compare to DR-Learner |

Limitations to State Explicitly

ML needs data: Causal forests need n $\geq$ 2,000; DML needs n $\geq$ 500. Below these, use parametric methods.
Identification is not relaxed: ML is better nuisance estimation, not weaker assumptions.
CATE inference is hard: Individual-level CIs are conservative; policy targeting requires care.
Publication: DML and causal forests are mainstream in top applied micro journals. Compare to traditional estimators.

Connections to Traditional Methods

Causal ML nests traditional estimators:

DML with linear nuisance models = OLS (Frisch-Waugh)
DML + IV = PLIV
Causal forests + instrument = heterogeneous LATE (grf::instrumental_forest)
Post-LASSO + many instruments = sparse instrument selection, then 2SLS

Details in references/connections-traditional.md.
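The first equivalence is easy to verify numerically. The snippet below is a small check on simulated data (all variable names are illustrative): partialling X out of both Y and D with OLS and regressing residual on residual reproduces the coefficient on D from the full regression. The equality is exact only for in-sample linear partialling out, as in the Frisch-Waugh-Lovell theorem; with cross-fitted nuisance models it holds approximately.

```python
import numpy as np

def ols(F, y):
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n, p = 500, 4
X = rng.normal(size=(n, p))
D = X @ np.array([1.0, -0.5, 0.0, 0.3]) + rng.normal(size=n)
Y = 2.0 * D + X @ np.array([0.5, 1.0, -1.0, 0.0]) + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])            # controls plus intercept

# Full regression of Y on D and controls; first coefficient is the effect of D.
theta_full = ols(np.column_stack([D, X1]), Y)[0]

# Partial out the controls from Y and D, then regress residual on residual.
y_res = Y - X1 @ ols(X1, Y)
d_res = D - X1 @ ols(X1, D)
theta_fwl = (d_res @ y_res) / (d_res @ d_res)

print(f"full OLS: {theta_full:.6f}, residual-on-residual: {theta_fwl:.6f}")
```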

Integration with Plugin

Agents: econometric-reviewer (post-estimation review, table/code consistency), identification-critic (IV/PLIV assumptions), numerical-auditor (convergence, seeding, Monte Carlo validation).

Cross-references: empirical-playbook skill → sensitivity-analysis.md (specification curve over ML choices), empirical-playbook skill → diagnostic-battery.md (nuisance R², overlap, calibration), numerical-auditor agent (synthetic data with known CATE).

Relationship to causal-inference skill: Use causal-inference to establish identification; use causal-ml for implementation with high-dimensional controls or when heterogeneity is primary. Complements, not substitutes.

Reference Files

references/dml.md — Full DML implementation: PLR, IRM, PLIV with econml/DoubleML, cross-fitting, diagnostics
references/grf-meta-learners.md — Causal forests (grf/econml), DR/T/S/X-Learner, calibration tests
references/high-dim-cross-fitting.md — PDS-LASSO, Belloni-Chernozhukov-Hansen, cross-fitting protocols
references/hte-inference.md — Calibration tests, individual CATE CIs, BLP projections, subgroup analysis
references/connections-traditional.md — DML-OLS equivalence, PLIV, instrumental forests, post-LASSO


This skill covers causal machine learning methods in applied economics and quantitative social science. Use when implementing or choosing between modern ML-based causal estimators — including double machine learning, DML, partially linear models, interactive regression models, cross-fitting, Neyman orthogonality, debiased ML, causal forests, generalized random forest, GRF, honest causal trees, AIPW with machine learning, doubly robust with machine learning, DR-Learner, T-Learner, S-Learner, X-Learner, meta-learners, heterogeneous treatment effects, conditional average treatment effect, CATE, HTE, high-dimensional controls, LASSO controls, post-LASSO, post-double selection, Belloni-Chernozhukov-Hansen, Riesz representer, Chernozhukov, sample splitting, econml, DoubleML package, or any combination of machine learning and causal inference.

54 stars

development

Updated Apr 15, 2026


npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research causal-ml

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean. Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Purpose | Status |
|---------|---------|--------|
| Trivy | Container and dependency vulnerability scanner | Clean (95%) |
| Semgrep | Static code analysis for vulnerabilities | Clean (95%) |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean (95%) |
| Snyk (dep) | Open source security scanning | Skipped (50%) |
| Socket.dev | Supply chain security analysis | Skipped (50%) |
| VirusTotal | Multi-engine malware detection | Skipped (50%) |
| CrowdStrike | Advanced threat intelligence | Skipped (50%) |
| OSV-Scanner | Open Source Vulnerability database check | Skipped (50%) |
| OWASP Dep-Check | | Skipped (50%) |

Last scanned: Apr 15, 2026, 10:46 PM (51.8s, 6 files scanned)

SKILL.md

name: causal-ml
argument-hint: <estimator or method choice>


Install manually

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Create the global Claude Code skills folder if it does not exist yet
mkdir -p ~/.claude/skills

# Copy the skill into the Claude Code skills folder (global)
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/11-James-Traina-compound-science/skills/causal-ml ~/.claude/skills/

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT