brycewang-stanford/data-analysis

Name: data-analysis
Author: brycewang-stanford

skills/15-Felpix-Studios-social-science-research/skills/data-analysis/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research data-analysis

Data Analysis Workflow

Run an end-to-end data analysis in R or Python: load, explore, analyze, and produce publication-ready output.

Input: $ARGUMENTS — a dataset path (e.g., data/county_panel.csv) or a description of the analysis goal (e.g., "regress wages on education with state fixed effects using CPS data").

Phase 0: Choose Language

Determine language from $ARGUMENTS or ask the user:

User mentions tidyverse, fixest, lm, .R context → R track
User mentions pandas, statsmodels, sklearn, .py or .ipynb context → Python track
Dataset is .csv/.parquet with no language cue → use AskUserQuestion with a single-select menu:
- header: "Language"
- question: "Which language should I use for this analysis?"
- options:
  - label: "R (Recommended)", description: "tidyverse, fixest, ggplot2 — full plugin support with coding conventions and R reviewer"
  - label: "Python", description: "pandas, statsmodels — supported for analysis scripts and figures"
  - label: "Both", description: "R for figures and tables, Python for data processing"
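
The routing rules above can be sketched as a small function. This is only an illustration: the function name `choose_track`, the return values, and the choice to fall back to asking when cues conflict are assumptions, not part of the skill.

```python
import re
from pathlib import PurePosixPath

# Cue lists mirror the rules above; everything else here is illustrative.
R_CUES = {"tidyverse", "fixest", "lm"}
PYTHON_CUES = {"pandas", "statsmodels", "sklearn"}
R_SUFFIXES = {".r"}
PYTHON_SUFFIXES = {".py", ".ipynb"}


def choose_track(arguments: str) -> str:
    """Return "r", "python", or "ask" for a raw $ARGUMENTS string."""
    text = arguments.lower()
    # Whole-word matching so that "lm" does not fire on words like "film".
    words = set(re.findall(r"[a-z_][a-z0-9_]*", text))
    # File suffixes of whitespace-separated tokens, with punctuation trimmed.
    suffixes = {PurePosixPath(tok.strip("\"'(),;")).suffix for tok in text.split()}

    r_hit = bool(words & R_CUES) or bool(suffixes & R_SUFFIXES)
    py_hit = bool(words & PYTHON_CUES) or bool(suffixes & PYTHON_SUFFIXES)

    if r_hit and not py_hit:
        return "r"
    if py_hit and not r_hit:
        return "python"
    # No cue (for example a bare .csv path) or conflicting cues: ask the user.
    return "ask"


print(choose_track("data/county_panel.csv"))
print(choose_track("run fixest on data/panel.csv"))
```

A result of "ask" corresponds to showing the single-select menu described above.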

R Track

Constraints

Follow rules/r-code-conventions.md for all standards
Save scripts to scripts/R/ with descriptive names
Save all outputs (figures, tables, RDS) to output/
Use saveRDS() for every computed object
Run r-reviewer on the generated script before presenting results

Phase 1: Setup and Data Loading

Create R script with proper header (title, author, purpose, inputs, outputs)
Load required packages at top (library(), never require())
Set seed once at top: set.seed(42)
Create output directories: dir.create("output/analysis", recursive = TRUE, showWarnings = FALSE)
Load and inspect the dataset

Phase 2: Exploratory Data Analysis

summary(), missingness rates, variable types
Histograms for key continuous variables
Scatter plots, correlation matrices
Panel trends, pre-treatment comparisons if applicable
Save all diagnostic figures to output/diagnostics/

Phase 3: Main Analysis

Panel data: use fixest; cross-section: use lm/glm
Cluster SEs at the appropriate level (document why)
Multiple specifications: start simple, progressively add controls
Report standardized effects alongside raw coefficients
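
The skill does not say how effects should be standardized. One common convention, assumed here, rescales a raw coefficient by the ratio of sample standard deviations, so the result reads as the change in the outcome (in standard deviations) per one-standard-deviation change in the regressor:

```latex
\hat{\beta}_j^{\mathrm{std}} = \hat{\beta}_j \cdot \frac{s_{x_j}}{s_y}
```

Here $\hat{\beta}_j$ is the raw coefficient on regressor $x_j$, and $s_{x_j}$ and $s_y$ are the sample standard deviations of the regressor and the outcome.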

Phase 4: Publication-Ready Output

Tables: modelsummary (preferred) or stargazer; export .tex and .html
Figures: ggplot2 with project theme; explicit ggsave(width = X, height = Y); save as .pdf and .png; add bg = "transparent" only if output is for Beamer slides

Phase 5: Save and Review

saveRDS() for all key objects
Run the r-reviewer agent: "Review the script at scripts/R/[script_name].R"
Address Critical and High issues before presenting results

R Script Template

# ============================================================
# [Descriptive Title]
# Author: [from project context]
# Purpose: [What this script does]
# Inputs:  [Data files]
# Outputs: [Figures, tables, RDS files]
# ============================================================

# 0. Setup ----
library(tidyverse)
library(fixest)
library(modelsummary)

set.seed(42)
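dir.create("output/analysis", recursive = TRUE, showWarnings = FALSE)
# Phase 2 saves diagnostic figures here, so create it up front as well
dir.create("output/diagnostics", recursive = TRUE, showWarnings = FALSE)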

# 1. Data Loading ----
# 2. Exploratory Analysis ----
# 3. Main Analysis ----
# 4. Tables and Figures ----
# 5. Export ----

Python Track

Constraints

Save scripts to scripts/python/ with descriptive names
Save all outputs (figures, tables, pickles) to output/
Use joblib.dump() for model objects; .to_parquet() for DataFrames
Use pathlib.Path for all file paths — never hardcode absolute paths
Set random seeds at the top of the script
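
One way to enforce the path constraints above is a small helper that builds every output path from a relative root. This is a sketch only: `output_path` and `OUTPUT_ROOT` are hypothetical names, and rejecting `..` components is an added assumption.

```python
from pathlib import Path

# Relative to the repository root, per the constraints above.
OUTPUT_ROOT = Path("output")


def output_path(*parts: str, root: Path = OUTPUT_ROOT) -> Path:
    """Build a path under the output root and create its parent directory."""
    for part in parts:
        piece = Path(part)
        # Refuse absolute components and attempts to climb out of the root.
        if piece.is_absolute() or ".." in piece.parts:
            raise ValueError(f"expected a relative path component, got {part!r}")
    target = root.joinpath(*parts)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


model_file = output_path("analysis", "model.pkl")
print(model_file)
```

Routing all writes through one helper keeps scripts portable across machines, since no path depends on where the repository is checked out.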

Phase 1: Setup and Data Loading

Create Python script with header (title, author, purpose, inputs, outputs)
Import all packages at the top of the file
Set seeds: np.random.seed(42) and random.seed(42)
Create output directories: Path("output/analysis").mkdir(parents=True, exist_ok=True)
Load and inspect the dataset with pandas

Phase 2: Exploratory Data Analysis

df.describe(), df.isnull().sum(), df.dtypes
Histograms and distributions with matplotlib/seaborn
Scatter plots and correlation matrices
Save diagnostic figures to output/diagnostics/
Save summary stats: df.describe().to_csv("output/diagnostics/summary_stats.csv")
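
The first and last steps of this phase could look roughly like the following. The function name `run_basic_eda`, the `column_overview.csv` file, and the toy data are hypothetical; only `df.describe()`, `df.isnull()`, `df.dtypes`, and the `summary_stats.csv` path come from the list above.

```python
from pathlib import Path

import pandas as pd


def run_basic_eda(df: pd.DataFrame, out_dir: Path) -> pd.DataFrame:
    """Write summary statistics and return a per-column overview."""
    out_dir.mkdir(parents=True, exist_ok=True)
    df.describe().to_csv(out_dir / "summary_stats.csv")
    overview = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isnull().sum(),
            "share_missing": df.isnull().mean(),
        }
    )
    overview.to_csv(out_dir / "column_overview.csv")
    return overview


# Hypothetical toy data, only to show the call.
toy = pd.DataFrame({"wage": [10.0, 12.5, None, 9.0], "educ": [12, 16, 14, 12]})
overview = run_basic_eda(toy, Path("output/diagnostics"))
print(overview)
```

Reporting the share of missing values alongside the count makes columns of different lengths comparable at a glance.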

Phase 3: Main Analysis

Cross-section OLS: smf.ols("y ~ x", data=df).fit(cov_type="HC3")
Panel data: PanelOLS from linearmodels with cluster-robust SEs
Multiple specifications: build incrementally
Document SE choice with a comment
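
As a rough illustration of building specifications incrementally with a documented SE choice, the sketch below fits two cross-section models with HC3 errors and one with clustered errors. The data are synthetic and the variable names (`wage`, `educ`, `exper`, `state`) are hypothetical; the panel case with `PanelOLS` is not shown.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(42)

# Synthetic data for illustration only.
n = 200
df = pd.DataFrame(
    {
        "educ": np.random.normal(13, 2, n),
        "exper": np.random.normal(10, 5, n),
        "state": np.random.randint(0, 10, n),
    }
)
df["wage"] = 1.0 + 0.5 * df["educ"] + 0.1 * df["exper"] + np.random.normal(0, 1, n)

# Start simple, then add controls.
formulas = {
    "baseline": "wage ~ educ",
    "with_controls": "wage ~ educ + exper",
}

# SE choice: HC3 because the cross-section may be heteroskedastic.
robust = {
    name: smf.ols(formula, data=df).fit(cov_type="HC3")
    for name, formula in formulas.items()
}

# SE choice: cluster by state if errors are plausibly correlated within state.
clustered = smf.ols(formulas["with_controls"], data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]}
)

# One column of coefficients per specification; absent terms show as NaN.
coef_table = pd.DataFrame({name: res.params for name, res in robust.items()})
print(coef_table)
```

Keeping the formulas in one dictionary makes it easy to see how each specification differs from the previous one.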

Phase 4: Publication-Ready Output

Tables: Format with pandas and export via .to_latex() or stargazer (Python port)
Figures: matplotlib/seaborn; explicit fig.savefig(path, dpi=300, bbox_inches="tight"); save as .pdf and .png
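
For the figure half of this phase, a helper along these lines saves each figure in both formats with the explicit `dpi` and `bbox_inches` settings. The name `save_figure`, the example plot, and the `output/figures/example` stem are hypothetical; the `Agg` backend is chosen only so the sketch runs without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for non-interactive runs
import matplotlib.pyplot as plt


def save_figure(fig, stem: Path, formats=("pdf", "png"), dpi: int = 300) -> list:
    """Save one figure under the same stem in each requested format."""
    stem.parent.mkdir(parents=True, exist_ok=True)
    written = []
    for ext in formats:
        path = stem.with_suffix(f".{ext}")
        fig.savefig(path, dpi=dpi, bbox_inches="tight")
        written.append(path)
    return written


# Placeholder plot, only to show the call.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")
paths = save_figure(fig, Path("output/figures/example"))
plt.close(fig)
print(paths)
```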

Phase 5: Save and Review

joblib.dump(model, "output/model.pkl") for fitted models
df_results.to_parquet("output/results.parquet") for DataFrames
Review the script manually against the Python checklist below before presenting

Python Script Template

# ============================================================
# [Descriptive Title]
# Author: [from project context]
# Purpose: [What this script does]
# Inputs:  [Data files]
# Outputs: [Figures, tables, pickle/parquet files]
# ============================================================

import random
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
from pathlib import Path

# Seeds
np.random.seed(42)
random.seed(42)

# Output directories
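Path("output/analysis").mkdir(parents=True, exist_ok=True)
Path("output/figures").mkdir(parents=True, exist_ok=True)
# Phase 2 saves diagnostic figures and summary stats here
Path("output/diagnostics").mkdir(parents=True, exist_ok=True)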

# 1. Data Loading
# 2. Exploratory Analysis
# 3. Main Analysis
# 4. Tables and Figures
# 5. Export

Python Quality Checklist

[ ] All imports at top
[ ] Random seeds set (numpy + stdlib)
[ ] All paths use pathlib.Path; no hardcoded absolute paths
[ ] Output directories created with mkdir(exist_ok=True)
[ ] Figures saved with explicit dpi=300, bbox_inches="tight"
[ ] Model objects saved with joblib.dump()
[ ] DataFrames saved as parquet
[ ] Comments explain WHY, not WHAT
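
Part of this checklist can be checked mechanically. The sketch below uses the standard-library `ast` module to test four of the items. The function name `check_script` and the exact matching rules are assumptions, and items such as "comments explain WHY" still need a human reader.

```python
import ast

IMPORT_NODES = (ast.Import, ast.ImportFrom)


def _call_name(func: ast.expr) -> str:
    """Return a dotted name such as "np.random.seed" for a call target."""
    parts = []
    while isinstance(func, ast.Attribute):
        parts.append(func.attr)
        func = func.value
    if isinstance(func, ast.Name):
        parts.append(func.id)
    return ".".join(reversed(parts))


def _keywords(call: ast.Call) -> set:
    return {kw.arg for kw in call.keywords}


def check_script(source: str) -> dict:
    """Check a Python script's source text against part of the checklist."""
    tree = ast.parse(source)
    # Ignore bare constant statements such as a module docstring.
    body = [
        node for node in tree.body
        if not (isinstance(node, ast.Expr) and isinstance(node.value, ast.Constant))
    ]
    is_import = [isinstance(node, IMPORT_NODES) for node in body]
    # All top-level imports must come before any other statement ...
    ordered = is_import == sorted(is_import, reverse=True)
    # ... and no import may be nested inside a function or block.
    total_imports = sum(isinstance(node, IMPORT_NODES) for node in ast.walk(tree))

    calls = [node for node in ast.walk(tree) if isinstance(node, ast.Call)]
    names = {_call_name(call.func) for call in calls}
    mkdirs = [c for c in calls if _call_name(c.func).endswith("mkdir")]
    savefigs = [c for c in calls if _call_name(c.func).endswith("savefig")]

    return {
        "imports_at_top": ordered and total_imports == sum(is_import),
        "seeds_set": "random.seed" in names
        and any(name.endswith(".random.seed") for name in names),
        "mkdir_exist_ok": all("exist_ok" in _keywords(c) for c in mkdirs),
        "savefig_explicit": all({"dpi", "bbox_inches"} <= _keywords(c) for c in savefigs),
    }
```

The function only parses the text, so it can be run on a script without executing it or installing its dependencies.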

Shared Principles

Reproduce, don't guess. If the user specifies a regression, run exactly that.
Show your work. Compute summary statistics before jumping to regression.
Check for issues. Look for multicollinearity, outliers, perfect prediction, missing data.
Use relative paths. All paths relative to repository root.
No hardcoded values. Use variables for sample restrictions, date ranges, thresholds.
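
Two of these principles, checking for multicollinearity and avoiding hardcoded values, can be illustrated together. The sketch computes variance inflation factors from the inverse of the regressors' correlation matrix and compares them with a named threshold. The threshold of 10 is a common rule of thumb rather than a value from the skill, and the data are synthetic.

```python
import numpy as np

# Named threshold instead of a magic number; 10 is a rule of thumb (assumption).
VIF_THRESHOLD = 10.0


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF per column of X (rows are observations, no constant column).

    The VIF of regressor j equals the j-th diagonal entry of the inverse
    of the correlation matrix of the regressors.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


np.random.seed(42)
n = 500
x1 = np.random.normal(size=n)
x2 = np.random.normal(size=n)
x3 = x1 + 0.05 * np.random.normal(size=n)  # nearly a copy of x1

X = np.column_stack([x1, x2, x3])
vif = variance_inflation_factors(X)
flagged = [j for j, value in enumerate(vif) if value > VIF_THRESHOLD]
print(vif, flagged)
```

Columns that end up in `flagged` are candidates for dropping or combining before the main regression.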

End-to-end data analysis workflow in R or Python — from exploration through regression to publication-ready tables and figures. Make sure to use this skill whenever the user wants to run any empirical analysis, write analysis code, or produce output from data. Triggers include: "analyze this data", "run a regression", "write R code for this", "write Python code for this", "I have a dataset", "help me with this regression", "run a DiD", "run an RDD", "event study", "IV regression", "fit a model", "produce a table", "make a figure", "explore my data", or any request involving a dataset path or empirical estimation.

1,685 stars

development

Updated Jun 5, 2026

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research data-analysis

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

- Trivy: Container and dependency vulnerability scanner. Status: Clean (95%)
- Semgrep: Static code analysis for vulnerabilities. Status: Clean (95%)
- mcp-scan (Snyk): Model Context Protocol security validation. Status: Clean (95%)
- Snyk (dep): Open source security scanning. Status: Skipped (50%)
- Socket.dev: Supply chain security analysis. Status: Skipped (50%)
- VirusTotal: Multi-engine malware detection. Status: Skipped (50%)
- CrowdStrike: Advanced threat intelligence. Status: Skipped (50%)
- OSV-Scanner: Open Source Vulnerability database check. Status: Skipped (50%)
- OWASP Dep-Check. Status: Skipped (50%)

Last scanned: Jun 5, 2026, 4:31 AM | 140.4s | 1 file scanned

SKILL.md

name: data-analysis
description: >-
  End-to-end data analysis workflow in R or Python, from exploration through regression to publication-ready tables and figures. Make sure to use this skill whenever the user wants to run any empirical analysis, write analysis code, or produce output from data. Triggers include: "analyze this data", "run a regression", "write R code for this", "write Python code for this", "I have a dataset", "help me with this regression", "run a DiD", "run an RDD", "event study", "IV regression", "fit a model", "produce a table", "make a figure", "explore my data", or any request involving a dataset path or empirical estimation.
argument-hint: [dataset path or description of analysis goal]
allowed-tools: ["Read", "Grep", "Glob", "Write", "Edit", "Bash", "Task", "AskUserQuestion"]

Related Skills

brycewang-stanford/literature-review-tools

tools

Verified | Trusted | Community

Recommend AND run open-source AI tools, agents, Claude Code / Codex skills, and MCP servers for any stage of a literature review: searching, reading, extracting, synthesizing, screening, citation-checking, and paper writing. Use when the user asks "what tool should I use to..." OR "install/run/use <tool> to ..." for research/lit-review work: automating a survey or related-work section, PDF→Markdown extraction for LLMs (MinerU/marker/docling), PRISMA / systematic review (ASReview), citation-backed Q&A over PDFs (PaperQA2), wiring papers into Claude/Cursor via MCP (arxiv/paper-search/zotero servers), or chatting with a Zotero library. Ships a launcher (scripts/litrun.py) that installs each tool in an isolated venv and runs it. Curated catalog of 70+ vetted projects. Supports Chinese and English (for literature-review tool selection and one-click install/run).

3,109 | SKILL.md | Updated Jul 28, 2026

brycewang-stanford/auto-empirical-research-skills

development

Verified | Trusted | Community

Route empirical-research requests through the Auto-Empirical Research Skills catalog when this whole repository is installed as one skill in Codex, CodeBuddy, Claude Code, or another IDE. Use to choose and load the right vendored AERS skill for causal inference, econometrics, replication, data acquisition, manuscript writing, peer review and referee responses, citation checking, de-AIGC editing, or full empirical-paper workflows without reading the entire repository at once.

3,109 | SKILL.md | Updated Jun 27, 2026

brycewang-stanford/aer-preregistration

documentation

Verified | Trusted | Community

Use when the project collects primary data or runs a field, lab, or survey experiment, before the intervention begins — write the pre-analysis plan, size the sample from a power calculation, and register with the AEA RCT Registry. Apply after the design is chosen in aer-identification and before any outcome data are seen.

3,021 | SKILL.md | Updated Jul 23, 2026

brycewang-stanford/economist-data-skill

tools

Verified | Trusted | Community

Guide economists to authoritative data sources with explicit, confirmed data specifications before retrieval; interfaces with Playwright MCP to navigate portals and extract real data, not articles about data.

3,021 | SKILL.md | Updated Jul 23, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Copy into Claude Code skills folder (global); create the folder first in case it does not exist
mkdir -p ~/.claude/skills
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/15-Felpix-Studios-social-science-research/skills/data-analysis ~/.claude/skills/

Repository

brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

1,685 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT