brycewang-stanford/data-deposit

Name: data-deposit
Author: brycewang-stanford

skills/41-sticerd-eee-sewage-econometrics-check/skills/data-deposit/SKILL.md

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research data-deposit

Data Deposit Preparation

Prepare an AEA Data Editor compliant replication package for the sewage-house-prices project.

Input: $ARGUMENTS — output directory (defaults to Replication/).

Project-Specific Context

Pipeline Structure

The project has a 6-layer data pipeline in scripts/R/:

01_data_ingestion/ — Raw data collection (EDM archives, APIs)
02_data_cleaning/ — Format standardisation, geocoding, validation
03_data_enrichment/ — Temporal aggregation, rainfall metrics, dry spill identification
04_feature_engineering/ — Spatial matching (house/rental ↔ spill sites)
05_data_integration/ — Merging historical and API EDM data
06_analysis_datasets/ — Final dataset assembly

Analysis scripts: scripts/R/09_analysis/ (6 subdirectories by approach)
Utilities: scripts/R/utils/
Python scripts: scripts/python/ (river network processing)
Docker pipelines: RiverNetworks/, upstream_downstream/

Data Layout

data/raw/          — Original immutable data (EDM, Land Registry, Met Office, shapefiles)
data/processed/    — Intermediate pipeline outputs (parquet)
data/final/        — Analysis-ready datasets
data/cache/        — Postcode geocoding cache

Key Dependencies

R packages managed via renv (renv.lock)
Python environment via uv in scripts/python/
PostGIS via Docker for river network analysis
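
As a rough illustration of how the R dependencies listed above could be read programmatically, the sketch below pulls the R version and package versions out of renv.lock. It assumes the standard renv lockfile layout (a top-level "R" object with a "Version" field and a "Packages" object keyed by package name). The function name `lockfile_versions` and the version numbers in the example are hypothetical, not taken from the project.

```python
import json


def lockfile_versions(lock_text: str) -> dict:
    """Return the R version and {package: version} from renv.lock JSON text."""
    lock = json.loads(lock_text)
    r_version = lock.get("R", {}).get("Version")
    packages = {
        name: record.get("Version")
        for name, record in lock.get("Packages", {}).items()
    }
    # Sort by package name so the README table is stable between runs.
    return {"R": r_version, "packages": dict(sorted(packages.items()))}


# Illustrative lockfile content; the versions are made up for the example.
example_lock = """
{
  "R": {"Version": "4.3.2"},
  "Packages": {
    "here": {"Package": "here", "Version": "1.0.1"},
    "fixest": {"Package": "fixest", "Version": "0.12.0"}
  }
}
"""
versions = lockfile_versions(example_lock)
for package, version in versions["packages"].items():
    print(f"{package} {version}")
```

In practice the text would come from reading renv.lock at the project root, and the resulting mapping could feed the computational requirements section of the README.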

Workflow

Step 1: Inventory

Read all scripts in scripts/R/ and parse data file references
Read renv.lock for package versions
Scan output/tables/ and output/figures/ for output files
Read the manuscript (docs/overleaf/_main.tex) for table/figure references
Check scripts/python/ for Python dependencies
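
One way the "parse data file references" part of the inventory might look is a simple scan for quoted strings that end in a data-file extension. This is only a sketch: the extension list, the function name `data_references`, and the example file names are assumptions, and a regex scan will miss paths built dynamically (for instance with paste0).

```python
import re

# Assumed set of data-file extensions; extend it to match the project.
DATA_EXTENSIONS = ("parquet", "csv", "rds", "shp", "gpkg")
_QUOTED = re.compile(
    r"""(["'])([^"'\n]+?\.(?:%s))\1""" % "|".join(DATA_EXTENSIONS),
    re.IGNORECASE,
)


def data_references(r_source: str) -> list[str]:
    """Return quoted data-file names found in R source, in order of first use."""
    refs: list[str] = []
    for line in r_source.splitlines():
        # Crude comment stripping: cuts at the first "#", even inside a string.
        code = line.split("#", 1)[0]
        for match in _QUOTED.finditer(code):
            if match.group(2) not in refs:
                refs.append(match.group(2))
    return refs


# Hypothetical script content for illustration.
example_script = '''
# df <- read.csv("data/raw/old.csv")
df <- arrow::read_parquet("data/processed/spills.parquet")
saveRDS(df, here::here("data", "final", "panel.rds"))
'''
print(data_references(example_script))
```

Note that paths passed to here::here() as separate components only surface as the final file name ("panel.rds" above), so those references still need to be resolved against the data/ layout by hand or with a fuller parser.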

Step 2: Analyse Dependencies

Parse script dependencies (which scripts create files that others load)
Map the execution order (follows the 6-layer pipeline, then analysis scripts)
Cross-reference the full execution order documented in ReadMe.md
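
The dependency analysis above amounts to a topological sort: a script that loads a file must run after the script that creates it. A minimal sketch using Python's standard-library graphlib follows. The script names, the file paths, and the `reads`/`writes` structure are hypothetical; they stand in for whatever the inventory step produces.

```python
from graphlib import TopologicalSorter


def execution_order(scripts: dict) -> list[str]:
    """Order scripts so that every file is written before it is read.

    `scripts` maps a script name to {"reads": set, "writes": set}.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    # Map each output file to the script that creates it.
    producer = {}
    for name, io in scripts.items():
        for path in io["writes"]:
            producer[path] = name
    # Each script depends on the producers of the files it reads.
    graph = {
        name: {
            producer[path]
            for path in io["reads"]
            if path in producer and producer[path] != name
        }
        for name, io in scripts.items()
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical inventory, deliberately listed out of order.
example_scripts = {
    "06_assemble.R": {"reads": {"data/processed/clean.parquet"},
                      "writes": {"data/final/panel.parquet"}},
    "01_ingest.R": {"reads": set(),
                    "writes": {"data/raw/edm.csv"}},
    "02_clean.R": {"reads": {"data/raw/edm.csv"},
                   "writes": {"data/processed/clean.parquet"}},
}
print(execution_order(example_scripts))
```

Files that no script produces (raw inputs) impose no ordering constraint here, which also makes them easy to list separately for the data availability statement. The computed order can then be compared with the order documented in ReadMe.md.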

Step 3: Assemble Package

Create in Replication/ (or specified directory):

README.md — AEA format:
- Data availability statement (which data is public vs restricted)
- Computational requirements (R version, packages, PostGIS, Python)
- Program descriptions (what each script does)
- Replication instructions (step-by-step)
- Expected runtime

master.R — Runs everything in order:

# Master replication script for "Sewage in Our Waters"
# Estimated runtime: [X hours]

source(here::here("scripts", "R", "01_data_ingestion", "script.R"))
# ... through all layers
source(here::here("scripts", "R", "09_analysis", "subdir", "script.R"))

install_packages.R — If renv is not used:

install.packages(c("tidyverse", "fixest", "modelsummary", ...))

DEPOSIT_CHECKLIST.md — Pre-deposit verification
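
Once the execution order is known, the body of master.R can be generated rather than typed by hand. The sketch below is one possible way to do that, assuming script paths are stored relative to the project root with forward slashes. The function name `master_script` and the example file name `fetch_edm.R` are hypothetical.

```python
def master_script(ordered_scripts: list[str], title: str) -> str:
    """Render master.R source that runs each script in the given order."""
    lines = [f'# Master replication script for "{title}"', ""]
    for rel_path in ordered_scripts:
        # Split the path so here::here() receives one component per argument.
        parts = ", ".join(f'"{part}"' for part in rel_path.split("/"))
        lines.append(f"source(here::here({parts}))")
    return "\n".join(lines) + "\n"


# Hypothetical script path for illustration.
example_master = master_script(
    ["scripts/R/01_data_ingestion/fetch_edm.R"],
    "Sewage in Our Waters",
)
print(example_master)
```

Passing path components separately to here::here() keeps the generated script free of platform-specific separators, which also helps with the absolute-path check in the validation step.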

Step 4: Validate

Run the 10 verification checks (equivalent to /audit-replication):

Script execution order is correct
All data file references resolve
All output files are generated
Package versions documented
No hardcoded absolute paths
Data provenance documented
README completeness (AEA format)
Output cross-reference (every table/figure traced to a script)
Restricted data properly flagged
Master script runs without modification
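
Two of the checks above lend themselves to simple automation: flagging hardcoded absolute paths and tracing manuscript tables and figures back to generated output files. The following is a heuristic sketch, not the skill's actual implementation. The regular expressions, function names, and example file names are assumptions, and matching outputs by file stem is a simplification that can produce false matches when two outputs share a name.

```python
import re
from pathlib import PurePosixPath

# Quoted strings starting with "/", "~/" or a Windows drive letter.
_ABSOLUTE = re.compile(r"""["'](/|~/|[A-Za-z]:[\\/])[^"'\n]+["']""")
# \input{...} and \includegraphics[...]{...} arguments in LaTeX source.
_INCLUDE = re.compile(r"\\(?:includegraphics(?:\[[^\]]*\])?|input)\{([^}]+)\}")


def absolute_path_lines(r_source: str) -> list[int]:
    """Return 1-based line numbers that appear to contain an absolute path."""
    hits = []
    for number, line in enumerate(r_source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore comments (crude)
        if _ABSOLUTE.search(code):
            hits.append(number)
    return hits


def untraced_outputs(tex_source: str, produced: set) -> list[str]:
    """Return manuscript references with no matching generated output file."""
    produced_stems = {PurePosixPath(path).stem for path in produced}
    referenced = _INCLUDE.findall(tex_source)
    return sorted(
        ref for ref in referenced
        if PurePosixPath(ref).stem not in produced_stems
    )


# Hypothetical inputs for illustration.
example_r = '''x <- read.csv("/Users/me/data.csv")
y <- here::here("data", "raw")
# old <- "C:/old/path"
z <- paste("a", "b", sep = "/")
'''
example_tex = r"""
\input{tables/main_results.tex}
\includegraphics[width=\textwidth]{figures/event_study}
"""
print(absolute_path_lines(example_r))
print(untraced_outputs(example_tex, {"output/tables/main_results.tex"}))
```

Anything these functions flag still needs a human look: the path pattern does not understand R syntax, and the LaTeX scan does not skip commented-out lines.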

Step 5: Present Results

Package contents — All files in Replication/
Script order — Numbered sequence with dependency graph
Data availability — Public vs restricted datasets
Verification result — X/10 checks passed
Deposit steps — openICPSR / Zenodo instructions

Principles

AEA Data Editor standards are the target: README format, versions, and data access statements.
Don't rename scripts without approval. Present the ordering first and let the user decide.
Thorough data provenance. Document every dataset with its source, access date, and restrictions.
Test before declaring ready. Always validate after assembly.
Document restricted data clearly. Land Registry and Zoopla data may have access restrictions.

Prepare a replication package for the sewage-house-prices project. Generates AEA-compliant README, master script, numbered script order, install script, and deposit checklist. Validates the package against 10 verification checks. This skill should be used when asked to "prepare replication", "data deposit", "create replication package", or "package for submission".

1,065 stars

testing

Updated May 20, 2026

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

npx skillsauth add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research data-deposit

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Container and dependency vulnerability scanner. Clean, 95%
Semgrep: Static code analysis for vulnerabilities. Clean, 95%
mcp-scan (Snyk): Model Context Protocol security validation. Clean, 95%
Snyk (dep): Open source security scanning. Skipped, 50%
Socket.dev: Supply chain security analysis. Skipped, 50%
VirusTotal: Multi-engine malware detection. Skipped, 50%
CrowdStrike: Advanced threat intelligence. Skipped, 50%
OSV-Scanner: Open Source Vulnerability database check. Skipped, 50%
OWASP Dep-Check: Skipped, 50%

Last scanned: May 20, 2026, 4:58 AM (133.2s, 1 file scanned)

SKILL.md

name: data-deposit
description: Prepare a replication package for the sewage-house-prices project. Generates AEA-compliant README, master script, numbered script order, install script, and deposit checklist. Validates the package against 10 verification checks. This skill should be used when asked to "prepare replication", "data deposit", "create replication package", or "package for submission".
argument-hint: [optional: output directory]
allowed-tools: ["Read", "Grep", "Glob", "Write", "Edit", "Bash", "Agent"]

Install manually

# Clone the repo
git clone https://github.com/brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research.git

# Copy into Claude Code skills folder (global)
cp -r Awesome-Agent-Skills-for-Empirical-Research/skills/41-sticerd-eee-sewage-econometrics-check/skills/data-deposit ~/.claude/skills/

Repository

brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT