Reproducible Python environments, notebooks, and literate programming
Set up reproducible Python environments for research computing, using virtual environments, dependency management, Jupyter notebooks, and literate programming practices.
# Option 1: venv (built-in, lightweight)
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate # Windows
pip install -r requirements.txt
# Option 2: conda (includes non-Python dependencies)
conda create -n myproject python=3.11
conda activate myproject
conda install numpy pandas scipy matplotlib
conda env export > environment.yml
# Option 3: uv (fast, modern Python package manager)
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
# requirements.txt with exact versions (pip freeze)
pip freeze > requirements.txt
# Better: use pip-tools for compiled dependencies
pip install pip-tools
# Create requirements.in (human-readable, loose constraints)
cat > requirements.in << 'EOF'
numpy>=1.24
pandas>=2.0
scipy>=1.11
matplotlib>=3.7
scikit-learn>=1.3
EOF
# Compile to requirements.txt (pinned, reproducible)
pip-compile requirements.in --output-file requirements.txt
# Install from compiled requirements
pip-sync requirements.txt
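Pinning only helps if the active environment actually matches the pins. The following is a minimal sketch, assuming a `requirements.txt` made of `name==version` lines such as `pip-compile` or `pip freeze` produce; the helper names `parse_pins` and `find_mismatches` are hypothetical and not part of pip-tools.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line or line.startswith("-"):
            continue  # skip blanks, options such as '-r', and unpinned entries
        name, version = line.split("==", 1)
        # Drop environment markers ('; python_version ...') and trailing tokens
        version = version.split(";", 1)[0].strip().split(" ")[0]
        # Drop extras such as 'package[extra]' so the name can be looked up
        pins[name.split("[", 1)[0].strip().lower()] = version
    return pins


def find_mismatches(pins):
    """Compare pins with the active environment: {name: (pinned, installed)}."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


example = "numpy==1.26.4\n    # via pandas\nscipy>=1.11\n"
print(parse_pins(example))
```

Calling `find_mismatches(parse_pins(Path("requirements.txt").read_text()))` at the start of an analysis script is one way to fail early when the environment has drifted.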
[project]
name = "my-research-project"
version = "0.1.0"
description = "Analysis code for paper: Title"
requires-python = ">=3.10"
dependencies = [
    "numpy>=1.24",
    "pandas>=2.0",
    "scipy>=1.11",
    "matplotlib>=3.7",
    "scikit-learn>=1.3",
    "statsmodels>=0.14",
]
[project.optional-dependencies]
dev = ["pytest", "black", "ruff", "jupyter"]
gpu = ["torch>=2.0", "torchvision"]
[tool.ruff]
line-length = 88

# Recent Ruff releases expect rule selection under the lint table
[tool.ruff.lint]
select = ["E", "F", "I"]
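The declared dependencies can also be read programmatically, for example to record them alongside results. This is a sketch that assumes Python 3.11+ (where `tomllib` is in the standard library) or the `tomli` backport on older versions; `list_dependencies` and the embedded example text are illustrative only.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters: assumes 'tomli' is installed
    import tomli as tomllib

# A shortened, illustrative pyproject.toml
PYPROJECT_EXAMPLE = """
[project]
name = "my-research-project"
requires-python = ">=3.10"
dependencies = ["numpy>=1.24", "pandas>=2.0"]

[project.optional-dependencies]
dev = ["pytest", "ruff"]
"""


def list_dependencies(pyproject_text):
    """Return (core, optional) dependency declarations from pyproject.toml text."""
    project = tomllib.loads(pyproject_text).get("project", {})
    core = project.get("dependencies", [])
    optional = project.get("optional-dependencies", {})
    return core, optional


core, optional = list_dependencies(PYPROJECT_EXAMPLE)
print(core)
print(optional)
```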
# Cell 1: Imports and configuration (always the first cell)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
# Configuration
DATA_DIR = Path("./data")
OUTPUT_DIR = Path("./outputs")
OUTPUT_DIR.mkdir(exist_ok=True)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
# Matplotlib defaults
plt.rcParams.update({
    "figure.figsize": (10, 6),
    "figure.dpi": 150,
    "font.size": 12,
    "axes.spines.top": False,
    "axes.spines.right": False,
})
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
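Printing versions in the first cell is useful, and the same information can be saved next to the results. A minimal sketch using only the standard library follows; `environment_snapshot` is a hypothetical helper, and the package list is an assumption to adapt per project.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Collect interpreter, OS, and package versions as a JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


snapshot = environment_snapshot(["numpy", "pandas", "matplotlib"])
print(json.dumps(snapshot, indent=2))
```

Writing the result with `json.dump` into the output directory keeps a record of the environment that produced each set of figures and tables.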
# Paper Title: Analysis Notebook
## 1. Setup and Data Loading
[Import libraries, set seeds, load data]
## 2. Data Exploration
[Summary statistics, distributions, missing data check]
## 3. Preprocessing
[Cleaning, transformation, feature engineering]
## 4. Analysis
### 4.1 Primary Analysis
[Main statistical tests or model training]
### 4.2 Sensitivity Analysis
[Robustness checks]
### 4.3 Supplementary Analysis
[Additional analyses for appendix]
## 5. Visualization
[Publication-quality figures]
## 6. Export Results
[Save tables, figures, and summary statistics]
# Convert notebook to Python script
jupyter nbconvert --to script analysis.ipynb
# Convert notebook to HTML report
jupyter nbconvert --to html --no-input analysis.ipynb
# Convert notebook to PDF (requires Pandoc and a TeX distribution with XeLaTeX)
jupyter nbconvert --to pdf analysis.ipynb
# Execute notebook from command line (and save output)
jupyter nbconvert --execute --to notebook --inplace analysis.ipynb
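When several notebooks must be executed in order, the same command can be driven from Python. The sketch below only wraps the `jupyter nbconvert` CLI shown above; the function names, the `_executed` suffix, and the default output directory are assumptions, and writing to a separate directory instead of `--inplace` is a design choice that leaves the source notebook untouched.

```python
import subprocess
import sys
from pathlib import Path


def build_execute_command(notebook, output_dir, timeout=600):
    """Build an nbconvert command that executes a notebook into output_dir."""
    notebook = Path(notebook)
    return [
        sys.executable, "-m", "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",  # seconds allowed per cell
        "--output", f"{notebook.stem}_executed",
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def execute_notebook(notebook, output_dir="outputs/executed"):
    """Run the notebook; raises CalledProcessError if execution fails."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_execute_command(notebook, output_dir), check=True)


print(build_execute_command("notebooks/02_analysis.ipynb", "outputs/executed"))
```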
import os
import random

import numpy as np


def set_global_seed(seed=42):
    """Seed the Python, NumPy, and (if installed) PyTorch/TensorFlow RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    # This only affects subprocesses started afterwards. Hash randomization for
    # the current interpreter is fixed at startup, so export PYTHONHASHSEED in
    # the shell before launching Python if hash order matters.
    os.environ["PYTHONHASHSEED"] = str(seed)

    # PyTorch (if used)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

    # TensorFlow (if used)
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass


set_global_seed(42)
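Assigning `os.environ["PYTHONHASHSEED"]` inside a running process does not change string hashing in that process, because the interpreter reads the variable once at startup. It does apply to interpreters launched afterwards, which the following sketch demonstrates; `hash_in_subprocess` is a hypothetical helper written for this illustration.

```python
import os
import subprocess
import sys


def hash_in_subprocess(text, hash_seed):
    """Return hash(text) computed by a fresh interpreter with PYTHONHASHSEED set."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Two separate interpreters started with the same hash seed agree on the value
first = hash_in_subprocess("treatment", 42)
second = hash_in_subprocess("treatment", 42)
print(first == second)
```

In practice this means exporting `PYTHONHASHSEED` in the shell, Makefile, or Dockerfile that launches the analysis, not only inside the script.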
FROM python:3.11-slim
WORKDIR /app
# System dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    && rm -rf /var/lib/apt/lists/*
# Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy project code
COPY . .
# Default: run the analysis
CMD ["python", "run_analysis.py"]
# Build and run (paths are quoted so directories containing spaces still mount)
docker build -t my-analysis .
docker run -v "$(pwd)/data:/app/data" -v "$(pwd)/outputs:/app/outputs" my-analysis
# Interactive Jupyter inside Docker (assumes jupyter is listed in requirements.txt)
docker run -p 8888:8888 -v "$(pwd):/app" my-analysis \
    jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
research-project/
├── README.md # Project overview and how to reproduce
├── pyproject.toml # Dependencies and project metadata
├── requirements.txt # Pinned dependencies
├── Dockerfile # Containerized environment
├── Makefile # Automation (make data, make analysis, make figures)
├── data/
│ ├── raw/ # Original, immutable data
│ ├── processed/ # Cleaned, transformed data
│ └── external/ # Third-party data sources
├── notebooks/
│ ├── 01_exploration.ipynb # Data exploration
│ ├── 02_analysis.ipynb # Main analysis
│ └── 03_figures.ipynb # Publication figures
├── src/
│ ├── __init__.py
│ ├── data.py # Data loading and preprocessing
│ ├── models.py # Statistical models and ML
│ ├── visualization.py # Plotting functions
│ └── utils.py # Shared utilities
├── tests/
│ ├── test_data.py # Data pipeline tests
│ └── test_models.py # Model correctness tests
├── outputs/
│ ├── figures/ # Generated figures (PDF, PNG)
│ ├── tables/ # Generated tables (CSV, LaTeX)
│ └── models/ # Saved model artifacts
└── configs/
    ├── experiment_1.yaml # Experiment configuration
    └── experiment_2.yaml # Experiment configuration
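The directory skeleton above can be created with a few lines of `pathlib`. This is a sketch, not a project generator: `scaffold_project` is a hypothetical helper, and it creates only the directories and an empty `src/__init__.py`, leaving files such as the README and Dockerfile to be written by hand.

```python
from pathlib import Path

# Directory names taken from the layout above
PROJECT_DIRS = [
    "data/raw", "data/processed", "data/external",
    "notebooks", "src", "tests",
    "outputs/figures", "outputs/tables", "outputs/models",
    "configs",
]


def scaffold_project(root):
    """Create the directory skeleton under root; existing directories are kept."""
    root = Path(root)
    for relative in PROJECT_DIRS:
        (root / relative).mkdir(parents=True, exist_ok=True)
    # An empty __init__.py makes src/ importable as a package
    (root / "src" / "__init__.py").touch(exist_ok=True)
    return root
```

For example, `scaffold_project("research-project")` creates the tree in the current working directory and is safe to run again.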
# Recipe lines must be indented with a tab character, not spaces
.PHONY: all data analysis figures clean reproduce test format

all: data analysis figures

data:
	python src/data.py --input data/raw/ --output data/processed/

analysis: data
	python -m jupyter nbconvert --execute notebooks/02_analysis.ipynb \
		--to notebook --inplace

figures: analysis
	python src/visualization.py --output outputs/figures/

clean:
	rm -rf data/processed/ outputs/

# Reproduce the full pipeline from scratch
reproduce: clean all
	@echo "All results reproduced successfully."

# Run tests
test:
	pytest tests/ -v

# Format code
format:
	ruff check --fix src/ tests/
	ruff format src/ tests/
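A successful `make reproduce` shows that the pipeline runs, not that it produces the same results. One way to check the latter is to compare file checksums between two runs, sketched below with hypothetical helpers `checksum_manifest` and `changed_files`. Byte-level comparison is strict: logs and files with embedded timestamps (some PDF figures, for instance) can differ even when the analysis is identical, so it suits tables and CSV outputs best.

```python
import hashlib
from pathlib import Path


def checksum_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    directory = Path(directory)
    manifest = {}
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(directory).as_posix()] = digest
    return manifest


def changed_files(before, after):
    """Return paths whose digest differs or that exist in only one manifest."""
    return sorted(
        path for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

A typical use is to store `checksum_manifest("outputs/tables")` as JSON after the first run, rerun the pipeline, and pass both manifests to `changed_files`.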
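import logging
from datetime import datetime
from pathlib import Path

# FileHandler does not create missing directories, so create the log folder first
LOG_DIR = Path("outputs/logs")
LOG_DIR.mkdir(parents=True, exist_ok=True)

# Set up logging to a timestamped file and to the console
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
    handlers=[
        logging.FileHandler(LOG_DIR / f"run_{datetime.now():%Y%m%d_%H%M%S}.log"),
        logging.StreamHandler(),
    ],
)
logger = logging.getLogger(__name__)

# Log experiment parameters (RANDOM_SEED and DATA_DIR come from the setup cell)
logger.info("Random seed: %s", RANDOM_SEED)
logger.info("Data file: %s", DATA_DIR / "dataset.csv")
logger.info("Model: Linear Regression with L2 regularization (alpha=0.1)")
logger.info("Train/test split: 80/20")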
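Alongside parameters, it often helps to log which version of the code produced a run. The sketch below assumes the project is tracked with Git and that the `git` executable is on the PATH; `current_git_commit` is a hypothetical helper that returns `None` when either assumption fails.

```python
import subprocess


def current_git_commit():
    """Return the current commit hash, or None outside a Git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # git is not installed, or this is not a repository
    return result.stdout.strip()


commit = current_git_commit()
print(f"Git commit: {commit or 'unknown'}")
```

The returned value can be passed to the logger in the same way as the other experiment parameters.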
Dependencies are pinned in requirements.txt or pyproject.toml.
The full pipeline runs with a single command (make all or python run_analysis.py).