skills/jupytext/SKILL.md
Use when working with jupytext — converting notebooks to/from text formats, syncing paired .ipynb/.py files, multi-kernel projects (Python/R/Stata/SAS), or executing notebooks via papermill.
Jupytext converts Jupyter notebooks to/from text formats (.py, .R, .md), enabling version control and multi-kernel workflows.
Before claiming ANY jupytext script executed successfully, follow this sequence:
jupytext --to notebook --output - script.py | papermill - output.ipynb
This is non-negotiable. Skipping papermill execution is NOT HELPFUL: the user gets a notebook that fails on first run.
Use papermill rather than jupyter nbconvert --execute: papermill has better error handling, parameter injection, and logging, and the pipe form needs no intermediate .ipynb files.
Before EVERY "notebook works" claim:
Conversion:
Execution (MANDATORY):
jupytext --to notebook --output - script.py | papermill - output.ipynb
Output Verification:
Multi-Kernel Projects (if applicable):
Only after ALL checks pass:
Follow this sequence for EVERY jupytext task involving execution:
1. CONVERT → jupytext --to notebook --output -
2. EXECUTE → papermill - output.ipynb (with params if needed)
3. CHECK → Verify exit code and stderr
4. INSPECT → Use notebook-debug verification
5. VERIFY → Outputs match expectations
6. CLAIM → "Notebook works" only after all gates passed
NEVER skip the execution gate. Converting without executing proves nothing about correctness.
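The first three gates can be sketched in Python as a thin wrapper around the two CLI calls. This is a minimal sketch, not part of the skill itself: the helper names `build_commands` and `run_gated` are hypothetical, and it assumes `jupytext` and `papermill` are on the PATH when `run_gated` is called.

```python
import subprocess
from typing import Dict, List, Optional, Tuple


def build_commands(script: str, output: str,
                   params: Optional[Dict[str, object]] = None
                   ) -> Tuple[List[str], List[str]]:
    """Return the jupytext and papermill argv lists for the pipe pattern."""
    convert = ["jupytext", "--to", "notebook", "--output", "-", script]
    execute = ["papermill", "-", output]
    for name, value in (params or {}).items():
        execute += ["-p", name, str(value)]  # one -p flag per parameter
    return convert, execute


def run_gated(script: str, output: str,
              params: Optional[Dict[str, object]] = None) -> bool:
    """Convert and execute; return True only if both steps exit with 0."""
    convert, execute = build_commands(script, output, params)
    # Gate 1: CONVERT, capturing the notebook JSON from stdout.
    converted = subprocess.run(convert, capture_output=True)
    if converted.returncode != 0:
        print(converted.stderr.decode(errors="replace"))
        return False
    # Gate 2: EXECUTE, feeding the notebook to papermill on stdin.
    executed = subprocess.run(execute, input=converted.stdout,
                              capture_output=True)
    # Gate 3: CHECK the exit code and surface stderr on failure.
    if executed.returncode != 0:
        print(executed.stderr.decode(errors="replace"))
        return False
    return True
```

A call such as `run_gated("script.py", "output.ipynb", {"n_samples": 1000})` covers only CONVERT, EXECUTE, and CHECK. The INSPECT and VERIFY gates still require looking at the executed notebook.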
Use percent format (py:percent) for all projects:
# %% [markdown]
# # Analysis Title
# %%
import pandas as pd
df = pd.read_csv("data.csv")
# %% tags=["parameters"]
input_file = "data.csv"
Cell markers: # %% for code, # %% [markdown] for markdown.
Markdown dollar signs: Always wrap $ in backticks to prevent LaTeX rendering: write # Cost: `$50`, not # Cost: $50
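To make the cell markers concrete, here is a simplified splitter that treats every line starting with `# %%` as a cell boundary. It is an illustration only and does not reproduce jupytext's actual parser (cell metadata such as tags is ignored); `split_cells` and `SAMPLE` are hypothetical names.

```python
import re
from typing import List, Tuple

# Matches a percent-format cell marker such as "# %%" or "# %% [markdown]".
MARKER = re.compile(r"^# %%(?P<rest>.*)$")


def split_cells(source: str) -> List[Tuple[str, str]]:
    """Split percent-format text into (cell_type, body) pairs."""
    cells: List[Tuple[str, List[str]]] = []
    for line in source.splitlines():
        match = MARKER.match(line)
        if match:
            # "[markdown]" after the marker selects a markdown cell.
            kind = "markdown" if "[markdown]" in match.group("rest") else "code"
            cells.append((kind, []))
        elif cells:
            # Lines before the first marker are ignored in this sketch.
            cells[-1][1].append(line)
    return [(kind, "\n".join(body).strip()) for kind, body in cells]


SAMPLE = '''# %% [markdown]
# # Analysis Title
# %%
import pandas as pd
# %% tags=["parameters"]
input_file = "data.csv"
'''

for kind, body in split_cells(SAMPLE):
    print(kind, repr(body))
```

Markdown cell bodies keep their leading `# ` comment prefix here. A real conversion strips it when building the notebook.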
Create jupytext.toml in project root:
formats = "ipynb,py:percent"
notebook_metadata_filter = "-all"
cell_metadata_filter = "-all"
# Convert notebook to percent-format Python file
jupytext --to py:percent notebook.ipynb
# Convert Python script to Jupyter notebook format
jupytext --to notebook script.py
# Enable bidirectional pairing to keep formats synchronized
jupytext --set-formats ipynb,py:percent notebook.ipynb
# Synchronize paired notebook and text file
jupytext --sync notebook.ipynb
Always pipe to papermill for execution - no intermediate files:
# Convert script to notebook and execute in atomic operation
jupytext --to notebook --output - script.py | papermill - output.ipynb
# Convert and execute with parameter injection
jupytext --to notebook --output - script.py | papermill - output.ipynb -p start_date "2024-01-01" -p n_samples 1000
# Convert and execute with detailed logging output
jupytext --to notebook --output - script.py | papermill - output.ipynb --log-output
# Convert and execute in memory without saving intermediate files
jupytext --to notebook --output - script.py | papermill - -
Key flags:
- --output - tells jupytext to write to stdout
- papermill - output.ipynb reads from stdin, writes to file
- papermill - - reads from stdin, writes to stdout (for inspection)
Why this pattern:
- No intermediate .ipynb files cluttering the workspace
After execution, use the notebook-debug skill to inspect tracebacks in the output .ipynb.
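As a rough complement to that inspection step, the executed notebook is plain JSON, so error outputs can be listed with the standard library alone. This sketch assumes the nbformat v4 layout, where a failing code cell carries an output with `output_type` set to `"error"`. `find_errors` and `EXAMPLE` are hypothetical names, and the example notebook is made up for illustration.

```python
import json
from typing import Dict, List


def find_errors(notebook_json: str) -> List[Dict[str, object]]:
    """Return one record per error output in an executed notebook."""
    notebook = json.loads(notebook_json)
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # only code cells have outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append({
                    "cell": index,
                    "ename": output.get("ename"),
                    "evalue": output.get("evalue"),
                })
    return errors


# Hypothetical executed notebook with one failing cell, for illustration only.
EXAMPLE = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "code", "source": "1 + 1", "metadata": {},
         "execution_count": 1, "outputs": []},
        {"cell_type": "code", "source": "1 / 0", "metadata": {},
         "execution_count": 2, "outputs": [
             {"output_type": "error", "ename": "ZeroDivisionError",
              "evalue": "division by zero", "traceback": []}]},
    ],
})

print(find_errors(EXAMPLE))
```

In practice the input would come from the file papermill wrote, for example `find_errors(open("output.ipynb").read())`.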
Share data between Python/R/Stata/SAS via files:
| Route | Format | Write | Read |
|-------|--------|-------|------|
| Python -> R | Parquet | df.to_parquet() | arrow::read_parquet() |
| Python -> Stata | DTA | df.to_stata() | use "file.dta" |
| Any -> Any | CSV | Native | Native |
| SQL queries | DuckDB | Query parquet directly | Query parquet directly |
Python (prep) -> Parquet -> R (stats) -> Parquet -> Python (report)
|
v
Stata (.dta) -> Econometrics
Add the following to .pre-commit-config.yaml:
repos:
  - repo: https://github.com/mwouts/jupytext
    rev: v1.16.0
    hooks:
      - id: jupytext
        args: [--sync]  # Synchronize paired formats before commit
Choose one approach:
*.ipynb to .gitignore) for minimal repository size
Configure editors for automatic synchronization:
Standard multi-kernel project layout:
project/
├── jupytext.toml # Project-wide settings
├── environment.yml # Conda env with all kernels
├── notebooks/
│ ├── 01_python_prep.py # Python percent format
│ ├── 02_r_analysis.R # R percent format
│ └── 03_stata_models.do # Stata script
├── data/
│ ├── raw/
│ └── processed/ # Parquet/DTA interchange files
└── results/
Specify kernel in file header:
# ---
# jupyter:
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---
# %% [markdown]
# # Python Analysis
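When diagnosing a wrong-kernel problem, it can help to check which kernel a script's header declares. The following is a minimal sketch that scans the commented header line by line rather than parsing real YAML. `kernel_name` and `HEADER` are hypothetical names, and the scan assumes the header layout shown above.

```python
import re
from typing import Optional


def kernel_name(source: str) -> Optional[str]:
    """Return the kernelspec name from a commented YAML header, or None."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != "# ---":
        return None  # no header at the top of the file
    in_kernelspec = False
    for line in lines[1:]:
        if line.strip() == "# ---":
            break  # end of header
        # Drop the leading comment marker, keep the indentation after it.
        text = line[1:] if line.startswith("#") else line
        if text.strip() == "kernelspec:":
            in_kernelspec = True
            continue
        # "name:" must follow whitespace, so "display_name:" does not match.
        match = re.match(r"^\s+name:\s*(\S+)\s*$", text)
        if in_kernelspec and match:
            return match.group(1)
    return None


HEADER = """# ---
# jupyter:
#   kernelspec:
#     display_name: Python 3
#     language: python
#     name: python3
# ---
# %% [markdown]
# # Python Analysis
"""

print(kernel_name(HEADER))
```

A result of `None` suggests the file has no kernelspec header, which matches the "Wrong kernel" row in the troubleshooting table.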
| Issue | Solution |
|-------|----------|
| Sync conflict | Delete .ipynb, regenerate from .py |
| Wrong kernel | Add kernelspec header to .py file |
| Metadata noise | Set notebook_metadata_filter = "-all" |
| Cell order lost | Use percent format (preserves structure) |
Detailed patterns and configurations:
- references/formats.md - All format specifications (percent, light, sphinx, myst, rmd, quarto), cell metadata, configuration options
- references/kernels.md - Kernel setup (IRkernel, xeus-r, stata_kernel, pystata, saspy), environment configuration, troubleshooting
- references/data-sharing.md - Cross-kernel data sharing patterns (parquet, dta, csv, duckdb), full pipeline examples, validation patterns
Working code in examples/:
- examples/python_analysis.py - Python percent-format template with common patterns
- examples/r_analysis.R - R percent-format template for statistical analysis
- examples/cross_kernel_pipeline.py - Multi-kernel data sharing example
Utility scripts in scripts/:
- scripts/init_project.sh - Initialize jupytext project with standard structure
- scripts/sync_all.sh - Sync all paired notebooks in project