claude/skills/data-pipeline/SKILL.md
Data pipeline architecture patterns and best practices, including medallion/three-layer architecture (Raw/Staging/Enriched or Bronze/Silver/Gold), YAML-based schema management, and ETL workflow patterns. Use when designing or implementing data pipelines, working with data warehouse layers, or managing table schemas in YAML.
Install this skill globally with one command (works with Claude Code, Cursor, and Windsurf):
npx skillsauth add lanej/dotfiles data-pipeline
Architecture patterns and best practices for data pipeline design, with a focus on the medallion architecture (Raw → Staging → Enriched) and declarative schema management.
The three-layer pattern for organizing data warehouses:
See references/medallion-architecture.md for:
Prefer YAML files over inline SQL DDL or JSON for schema definitions:
Benefits:
See references/yaml-schema-patterns.md for:
jsonencode(yamldecode(file()))

| Need | Layer |
|------|-------|
| Historical source data, re-processable | Raw |
| Cleaned records, ready for joining | Staging |
| Joined across sources, with metrics | Enriched |
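To make the layer boundaries concrete, here is a minimal Python sketch, not the skill's own implementation. It assumes two hypothetical in-memory sources (`RAW_ORDERS`, `RAW_CUSTOMERS`; all names are illustrative): Raw keeps payloads exactly as received, Staging cleans and type-casts one source at a time, and Enriched is the only step that joins sources and computes a metric.

```python
from dataclasses import dataclass

# Hypothetical Raw layer: payloads stored exactly as received, so Staging and
# Enriched can always be rebuilt (re-processed) from this history.
RAW_ORDERS = [
    {"order_id": "1", "customer_id": " c1 ", "amount": "10.50"},
    {"order_id": "2", "customer_id": "c2", "amount": "not-a-number"},
    {"order_id": "3", "customer_id": "c1", "amount": "4.50"},
]
RAW_CUSTOMERS = [
    {"customer_id": "c1", "region": "EU"},
    {"customer_id": "c2", "region": "US"},
]


@dataclass(frozen=True)
class StagedOrder:
    order_id: int
    customer_id: str
    amount: float


def stage_orders(raw: list[dict]) -> list[StagedOrder]:
    """Staging: clean and type-cast a single source; no joins here."""
    staged = []
    for row in raw:
        try:
            staged.append(
                StagedOrder(
                    order_id=int(row["order_id"]),
                    customer_id=row["customer_id"].strip(),
                    amount=float(row["amount"]),
                )
            )
        except (KeyError, ValueError):
            # Malformed row is skipped; a real pipeline might quarantine it instead.
            continue
    return staged


def enrich(orders: list[StagedOrder], customers: list[dict]) -> dict[str, float]:
    """Enriched: join across sources and compute a metric (revenue per region)."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    revenue: dict[str, float] = {}
    for order in orders:
        region = region_by_customer.get(order.customer_id, "UNKNOWN")
        revenue[region] = revenue.get(region, 0.0) + order.amount
    return revenue


if __name__ == "__main__":
    print(enrich(stage_orders(RAW_ORDERS), RAW_CUSTOMERS))
```

Because Raw is never modified, fixing a bug in `stage_orders` only requires re-running Staging and Enriched over the retained history.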
| Change | Safe? | Notes |
|--------|-------|-------|
| Add nullable column | ✅ Yes | No data loss |
| Rename column | ❌ No | Use view-layer aliasing |
| Delete column | ❌ No | Deprecate first, delete later |
| Widen type (INT→FLOAT) | ✅ Yes | Usually safe; integers above 2^53 lose precision as 64-bit floats |
| Narrow type (FLOAT→INT) | ❌ No | Requires migration |
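The schema-change table can be checked mechanically before a new YAML schema is deployed. The sketch below is illustrative only: it assumes each schema has already been parsed into an ordered list of `{name, type, mode}` dicts (the field shape BigQuery JSON schemas use) and that `INT64 → FLOAT64` is the only widening allowed. `check_schema_change` and `SAFE_WIDENINGS` are hypothetical names.

```python
# Widenings treated as safe. Assumption: this mirrors the INT -> FLOAT row of
# the table; extend it to match your warehouse's coercion rules.
SAFE_WIDENINGS = {("INT64", "FLOAT64")}


def check_schema_change(old: list[dict], new: list[dict]) -> list[str]:
    """Compare two parsed schemas; an empty result means the change looks safe."""
    old_cols = {col["name"]: col for col in old}
    new_cols = {col["name"]: col for col in new}
    problems = []

    # A rename looks like "drop old name + add new name", so it is caught here too.
    for name in old_cols.keys() - new_cols.keys():
        problems.append(f"{name}: removed or renamed (deprecate first, alias in a view)")

    # Per the table, only nullable additions are safe. A missing mode defaults
    # to NULLABLE, as in BigQuery schema files.
    for name in new_cols.keys() - old_cols.keys():
        mode = new_cols[name].get("mode", "NULLABLE")
        if mode != "NULLABLE":
            problems.append(f"{name}: added column is {mode}, expected NULLABLE")

    # Any type change outside SAFE_WIDENINGS (e.g. FLOAT64 -> INT64) needs a migration.
    for name in old_cols.keys() & new_cols.keys():
        before, after = old_cols[name]["type"], new_cols[name]["type"]
        if before != after and (before, after) not in SAFE_WIDENINGS:
            problems.append(f"{name}: {before} -> {after} requires a migration")

    return sorted(problems)


if __name__ == "__main__":
    old = [{"name": "id", "type": "INT64", "mode": "REQUIRED"},
           {"name": "score", "type": "FLOAT64"}]
    new = [{"name": "id", "type": "INT64", "mode": "REQUIRED"},
           {"name": "points", "type": "INT64"}]
    for problem in check_schema_change(old, new):
        print(problem)
```

Comparing schemas alone cannot tell a rename from a delete plus an add, which is why the check flags the old column name in both cases and leaves the aliasing decision to a human.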
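- Deduplicate with ROW_NUMBER() window functions in the Staging layer, not DISTINCT: DISTINCT only collapses rows that are identical in every column, so it cannot choose which version of a key to keep, whereas ROW_NUMBER() with an explicit ORDER BY (plus a unique tiebreaker) picks one row per key deterministically.
- Use jsonencode(yamldecode()) to pass YAML schemas to BigQuery; field order in the YAML must match the BigQuery table (see the opentofu skill).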
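The ROW_NUMBER() rule can be mimicked in plain Python to show why an explicit ordering and tiebreaker matter. This is a sketch, not production code: `dedupe_latest`, `updated_at`, and the `ingest_seq` tiebreaker column are hypothetical, and the docstring shows the BigQuery query shape it imitates.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_latest(rows: list[dict], key: str, order_by: str, tiebreak: str) -> list[dict]:
    """Keep one row per `key`: the one with the greatest (order_by, tiebreak).

    Mirrors this BigQuery pattern:
        SELECT * EXCEPT (rn) FROM (
          SELECT *, ROW_NUMBER() OVER (
            PARTITION BY key ORDER BY order_by DESC, tiebreak DESC) AS rn
          FROM raw_table)
        WHERE rn = 1
    """
    # Sort descending so each key's rows are contiguous and its winner comes first.
    ordered = sorted(rows, key=itemgetter(key, order_by, tiebreak), reverse=True)
    # groupby only merges adjacent rows, which the sort above guarantees.
    winners = [next(group) for _, group in groupby(ordered, key=itemgetter(key))]
    return sorted(winners, key=itemgetter(key))


# Hypothetical raw rows: id 1 has two versions, id 2 ties on updated_at.
# ISO-8601 date strings sort chronologically, so string comparison is safe here.
ROWS = [
    {"id": 1, "updated_at": "2024-01-01", "ingest_seq": 1, "status": "new"},
    {"id": 1, "updated_at": "2024-01-02", "ingest_seq": 2, "status": "shipped"},
    {"id": 2, "updated_at": "2024-01-01", "ingest_seq": 3, "status": "new"},
    {"id": 2, "updated_at": "2024-01-01", "ingest_seq": 4, "status": "cancelled"},
]

if __name__ == "__main__":
    # prints "1 shipped" then "2 cancelled"
    for row in dedupe_latest(ROWS, "id", "updated_at", "ingest_seq"):
        print(row["id"], row["status"])
```

Here id 2 ties on `updated_at`, so the `ingest_seq` tiebreaker decides the winner. A `SELECT DISTINCT *` over the same four rows would keep all of them, since each row differs in at least one column.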