Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

lanej/data-pipeline

Name: data-pipeline
Author: lanej

claude/skills/data-pipeline/SKILL.md

npx skillsauth add lanej/dotfiles data-pipeline

Data Pipeline Patterns

Architecture patterns and best practices for data pipeline design, with a focus on the medallion architecture (Raw → Staging → Enriched) and declarative schema management.

Medallion Architecture

The three-layer pattern for organizing data warehouses:

Raw / Bronze: Preserve source data exactly as received (append-only)
Staging / Silver: Cleaned, deduplicated, conformed data
Enriched / Gold: Business-ready, joined, aggregated data
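The three layers can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation: the `raw_orders` table, its field names, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Raw / Bronze: append-only list of records exactly as received (hypothetical sample data).
raw_orders = []

def ingest(batch):
    """Append source rows untouched; nothing in Raw is updated or deleted."""
    raw_orders.extend(batch)

def build_staging(raw):
    """Staging / Silver: clean, conform types, and keep one row per order_id."""
    latest = {}
    for row in raw:
        cleaned = {
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
            "loaded_at": row["loaded_at"],
        }
        current = latest.get(cleaned["order_id"])
        # Keep the most recently loaded version of each order.
        if current is None or cleaned["loaded_at"] > current["loaded_at"]:
            latest[cleaned["order_id"]] = cleaned
    return sorted(latest.values(), key=lambda r: r["order_id"])

def build_enriched(staging):
    """Enriched / Gold: business-ready aggregate (total spend per customer)."""
    totals = defaultdict(float)
    for row in staging:
        totals[row["customer"]] += row["amount"]
    return dict(totals)

ingest([
    {"order_id": "1", "customer": " Alice ", "amount": "10.0", "loaded_at": "2024-01-01"},
    {"order_id": "2", "customer": "bob", "amount": "5.5", "loaded_at": "2024-01-01"},
])
# A corrected version of order 1 arrives later and is appended, not overwritten.
ingest([
    {"order_id": "1", "customer": "alice", "amount": "12.0", "loaded_at": "2024-01-02"},
])

staging_orders = build_staging(raw_orders)
enriched_totals = build_enriched(staging_orders)
```

Because Raw keeps every delivered row, Staging and Enriched can be rebuilt from scratch at any time if the cleaning or aggregation rules change.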

See references/medallion-architecture.md for:

Layer definitions and purposes
Deduplication patterns
View vs materialized table choices
Temporal joins for slowly-changing dimensions
Refresh strategies

YAML Schema Management

Prefer YAML files over inline SQL DDL or JSON for schema definitions:

Benefits:

Version control friendly (line-by-line diffs)
Human readable with comments
Separation of concerns (schema vs infrastructure)
Consistent across upload scripts, docs, and IaC
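The "consistent across upload scripts, docs, and IaC" benefit can be illustrated with a short sketch. It assumes the YAML file has already been parsed into a list of field dictionaries; the field names, the `mode` values, and both helper functions are hypothetical.

```python
import json

# Stand-in for what a YAML loader would return for a schema file.
# The structure and field names here are hypothetical.
schema = [
    {"name": "order_id", "type": "INTEGER", "mode": "REQUIRED", "description": "Source order key"},
    {"name": "amount", "type": "FLOAT", "mode": "NULLABLE", "description": "Order total"},
]

def to_json_schema(fields):
    """Serialize the field list for an upload script or IaC tool, preserving field order."""
    return json.dumps(fields, indent=2)

def to_markdown_docs(fields):
    """Render the same field list as a documentation table."""
    lines = [
        "| Field | Type | Mode | Description |",
        "|-------|------|------|-------------|",
    ]
    for f in fields:
        lines.append(f"| {f['name']} | {f['type']} | {f['mode']} | {f['description']} |")
    return "\n".join(lines)

json_schema = to_json_schema(schema)
docs_table = to_markdown_docs(schema)
```

Both outputs derive from one definition, so a schema change is made in a single place and shows up as a line-level diff in version control.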

See references/yaml-schema-patterns.md for:

YAML structure examples
Terraform integration via jsonencode(yamldecode(file()))
Schema evolution patterns (safe vs unsafe changes)
Field naming conventions
Deprecation workflow
Multi-source schema strategies

Quick Reference

Layer Selection

| Need | Layer |
|------|-------|
| Historical source data, re-processable | Raw |
| Cleaned records, ready for joining | Staging |
| Joined across sources, with metrics | Enriched |

Schema Change Safety

| Change | Safe? | Notes |
|--------|-------|-------|
| Add nullable column | ✅ Yes | No data loss |
| Rename column | ❌ No | Use view-layer aliasing |
| Delete column | ❌ No | Deprecate first, delete later |
| Widen type (INT→FLOAT) | ✅ Yes | Usually safe |
| Narrow type (FLOAT→INT) | ❌ No | Requires migration |
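The safety rules above could be applied mechanically before a schema change is merged. The sketch below is one possible check, not part of the skill: the `{column: (type, nullable)}` schema shape, the `SAFE_WIDENINGS` set, and the treatment of a newly added required column (which the table does not cover) are all assumptions.

```python
# Widening conversions treated as safe; only the pair from the table is listed (assumption).
SAFE_WIDENINGS = {("INT", "FLOAT")}

def classify_changes(old, new):
    """Compare two {column: (type, nullable)} schemas and label each change."""
    findings = []
    for name in old:
        if name not in new:
            # A rename looks like a removal plus an addition, so both are flagged here.
            findings.append((name, "unsafe", "removed or renamed column"))
    for name, (new_type, nullable) in new.items():
        if name not in old:
            if nullable:
                findings.append((name, "safe", "added nullable column"))
            else:
                # Not covered by the table; treated conservatively as unsafe (assumption).
                findings.append((name, "unsafe", "added required column"))
            continue
        old_type = old[name][0]
        if old_type == new_type:
            continue
        if (old_type, new_type) in SAFE_WIDENINGS:
            findings.append((name, "safe", "widened type"))
        else:
            findings.append((name, "unsafe", "narrowed or incompatible type"))
    return findings

old_schema = {
    "id": ("INT", False),
    "price": ("INT", True),
    "score": ("FLOAT", True),
    "legacy": ("STRING", True),
}
new_schema = {
    "id": ("INT", False),
    "price": ("FLOAT", True),
    "score": ("INT", True),
    "note": ("STRING", True),
}
findings = classify_changes(old_schema, new_schema)
```

A check like this could run in CI and fail the build when any finding is labeled unsafe, leaving the deprecation or migration decision to a reviewer.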

Common Gotchas

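Deduplication: In the Staging layer, always deduplicate with ROW_NUMBER() window functions and an explicit ORDER BY, not DISTINCT. DISTINCT only removes rows that are identical in every selected column, so it cannot deterministically pick one row when several versions of the same key exist.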
Field order: When using Terraform jsonencode(yamldecode()) for BigQuery, the field order in the YAML file must match the field order of the BigQuery table (see the opentofu skill)
Extraction idempotency: Make extract scripts safe to re-run (upsert logic, deduplication in Raw layer)
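The deduplication point can be demonstrated with Python's built-in `sqlite3` module. This is a small sketch under stated assumptions: the `raw_orders` table and its columns are hypothetical, SQLite stands in for the actual warehouse, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, status TEXT, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, "pending", "2024-01-01"),
        (1, "shipped", "2024-01-02"),  # newer version of order 1
        (2, "pending", "2024-01-01"),
        (2, "pending", "2024-01-01"),  # exact duplicate from a re-run extract
    ],
)

# DISTINCT only collapses rows that are identical in every column,
# so both versions of order 1 survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT order_id, status, loaded_at "
    "FROM raw_orders ORDER BY order_id, loaded_at"
).fetchall()

# ROW_NUMBER() with an explicit ORDER BY keeps exactly one row per key:
# here, the most recently loaded version.
deduped_rows = conn.execute(
    """
    SELECT order_id, status, loaded_at
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id
                   ORDER BY loaded_at DESC
               ) AS rn
        FROM raw_orders
    )
    WHERE rn = 1
    ORDER BY order_id
    """
).fetchall()
```

If two versions of a key can share the same `loaded_at`, adding a further tiebreaker column to the ORDER BY keeps the result deterministic. Re-running the extract only adds rows to Raw, and the Staging query still returns one row per key, which is what makes the re-run safe.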

Data pipeline architecture patterns and best practices, including medallion/three-layer architecture (Raw/Staging/Enriched or Bronze/Silver/Gold), YAML-based schema management, and ETL workflow patterns. Use when designing or implementing data pipelines, working with data warehouse layers, or managing table schemas in YAML.

39 stars

development

Updated Jun 12, 2026

npx skillsauth add lanej/dotfiles data-pipeline

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | % |
|---------|-------------|--------|---|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Jun 12, 2026, 7:45 AM (35.7s, 3 files scanned)

SKILL.md

name: data-pipeline
description: Data pipeline architecture patterns and best practices, including medallion/three-layer architecture (Raw/Staging/Enriched or Bronze/Silver/Gold), YAML-based schema management, and ETL workflow patterns. Use when designing or implementing data pipelines, working with data warehouse layers, or managing table schemas in YAML.

Related Skills

lanej/dora

devops

Verified · Trusted · Community

DORA engineering metrics project at ~/src/dora. Load when: querying DORA BigQuery views (deployment frequency, lead time, change failure rate, alerts, review time) from any project; joining against DORA.unified_identity or DORA_clean.* views from any project; running the data pipeline (just refresh, just download-*, just upload-*); making OpenTofu infrastructure changes to DORA tables or views; working with team attribution, team identity, or engineer roster data.

39 · SKILL.md · Updated Jun 12, 2026

lanej/research

data-ai

Verified · Trusted · Community

Delegate research and context-gathering tasks to a sub-agent to protect the primary context window. Use when the user asks to "research X", "look into X", "find out about X", "gather context on X", or any investigative framing where answering requires 2+ searches or multiple sources. Also use proactively before starting substantive work when prior context is unknown. Never run research inline — always delegate.

39 · SKILL.md · Updated May 31, 2026

lanej/claude/skills/qmd-math

documentation

Verified · Trusted · Community

Math notation conventions for Quarto/EPQ documents rendered via lualatex. Use when: writing or adding a formula, equation, or mathematical expression to a .qmd file; asked about display math, inline math, or LaTeX notation in a QMD/Quarto context; defining a where-clause or variable definitions for an equation; converting prose variable descriptions into structured math notation; fixing math that renders badly in a PDF; using \lvert, \begin{aligned}, \tfrac, \text

39 · SKILL.md · Updated May 26, 2026

lanej/trim

development

Verified · Trusted · Community

Trim a prose document (README, design doc, blog post, notes) for readability by cutting redundancy, filler, and dead weight in the author's own words. Invoke with /trim [file path], or /trim alone to be prompted for a file. Not for source code, data files, or summarization.

39 · SKILL.md · Updated May 24, 2026

Download

For Claude Desktop. Download once, then upload the file in the app — no terminal needed.

Install manually

# Clone the repo
git clone https://github.com/lanej/dotfiles.git

# Copy into Claude Code skills folder (global)
cp -r dotfiles/claude/skills/data-pipeline ~/.claude/skills/

Claude Code Skills — official skills path docs.

Repository

lanej/dotfiles

39 stars

Compatible with

Claude Code

OpenAI Codex CLI

ChatGPT