skills/council/alchemist/pipeline-design/SKILL.md
Use when designing data pipelines for moving, transforming, and delivering data. Covers ETL vs ELT pattern selection, orchestration tool choice, batch vs streaming trade-offs, idempotency guarantees, data quality checkpoints, and lineage tracking. Do not use for schema modeling (use schema-evaluation) or ML workflows (use ml-workflow).
Install this skill globally with one command: `npx skillsauth add dtsong/my-claude-setup pipeline-design`. Works with Claude Code, Cursor, and Windsurf.
Design data pipelines that reliably move, transform, and deliver data from source systems to consumption layers. Covers ETL vs ELT pattern selection, orchestration tool choice, batch vs streaming trade-offs, idempotency guarantees, data quality checkpoints, and lineage tracking.
Reads pipeline configurations, DAG definitions, orchestration manifests, and infrastructure specs for analysis. Does not execute pipelines, deploy infrastructure, or modify production configurations.
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
Document every data flow:
Produce a flow diagram showing all sources, transformations, and destinations.
Evaluate the trade-offs:
Document the chosen pattern per flow and the reasoning.
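The difference between the two patterns can be sketched with an in-memory SQLite database standing in for the destination. This is a minimal illustration, not a recommended implementation: the table names (`orders_etl`, `orders_raw`, `orders_elt`) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as untrimmed strings, one is empty.
RAW = [("a1", " 10.50 "), ("a2", "3"), ("a3", "")]


def etl(conn, rows):
    """ETL: clean in application code, load only the finished shape."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    cleaned = [(oid, float(amt)) for oid, amt in rows if amt.strip()]
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load raw values untouched, then transform with SQL in the destination."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount "
        "FROM orders_raw WHERE TRIM(amount) <> ''"
    )
```

Both functions produce the same cleaned rows. The ELT variant also keeps `orders_raw`, so the transform can be changed and re-run later without re-extracting from the source, at the cost of storing unvalidated data in the destination.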
For each flow, determine the processing mode:
Document latency requirements, cost implications, and complexity trade-offs for the chosen mode.
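As a rough illustration of the two processing modes, the same aggregation can be written once as a batch job over a complete dataset and once as a streaming consumer that emits an updated result per event. The function names and the `(key, value)` event shape are assumptions made for this sketch.

```python
from collections import defaultdict


def batch_totals(events):
    """Batch: read the whole input, then emit one final result."""
    totals = defaultdict(int)
    for key, value in events:
        totals[key] += value
    return dict(totals)


def streaming_totals(events):
    """Streaming: keep running state and emit an update after every event."""
    totals = defaultdict(int)
    for key, value in events:
        totals[key] += value
        yield key, totals[key]
```

The batch version has no result until the input is exhausted, which is where its latency comes from. The streaming version answers after each event but has to hold state for as long as it runs, which is where much of its operational complexity comes from.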
Ensure every pipeline step is safe to re-run:
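One common way to make a load step safe to re-run is partition overwrite: delete the rows for the run's partition and insert the fresh ones inside a single transaction. The sketch below assumes a hypothetical `daily_sales` table partitioned by `run_date` and uses SQLite only to stay self-contained.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Replace all rows for run_date atomically, so a re-run cannot duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "run_date TEXT NOT NULL, sku TEXT NOT NULL, units INTEGER NOT NULL)"
    )
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, sku, units) VALUES (?, ?, ?)",
            [(run_date, sku, units) for sku, units in rows],
        )


def row_count(conn):
    """Count rows across all partitions."""
    return conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
```

Running `load_partition` twice for the same `run_date` leaves the table in the same state as running it once, which is the property the Re-run Behavior column of the Idempotency Strategy table is meant to record.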
Insert quality gates between pipeline stages:
For each checkpoint, define: what is checked, what threshold triggers a failure, and what happens on failure (halt pipeline, alert, quarantine bad records).
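The three parts of a checkpoint definition map naturally onto a small data structure. The following is a minimal sketch under stated assumptions: the `Checkpoint` class, the `run_checkpoint` function, and the choice of a per-record failure rate as the threshold are all illustrative, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Callable


class PipelineHalted(Exception):
    """Raised when a checkpoint configured to halt exceeds its threshold."""


@dataclass
class Checkpoint:
    name: str
    is_valid: Callable[[dict], bool]  # what is checked, per record
    max_failure_rate: float           # threshold that triggers a failure
    on_failure: str                   # "halt", "alert", or "quarantine"


def run_checkpoint(cp, records):
    """Return (passed, quarantined, alerts) for one checkpoint."""
    records = list(records)
    bad = [r for r in records if not cp.is_valid(r)]
    rate = len(bad) / len(records) if records else 0.0
    if rate <= cp.max_failure_rate:
        return records, [], []  # within tolerance: everything flows on
    message = f"{cp.name}: failure rate {rate:.0%} exceeds {cp.max_failure_rate:.0%}"
    if cp.on_failure == "halt":
        raise PipelineHalted(message)
    if cp.on_failure == "alert":
        return records, [], [message]  # data flows on, someone is notified
    if cp.on_failure == "quarantine":
        good = [r for r in records if cp.is_valid(r)]
        return good, bad, [message]  # bad records are set aside
    raise ValueError(f"unknown on_failure action: {cp.on_failure!r}")
```

A stage would call `run_checkpoint` on its output before handing records to the next stage, passing only the first element of the returned tuple downstream.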
Design data lineage tracking:
Specify tooling: dbt lineage, OpenLineage, DataHub, or custom metadata tables.
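For the custom-metadata option, the core of column-level lineage is a set of edges from source columns to target columns plus a way to walk them upstream. The sketch below keeps the edges in memory. The `LineageGraph` class and the column names in the usage example are hypothetical, and a real implementation would persist the edges in a metadata table.

```python
from collections import defaultdict


class LineageGraph:
    """Stores column-level lineage edges and answers upstream queries."""

    def __init__(self):
        # target column -> set of (source column, pipeline step) pairs
        self._upstream = defaultdict(set)

    def record(self, target, sources, step):
        """Register that `step` derives `target` from each column in `sources`."""
        for source in sources:
            self._upstream[target].add((source, step))

    def trace(self, column):
        """Return every column that `column` depends on, directly or indirectly."""
        seen = set()
        stack = [column]
        while stack:
            current = stack.pop()
            for source, _step in self._upstream.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen
```

For example, after recording that `staging.orders.amount_usd` comes from `raw.orders.amount` and `raw.fx.rate`, and that `mart.revenue.total_usd` comes from `staging.orders.amount_usd`, calling `trace("mart.revenue.total_usd")` returns all three upstream columns.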
Choose the orchestration layer based on team and requirements:
Document the choice, alternatives considered, and migration path if the team outgrows the tool.
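Whichever tool is chosen, the dependency structure it has to schedule can be written down and validated independently of it. As a minimal sketch using Python's standard-library `graphlib` (Python 3.9+), with hypothetical task names, a mapping of task to prerequisites yields a valid execution order and rejects cycles:

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return tasks in an order where every prerequisite runs first.

    `dependencies` maps each task name to the set of tasks it depends on.
    Raises CycleError if the dependencies cannot be ordered.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: two extracts feed one transform, which feeds the load.
PIPELINE = {
    "extract_orders": set(),
    "extract_fx": set(),
    "transform_orders": {"extract_orders", "extract_fx"},
    "load_warehouse": {"transform_orders"},
}
```

Keeping the dependency map in a tool-neutral form like this also makes the migration path easier to reason about, since the same structure can be re-expressed in whatever DAG format the next orchestrator expects.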
Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.
# Pipeline Design: [Project/Domain Name]
## Flow Diagram
[ASCII diagram showing sources → transformations → destinations]
## Flow Inventory
| Flow | Source | Extraction | Transform | Load Pattern | Volume | Freshness SLA |
|------|--------|-----------|-----------|-------------|--------|---------------|
| ... | ... | ... | ... | ... | ... | ... |
## Architecture Decisions
| Decision | Chosen | Alternatives | Rationale |
|----------|--------|-------------|-----------|
| ETL vs ELT | ... | ... | ... |
| Batch vs Streaming | ... | ... | ... |
| Orchestration tool | ... | ... | ... |
## Idempotency Strategy
| Pipeline Step | Idempotency Method | Re-run Behavior |
|---------------|-------------------|-----------------|
| ... | ... | ... |
## Data Quality Checkpoints
| Stage | Check | Threshold | On Failure |
|-------|-------|-----------|------------|
| Source | ... | ... | ... |
| Transform | ... | ... | ... |
| Destination | ... | ... | ... |
## Lineage and Observability
| Capability | Tool/Method | Coverage |
|-----------|-------------|----------|
| Column lineage | ... | ... |
| Pipeline metrics | ... | ... |
| Alerting | ... | ... |
## Orchestration Design
| DAG/Pipeline | Schedule | Dependencies | SLA |
|-------------|----------|-------------|-----|
| ... | ... | ... | ... |