distributions/codex/skills/data-pipeline-architect/SKILL.md
Designs ETL/ELT data pipelines with proper extraction, transformation, and loading patterns, including orchestration, error handling, and data quality validation.
npx skillsauth add a-organvm/a-i--skills data-pipeline-architect
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
This skill provides guidance for designing robust, scalable data pipelines that move data reliably from sources to destinations.
To begin pipeline design, gather:
Batch Pipelines - For periodic bulk processing:
Streaming Pipelines - For real-time requirements:
Hybrid Approaches - Lambda or Kappa architecture:
ETL (Transform before Load):
ELT (Transform after Load):
Extraction Layer:
Transformation Layer:
Loading Layer:
┌─────────────────────────────────────────────────────────┐
│                   Pipeline Execution                    │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐    ┌───────────┐    ┌──────────┐           │
│  │ Extract │───▶│ Transform │───▶│   Load   │           │
│  └────┬────┘    └─────┬─────┘    └────┬─────┘           │
│       │               │               │                 │
│       ▼               ▼               ▼                 │
│  ┌─────────┐    ┌───────────┐    ┌──────────┐           │
│  │  Retry  │    │Dead Letter│    │ Rollback │           │
│  │w/Backoff│    │   Queue   │    │Checkpoint│           │
│  └─────────┘    └───────────┘    └──────────┘           │
└─────────────────────────────────────────────────────────┘
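The diagram pairs each stage with a failure strategy: retry with backoff for extraction, a dead letter queue for transformation, and rollback to a checkpoint for loading. A minimal Python sketch of that pairing follows. The names `retry_with_backoff`, `run_pipeline`, and `load_one` are hypothetical, and the target is an in-memory list standing in for a real destination, so treat this as an illustration rather than a production design.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def retry_with_backoff(fn: Callable[[], Any], max_attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], Any] = time.sleep) -> Any:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


def run_pipeline(extract: Callable[[], Iterable[Any]],
                 transform: Callable[[Any], Any],
                 target: List[Any],
                 load_one: Callable[[List[Any], Any], None] = list.append,
                 sleep: Callable[[float], Any] = time.sleep) -> List[Tuple[Any, str]]:
    """Run extract -> transform -> load and return the dead-lettered rows."""
    # Extract: transient source failures are retried with backoff.
    rows = retry_with_backoff(extract, sleep=sleep)

    # Transform: a bad row goes to the dead letter queue instead of
    # failing the whole batch.
    good, dead_letters = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except Exception as exc:
            dead_letters.append((row, repr(exc)))

    # Load: remember a checkpoint and roll back to it if any write fails.
    checkpoint = len(target)
    try:
        for record in good:
            load_one(target, record)
    except Exception:
        del target[checkpoint:]  # discard the partially loaded batch
        raise
    return dead_letters
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. With a real database, the checkpoint and rollback would normally be a transaction rather than a list slice.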
Implement checks at each stage:
| Stage | Check Type | Example |
|-------|------------|---------|
| Extract | Completeness | Row count matches source |
| Extract | Freshness | Data timestamp within SLA |
| Transform | Validity | Values in expected ranges |
| Transform | Uniqueness | Primary keys unique |
| Load | Reconciliation | Target matches source totals |
| Load | Integrity | Foreign keys valid |
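Several of these checks can be expressed as plain functions over a batch of rows. The sketch below is one possible shape, not a prescribed framework: the row layout (`id`, `amount`, `updated_at`), the function names, and the default value range are all assumptions made for illustration. Foreign key integrity is left out because it depends on the target schema.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def run_quality_checks(rows: List[Dict], source_row_count: int,
                       now: datetime, freshness_sla: timedelta,
                       key: str = "id",
                       amount_range: Tuple[float, float] = (0, 10_000)) -> List[str]:
    """Return the names of the extract/transform checks that failed."""
    failures = []

    # Completeness: the batch should have as many rows as the source reported.
    if len(rows) != source_row_count:
        failures.append("completeness")

    # Freshness: the newest record must fall within the SLA window.
    if rows and now - max(r["updated_at"] for r in rows) > freshness_sla:
        failures.append("freshness")

    # Validity: every amount must sit inside the expected range.
    low, high = amount_range
    if any(not (low <= r["amount"] <= high) for r in rows):
        failures.append("validity")

    # Uniqueness: primary key values must not repeat.
    keys = [r[key] for r in rows]
    if len(keys) != len(set(keys)):
        failures.append("uniqueness")

    return failures


def reconciles(source_total: float, target_total: float) -> bool:
    """Load-stage reconciliation: totals must match within float tolerance."""
    return math.isclose(source_total, target_total, rel_tol=1e-9)
```

Returning the failed check names, rather than raising on the first failure, lets the caller decide which failures block the load and which only trigger an alert.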
Essential metrics to track:
Alert on:
-- Timestamp-based incremental
SELECT * FROM source
WHERE updated_at > {{ last_run_timestamp }}
-- CDC-based (Change Data Capture)
-- Captures inserts, updates, deletes from transaction log
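The timestamp-based query above relies on a stored watermark standing in for `{{ last_run_timestamp }}`. A minimal sketch using Python's built-in `sqlite3` module shows the full loop of reading rows past the watermark and advancing it. The `source` columns (`id`, `payload`, `updated_at`) and the function name `extract_incremental` are assumptions for illustration, and timestamps are assumed to be ISO 8601 strings so that string comparison matches chronological order.

```python
import sqlite3
from typing import List, Tuple


def extract_incremental(conn: sqlite3.Connection,
                        last_run_timestamp: str) -> Tuple[List[tuple], str]:
    """Fetch rows changed since the watermark and return the new watermark."""
    rows = conn.execute(
        # A bound parameter replaces the templated {{ last_run_timestamp }}.
        "SELECT id, payload, updated_at FROM source "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_run_timestamp,),
    ).fetchall()
    # Advance the watermark only as far as the data actually seen, so an
    # empty run leaves it unchanged.
    new_watermark = rows[-1][2] if rows else last_run_timestamp
    return rows, new_watermark
```

The caller would persist `new_watermark` only after the downstream load succeeds; otherwise a failed run could skip rows on the next attempt. A strict `>` comparison can also miss rows that share the watermark's exact timestamp, which is one reason CDC is often preferred when the source supports it.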
-- Delete + Insert pattern
DELETE FROM target WHERE date_partition = '2024-01-15';
INSERT INTO target SELECT * FROM staging WHERE date_partition = '2024-01-15';
-- Merge/Upsert pattern
MERGE INTO target t
USING staging s ON t.id = s.id
WHEN MATCHED THEN UPDATE SET ...
WHEN NOT MATCHED THEN INSERT ...
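Both patterns above aim at idempotency: running the same load twice should leave the target in the same state. The sketch below demonstrates this with Python's built-in `sqlite3` module. SQLite has no `MERGE` statement, so the upsert uses `INSERT ... ON CONFLICT DO UPDATE` as a stand-in (available in SQLite 3.24 and later). The table layout (`id`, `date_partition`, `amount`) and all function names are assumptions for illustration.

```python
import sqlite3
from typing import Iterable, Tuple


def create_tables(conn: sqlite3.Connection) -> None:
    """Create hypothetical staging and target tables with the same columns."""
    for name in ("staging", "target"):
        conn.execute(
            f"CREATE TABLE {name} ("
            "id INTEGER PRIMARY KEY, date_partition TEXT, amount INTEGER)"
        )


def reload_partition(conn: sqlite3.Connection, date_partition: str) -> None:
    """Delete + insert: replace one partition atomically."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "DELETE FROM target WHERE date_partition = ?", (date_partition,)
        )
        conn.execute(
            "INSERT INTO target SELECT * FROM staging WHERE date_partition = ?",
            (date_partition,),
        )


def upsert_rows(conn: sqlite3.Connection,
                rows: Iterable[Tuple[int, str, int]]) -> None:
    """Upsert by primary key: update matching ids, insert the rest."""
    with conn:
        conn.executemany(
            "INSERT INTO target (id, date_partition, amount) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "date_partition = excluded.date_partition, "
            "amount = excluded.amount",
            rows,
        )
```

Wrapping the delete and the insert in one transaction matters: without it, a failure between the two statements would leave the partition empty. Calling either function repeatedly with the same input leaves the row count unchanged.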
references/orchestration-patterns.md - Airflow, Dagster, Prefect patterns
references/data-quality-checks.md - Validation frameworks and rules
references/pipeline-templates.md - Common pipeline architectures