Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

vibeeval/data-pipeline-patterns

Name: data-pipeline-patterns
Author: vibeeval

skills/data-pipeline-patterns/SKILL.md

npx skillsauth add vibeeval/vibecosystem data-pipeline-patterns

Data Pipeline Patterns

ETL vs ELT Decision

| Criterion | ETL | ELT |
|-----------|-----|-----|
| Transform location | In the pipeline | In the data warehouse |
| Data volume | Small to medium | Large |
| Flexibility | Low | High |
| Cost | Compute-heavy | Storage-heavy |
| Use case | Legacy, compliance | Modern analytics |

Batch vs Streaming

| Criterion | Batch | Streaming |
|-----------|-------|-----------|
| Latency | Minutes to hours | Seconds to milliseconds |
| Complexity | Low | High |
| Cost | Low | High |
| Use case | Reporting, ETL | Real-time alerts, dashboards |
| Tool | Airflow, dbt | Kafka Streams, Flink |

Idempotency Patterns

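-- Pattern 1: Upsert (PostgreSQL syntax; %(name)s are psycopg-style named parameters)
-- Rerunning with the same row updates it in place instead of inserting a duplicate
INSERT INTO target (id, name, updated_at)
VALUES (%(id)s, %(name)s, %(ts)s)
ON CONFLICT (id) DO UPDATE SET
  name = EXCLUDED.name,
  updated_at = EXCLUDED.updated_at;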

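-- Pattern 2: Partition overwrite
-- Run both statements in one transaction so a failed run never leaves the partition empty
BEGIN;
DELETE FROM target WHERE partition_date = '2026-03-14';
INSERT INTO target SELECT * FROM staging WHERE partition_date = '2026-03-14';
COMMIT;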

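# Pattern 3: Checkpoint (Python pseudocode)
# get_checkpoint, save_checkpoint, source, and process are pipeline-specific helpers
last_checkpoint = get_checkpoint('pipeline_x')
new_data = source.query(f"WHERE updated_at > '{last_checkpoint}'")
process(new_data)
# Advance the checkpoint only after processing succeeds, and only if rows arrived
if len(new_data) > 0:
    save_checkpoint('pipeline_x', max(new_data.updated_at))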
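The upsert and partition-overwrite patterns above can be tried end to end without a database server. The following is a minimal sketch, assuming SQLite (3.24 or later, which accepts the same `ON CONFLICT ... DO UPDATE` clause) stands in for the real warehouse. The `daily` table and the `upsert_rows` and `overwrite_partition` helpers are hypothetical names introduced only for this illustration.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert or update rows keyed by id; rerunning with the same rows changes nothing."""
    conn.executemany(
        """
        INSERT INTO target (id, name, updated_at)
        VALUES (:id, :name, :ts)
        ON CONFLICT (id) DO UPDATE SET
            name = excluded.name,
            updated_at = excluded.updated_at
        """,
        rows,
    )
    conn.commit()


def overwrite_partition(conn, partition_date):
    """Replace one partition; delete and insert are committed together or not at all."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute("DELETE FROM daily WHERE partition_date = ?", (partition_date,))
        conn.execute(
            "INSERT INTO daily SELECT * FROM staging WHERE partition_date = ?",
            (partition_date,),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE staging (partition_date TEXT, amount INTEGER)")
conn.execute("CREATE TABLE daily (partition_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("2026-03-14", 10), ("2026-03-14", 20)],
)
conn.commit()

rows = [
    {"id": 1, "name": "a", "ts": "2026-03-14T00:00:00"},
    {"id": 2, "name": "b", "ts": "2026-03-14T00:00:00"},
]
for _ in range(2):  # the second pass simulates a rerun of the same job
    upsert_rows(conn, rows)
    overwrite_partition(conn, "2026-03-14")

target_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
daily_count = conn.execute("SELECT COUNT(*) FROM daily").fetchone()[0]
```

Both counts stay at two after the second pass, which is the property "rerun safe" refers to: running the job again converges to the same state rather than appending duplicates.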

Data Quality Framework

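import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema({
    "user_id": pa.Column(int, pa.Check.gt(0), nullable=False),
    "email": pa.Column(str, pa.Check.str_matches(r'^.+@.+\..+$')),
    # Nullable integers need pandas' Int64 extension dtype; plain int cannot hold missing values
    "age": pa.Column("Int64", pa.Check.in_range(0, 150), nullable=True),
    # The upper bound is evaluated once, when the schema is defined
    "created_at": pa.Column(pa.DateTime, pa.Check.less_than_or_equal_to(pd.Timestamp.now()))
})

# df is the DataFrame to check; validate raises pandera.errors.SchemaError on invalid data
validated_df = schema.validate(df)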

Quality Dimensions

| Dimension | Check | Tool |
|-----------|-------|------|
| Completeness | NULL ratio < threshold | Great Expectations |
| Accuracy | Value range checks | pandera |
| Freshness | Last update < SLA | Airflow sensor |
| Uniqueness | Duplicate check | SQL DISTINCT |
| Consistency | Cross-table referential integrity | dbt test |
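To make three of these dimensions concrete, here is a minimal sketch in plain Python rather than the tools named in the table. The helper names (`null_ratio`, `duplicate_keys`, `is_fresh`), the sample rows, and the six-hour SLA are assumptions made for illustration, not part of the skill.

```python
from datetime import datetime, timedelta, timezone


def null_ratio(rows, field):
    """Completeness: fraction of rows where the field is missing or None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(field) is None) / len(rows)


def duplicate_keys(rows, key):
    """Uniqueness: the key values that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        k = r[key]
        if k in seen:
            dupes.add(k)
        seen.add(k)
    return dupes


def is_fresh(last_update, sla, now=None):
    """Freshness: True when the last update falls within the SLA window."""
    now = now or datetime.now(timezone.utc)
    return now - last_update <= sla


rows = [
    {"user_id": 1, "email": "a@example.com"},
    {"user_id": 2, "email": None},
    {"user_id": 2, "email": "b@example.com"},
    {"user_id": 3, "email": "c@example.com"},
]
now = datetime(2026, 3, 14, 12, 0, tzinfo=timezone.utc)
report = {
    "email_null_ratio": null_ratio(rows, "email"),
    "duplicate_user_ids": duplicate_keys(rows, "user_id"),
    "fresh": is_fresh(now - timedelta(hours=2), timedelta(hours=6), now=now),
}
```

A pipeline step could compare each entry in `report` against its threshold and fail the run when one is breached, so that bad data stops before the load.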

Pipeline Orchestration

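# Airflow DAG (the schedule argument requires Airflow 2.4 or later)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# extract_fn, transform_fn, load_fn, and validate_fn are callables defined elsewhere in the project
with DAG(
    'daily_etl',
    schedule='0 6 * * *',             # every day at 06:00
    start_date=datetime(2026, 1, 1),  # placeholder; set to the first date the DAG should cover
    catchup=False,                    # do not backfill runs missed before deployment
) as dag:
    extract = PythonOperator(task_id='extract', python_callable=extract_fn)
    transform = PythonOperator(task_id='transform', python_callable=transform_fn)
    load = PythonOperator(task_id='load', python_callable=load_fn)
    validate = PythonOperator(task_id='validate', python_callable=validate_fn)

    # Each task starts only after the previous one succeeds
    extract >> transform >> load >> validate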
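The dependency chain in the DAG above can be illustrated without Airflow. This is a rough sketch using the standard library's `graphlib` (Python 3.9+) to derive the run order from the same upstream relationships. `run_pipeline` and the stand-in task callables are hypothetical, and the sketch deliberately leaves out what an orchestrator adds on top: scheduling, retries, and state tracking.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, upstream):
    """Run callables in dependency order; an exception stops everything downstream."""
    order = list(TopologicalSorter(upstream).static_order())
    completed = []
    for name in order:
        tasks[name]()  # raises on failure, so later tasks never start
        completed.append(name)
    return completed


log = []
# Stand-in tasks that only record that they ran
tasks = {
    name: (lambda n=name: log.append(n))
    for name in ("extract", "transform", "load", "validate")
}
# Maps each task to the set of tasks that must finish first
upstream = {
    "transform": {"extract"},
    "load": {"transform"},
    "validate": {"load"},
}
completed = run_pipeline(tasks, upstream)
```

Declaring dependencies as data rather than hard-coding the call sequence is the same design choice the `>>` operator expresses in Airflow: the order is derived from the graph and not maintained by hand.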

Checklist

[ ] Pipeline is idempotent (safe to rerun)
[ ] Data quality checks at every step
[ ] Dead letter queue (failed records)
[ ] Monitoring and alerting enabled
[ ] Schema evolution handled
[ ] Backfill mechanism in place
[ ] Retry logic (exponential backoff)
[ ] Data lineage tracked
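Two of the checklist items, retry with exponential backoff and a dead letter queue, can be sketched together. The following assumes an in-memory list as the dead letter store and a hypothetical `parse_amount` handler. A production pipeline would more likely write failed records to a durable queue or table and retry only errors known to be transient.

```python
import time


def process_with_retry(record, handler, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Call handler(record); after a failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return handler(record)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))


def run_batch(records, handler, **retry_kwargs):
    """Process every record; failures are kept in a dead letter list, not dropped."""
    results, dead_letters = [], []
    for record in records:
        try:
            results.append(process_with_retry(record, handler, **retry_kwargs))
        except Exception as exc:
            dead_letters.append({"record": record, "error": repr(exc)})
    return results, dead_letters


def parse_amount(record):
    """Hypothetical handler: fails on records whose amount is not an integer."""
    return int(record["amount"])


delays = []
results, dead_letters = run_batch(
    [{"amount": "10"}, {"amount": "oops"}, {"amount": "5"}],
    parse_amount,
    sleep=delays.append,  # record the delays instead of actually sleeping
)
```

The bad record does not abort the batch. It ends up in `dead_letters` together with its error, where it can be inspected and replayed later.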

Anti-Patterns

Hardcoded credentials in the pipeline
Non-idempotent transforms
Loading without data quality checks
Monolithic pipeline (break it up)
Silent failure (error swallowing)
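As one way to avoid the first and last anti-patterns, credentials can be read from the environment and a missing value can raise immediately rather than be swallowed. This is a minimal sketch. The `WAREHOUSE_PASSWORD` variable name, `MissingCredentialError`, and `get_credential` are assumptions for illustration, and a secrets manager would serve the same purpose.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not configured."""


def get_credential(name, env=None):
    """Return the credential from the environment, failing loudly when it is absent."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise MissingCredentialError(f"environment variable {name} is not set")
    return value


# A plain dict stands in for os.environ so the example does not depend on the host
fake_env = {"WAREHOUSE_PASSWORD": "s3cret"}
password = get_credential("WAREHOUSE_PASSWORD", env=fake_env)

try:
    get_credential("WAREHOUSE_USER", env=fake_env)
    error_message = None
except MissingCredentialError as exc:
    error_message = str(exc)
```

Raising at startup surfaces a configuration problem before any data is touched, in contrast to a broad `except: pass` that would let the pipeline continue with an empty credential.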

Description: ETL/ELT patterns, batch vs streaming, idempotency, data quality framework, and pipeline orchestration
Stars: 465
Category: development
Updated: Apr 16, 2026

npx skillsauth add vibeeval/vibecosystem data-pipeline-patterns

Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

| Scanner | Description | Status | Percentage |
|---------|-------------|--------|------------|
| Trivy | Container and dependency vulnerability scanner | Clean | 95% |
| Semgrep | Static code analysis for vulnerabilities | Clean | 95% |
| mcp-scan (Snyk) | Model Context Protocol security validation | Clean | 95% |
| Snyk (dep) | Open source security scanning | Skipped | 50% |
| Socket.dev | Supply chain security analysis | Skipped | 50% |
| VirusTotal | Multi-engine malware detection | Skipped | 50% |
| CrowdStrike | Advanced threat intelligence | Skipped | 50% |
| OSV-Scanner | Open Source Vulnerability database check | Skipped | 50% |
| OWASP Dep-Check | | Skipped | 50% |

Last scanned: Apr 16, 2026, 1:49 AM (8.6s, 1 file scanned)

Install manually

# Clone the repo
git clone https://github.com/vibeeval/vibecosystem.git

# Copy into Claude Code skills folder (global)
cp -r vibecosystem/skills/data-pipeline-patterns ~/.claude/skills/

See the official Claude Code Skills documentation for the skills path.

Repository: vibeeval/vibecosystem (465 stars)

Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT