Adoption

Agent Skills are supported by leading AI development tools.

VS Code, Gemini CLI, GitHub, Goose, Amp, Cursor, Claude Code, Letta, OpenCode, Claude, OpenAI Codex, Factory

motherduckdb/motherduck-build-data-pipeline

Name: motherduck-build-data-pipeline
Author: motherduckdb

plugins/motherduck-skills-claude/skills/motherduck-build-data-pipeline/SKILL.md

npx skillsauth add motherduckdb/agent-skills motherduck-build-data-pipeline


Build a Data Pipeline with MotherDuck

Use this skill when the user needs an ingestion-to-serving workflow, not just a single load step.

This is a use-case skill. It orchestrates motherduck-connect, motherduck-load-data, motherduck-model-data, motherduck-query, motherduck-share-data, and motherduck-ducklake.

Start Here: Is a MotherDuck Server Active?

Always determine this first.

If a remote MotherDuck MCP server or local MotherDuck server is active, use it.
If the user already knows the destination database, confirm it before designing stages.
Explore the live environment:
- current databases and schemas
- raw, staging, and analytics boundaries if they already exist
- source tables, target tables, and table grain
- key columns, date fields, and join keys

Use that discovery to decide whether the pipeline is:

landing into an empty workspace
extending an existing warehouse layout
publishing into an existing analytics model
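A minimal discovery sketch, assuming a DB-API-style connection such as a `duckdb` connection opened on `md:`. It lists tables from `information_schema.tables`, which in DuckDB should cover every attached database (with `table_catalog` naming the database). It then applies `classify_workspace`, a hypothetical heuristic that picks one of the three pipeline shapes. The decision rule is an illustrative assumption, not part of the skill. Default or sample databases may also show up in a real workspace and would need filtering.

```python
# Lists every table the connection can see. In DuckDB/MotherDuck this should
# span all attached databases, with table_catalog naming the database.
DISCOVERY_SQL = """
SELECT table_catalog, table_schema, table_name
FROM information_schema.tables
ORDER BY table_catalog, table_schema, table_name
"""


def discover(con):
    """Run the discovery query on a DB-API-style connection (e.g. duckdb)."""
    return con.execute(DISCOVERY_SQL).fetchall()


def classify_workspace(rows):
    """Hypothetical heuristic that maps discovered tables to a pipeline shape.

    rows: iterable of (database, schema, table) tuples.
    """
    rows = list(rows)
    if not rows:
        return "landing into an empty workspace"
    # A layer counts as present if it appears as a database or a schema name.
    names = {db for db, _, _ in rows} | {schema for _, schema, _ in rows}
    if "analytics" in names:
        return "publishing into an existing analytics model"
    return "extending an existing warehouse layout"
```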

If no server is active, ask for source shape and target shape before drafting the pipeline.

Use This Skill When

The user needs ingestion plus transformation plus serving output.
The work spans raw landing, curation, and publication.
The user needs a stage-by-stage pipeline pattern rather than one command.
The problem is bigger than a single import step or one ad hoc transformation.

Pipeline Defaults

batch over streaming
raw landing before curation
explicit raw -> staging -> analytics boundaries
bulk ingest paths over row-by-row writes
idempotent stage rebuilds or append contracts before scheduled automation
verify which DuckDB client version MotherDuck supports before recommending write, checkpoint, or lakehouse features that exist only in upstream DuckDB
native MotherDuck storage unless DuckLake is explicitly required
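One way to meet the append-contract default is a keyed anti-join insert. Re-running the same batch adds nothing, so a scheduler can retry it safely. This sketch uses Python's built-in `sqlite3` as a local stand-in so it runs without a MotherDuck account. The `raw.orders` table and its `order_id` key are hypothetical. The `INSERT ... WHERE NOT EXISTS` statement is plain SQL and should also run on a DuckDB or MotherDuck connection.

```python
import sqlite3

# Append only rows whose key has not already landed, so replays are no-ops.
APPEND_SQL = """
INSERT INTO raw.orders (order_id, amount, updated_at)
SELECT b.order_id, b.amount, b.updated_at
FROM batch AS b
WHERE NOT EXISTS (
    SELECT 1 FROM raw.orders AS o WHERE o.order_id = b.order_id
)
"""


def append_batch(con, rows):
    """Stage one batch in a temp table, append new keys, return the raw row count."""
    con.execute("DROP TABLE IF EXISTS batch")
    con.execute("CREATE TEMP TABLE batch (order_id INTEGER, amount REAL, updated_at TEXT)")
    con.executemany("INSERT INTO batch VALUES (?, ?, ?)", rows)
    con.execute(APPEND_SQL)
    return con.execute("SELECT COUNT(*) FROM raw.orders").fetchone()[0]


con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS raw")  # a separate "raw" database
con.execute("CREATE TABLE raw.orders (order_id INTEGER, amount REAL, updated_at TEXT)")

batch = [(1, 10.0, "2026-06-01"), (2, 5.5, "2026-06-01")]
first = append_batch(con, batch)
second = append_batch(con, batch)  # retrying the same batch changes nothing
```

This contract only dedupes against rows already landed. Duplicate keys inside a single batch would still need the staging dedup step.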

Workflow

Confirm whether live MotherDuck discovery is available.
Inspect the current workspace and target data model.
Define raw, staging, and analytics boundaries.
Ingest raw data.
Deduplicate, type, and promote into staging.
Materialize analytics-ready outputs.
Validate counts, freshness, uniqueness, and business metrics before publishing downstream assets.
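The workflow from raw landing through validation can be sketched end to end. This sketch again uses `sqlite3` as a local stand-in, with three attached in-memory databases playing the roles of raw, staging, and analytics. The `orders` and `daily_revenue` tables, their columns, and the freshness cutoff are hypothetical. Each stage is rebuilt with `DROP` plus `CREATE TABLE ... AS`, so re-running it is idempotent. On DuckDB or MotherDuck you would attach real databases and might prefer `CREATE OR REPLACE TABLE`.

```python
import sqlite3

# (target, SELECT) pairs in dependency order; each is rebuilt from scratch.
STAGES = [
    # staging: keep the latest landed version of each order_id
    ("staging.orders", """
        SELECT order_id, order_date, amount, loaded_at
        FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY order_id ORDER BY loaded_at DESC
            ) AS rn
            FROM raw.orders
        ) AS ranked
        WHERE rn = 1
    """),
    # analytics: serving table at one row per order_date
    ("analytics.daily_revenue", """
        SELECT order_date, SUM(amount) AS revenue, COUNT(*) AS order_count
        FROM staging.orders
        GROUP BY order_date
    """),
]


def build(con):
    for target, select_sql in STAGES:
        con.execute(f"DROP TABLE IF EXISTS {target}")
        con.execute(f"CREATE TABLE {target} AS {select_sql}")


def validate(con, min_loaded_at):
    """Return failed check descriptions; an empty list means safe to publish."""
    failures = []
    raw_keys = con.execute("SELECT COUNT(DISTINCT order_id) FROM raw.orders").fetchone()[0]
    rows, keys = con.execute(
        "SELECT COUNT(*), COUNT(DISTINCT order_id) FROM staging.orders"
    ).fetchone()
    if rows != keys:  # uniqueness
        failures.append("staging.orders has duplicate order_id values")
    if keys != raw_keys:  # counts
        failures.append("staging key count differs from raw")
    latest = con.execute("SELECT MAX(loaded_at) FROM raw.orders").fetchone()[0]
    if latest is None or latest < min_loaded_at:  # freshness (ISO text here)
        failures.append("raw.orders is stale")
    stg = con.execute("SELECT SUM(amount) FROM staging.orders").fetchone()[0] or 0
    mart = con.execute("SELECT SUM(revenue) FROM analytics.daily_revenue").fetchone()[0] or 0
    if abs(stg - mart) > 1e-9:  # business metric reconciles across layers
        failures.append("analytics revenue does not match staging")
    return failures


con = sqlite3.connect(":memory:")
for db in ("raw", "staging", "analytics"):
    con.execute(f"ATTACH DATABASE ':memory:' AS {db}")
con.execute(
    "CREATE TABLE raw.orders (order_id INTEGER, order_date TEXT, amount REAL, loaded_at TEXT)"
)
con.executemany("INSERT INTO raw.orders VALUES (?, ?, ?, ?)", [
    (1, "2026-06-01", 10.0, "2026-06-01T01:00"),
    (1, "2026-06-01", 12.0, "2026-06-01T02:00"),  # later correction of order 1
    (2, "2026-06-02", 5.0, "2026-06-02T01:00"),
])
build(con)
build(con)  # a second run rebuilds the same result
failures = validate(con, min_loaded_at="2026-06-02T00:00")
```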

When this skill opens a native DuckDB connection to MotherDuck (an md: connection string), watermark it with custom_user_agent=agent-skills/2.2.2(harness-<harness>;llm-<llm>). If the harness or LLM name is unavailable, use harness-unknown and llm-unknown.
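A minimal sketch of the watermark, assuming the Python `duckdb` package and a `MOTHERDUCK_TOKEN` in the environment for the actual connection. `custom_user_agent` is passed through DuckDB's `config` dictionary. The helper names `build_user_agent` and `connect_motherduck`, and any harness or LLM values you pass, are illustrative.

```python
def build_user_agent(harness=None, llm=None, version="2.2.2"):
    """Format the agent-skills watermark, falling back to 'unknown' when missing."""
    return f"agent-skills/{version}(harness-{harness or 'unknown'};llm-{llm or 'unknown'})"


def connect_motherduck(database, harness=None, llm=None):
    """Open an md: connection that carries the watermark (hypothetical helper)."""
    import duckdb  # imported here so build_user_agent needs no third-party package

    return duckdb.connect(
        f"md:{database}",
        config={"custom_user_agent": build_user_agent(harness, llm)},
    )
```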

Output

The output of this skill should be:

the stage layout
the ingestion method
the transformation sequence
the serving tables or views
the validation checks

If the caller explicitly asks for structured JSON, return raw JSON only, with no Markdown fences or prose before or after it. This is mainly for automated tests, regression checks, or downstream tooling that needs a stable machine-readable shape. Normal human-facing use of the skill can stay in prose unless JSON is explicitly requested.

Use this exact top-level shape when JSON is requested:

{
  "summary": {},
  "assumptions": [],
  "implementation_plan": [],
  "validation_plan": [],
  "risks": []
}
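For regression checks, a response can be verified against this shape with the standard library. A minimal sketch: `check_plan` is a hypothetical helper. It treats "exact" as meaning no extra or missing top-level keys, and the value types are inferred from the empty `{}` and `[]` placeholders above.

```python
import json

# Exact top-level keys and the JSON type each one should hold.
EXPECTED_SHAPE = {
    "summary": dict,
    "assumptions": list,
    "implementation_plan": list,
    "validation_plan": list,
    "risks": list,
}


def check_plan(raw: str) -> dict:
    """Parse a JSON-mode response and enforce the top-level shape.

    json.loads rejects Markdown fences or prose around the object, which the
    raw-JSON-only rule forbids. JSONDecodeError is a subclass of ValueError.
    """
    plan = json.loads(raw)
    if not isinstance(plan, dict):
        raise ValueError("top level must be a JSON object")
    if set(plan) != set(EXPECTED_SHAPE):
        raise ValueError(f"unexpected top-level keys: {sorted(plan)}")
    for key, expected in EXPECTED_SHAPE.items():
        if not isinstance(plan[key], expected):
            raise ValueError(f"{key} should be a JSON {expected.__name__}")
    return plan
```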

References

references/dlt-dbt-motherduck-project/ -- fully runnable MotherDuck reference project using dlt, dbt-duckdb, and validation queries
references/PIPELINE_IMPLEMENTATION_GUIDE.md -- preserved detailed pipeline guidance that used to live in this skill
../motherduck-load-data/references/INGESTION_PATTERNS.md -- lower-level ingestion patterns

Runnable Artifact

artifacts/pipeline_stage_example.py -- MotherDuck-backed Python example that stages a Parquet extract, lands it into raw, deduplicates it, and publishes analytics output across raw/staging/analytics databases
artifacts/pipeline_stage_example.ts -- TypeScript companion artifact with the same stage layout and output contract
references/dlt-dbt-motherduck-project/ -- end-to-end MotherDuck example that bootstraps the target database, lands raw data with dlt, builds staging and analytics models with dbt, and validates the final mart

Run it with:

uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py

Run the same stage pattern against temporary MotherDuck databases:

MOTHERDUCK_ARTIFACT_USE_MOTHERDUCK=1 \
uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py

Validate the TypeScript companion artifact:

uv run scripts/test_typescript_artifacts.py

For the full MotherDuck project:

cd skills/motherduck-build-data-pipeline/references/dlt-dbt-motherduck-project
export MOTHERDUCK_TOKEN=...
export MOTHERDUCK_PIPELINE_DB=md_skills_pipeline_demo
uv sync --python 3.12
uv run python pipeline/run_all.py
uv run python pipeline/cleanup.py

Verified Notes

Bootstrap the target MotherDuck database before running dlt. The motherduck destination does not create the database for you.
Keep this stack on Python 3.11 or 3.12 for now. The tested dbt-duckdb path here was not reliable on Python 3.14.
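dbt prefixes custom schema names with the target schema by default (for example, main_staging when the target schema is main). If you want exact schema names like raw, staging, and analytics, override the generate_schema_name macro.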
When a long-lived Python process loads data and a separate dbt subprocess builds models, run post-build validation in a fresh process or refresh database state before reading new relations.
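The first and last notes can be sketched in Python. `bootstrap_sql`, `bootstrap`, and `run_fresh` are hypothetical helper names, not part of the reference project. The sketch assumes the `duckdb` package and a `MOTHERDUCK_TOKEN` in the environment for the MotherDuck call. It reuses the `MOTHERDUCK_PIPELINE_DB` variable from the project commands, and it runs validation in a brand-new interpreter so it cannot read stale state from the loading process.

```python
import os
import re
import subprocess
import sys


def bootstrap_sql(db_name: str) -> str:
    """SQL that creates the target database before dlt runs (dlt will not)."""
    # Only allow plain identifiers, since the name is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", db_name):
        raise ValueError(f"unsafe database name: {db_name!r}")
    return f"CREATE DATABASE IF NOT EXISTS {db_name}"


def bootstrap(db_name: str) -> None:
    """Create the database on MotherDuck (needs duckdb and MOTHERDUCK_TOKEN)."""
    import duckdb

    con = duckdb.connect("md:")  # connect to MotherDuck without a default database
    try:
        con.execute(bootstrap_sql(db_name))
    finally:
        con.close()


def run_fresh(code: str) -> str:
    """Run validation code in a new interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout


DB_NAME = os.environ.get("MOTHERDUCK_PIPELINE_DB", "md_skills_pipeline_demo")
```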

Related Skills

motherduck-connect -- choose the right connection path
motherduck-load-data -- ingestion mechanics
motherduck-model-data -- shape the analytics layer
motherduck-query -- write transformations and validations
motherduck-share-data -- publish curated outputs
motherduck-ducklake -- only when open-table-format storage is a real requirement


Design an end-to-end MotherDuck data pipeline. Use for ETL/ELT workflows -- choosing raw, staging, and analytics boundaries, bulk ingestion paths, transformation sequencing, dlt/dbt integration, publication targets, or whether DuckLake is actually required.

30 stars

development

Updated Jun 11, 2026


Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.

Security Scan Results

3 of 9 scanners reported clean

Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.

Trivy: Clean -- container and dependency vulnerability scanner
Semgrep: Clean -- static code analysis for vulnerabilities
mcp-scan (Snyk): Clean -- Model Context Protocol security validation
Snyk (dep): Skipped -- open source security scanning
Socket.dev: Skipped -- supply chain security analysis
VirusTotal: Skipped -- multi-engine malware detection
CrowdStrike: Skipped -- advanced threat intelligence
OSV-Scanner: Skipped -- Open Source Vulnerability database check
OWASP Dep-Check: Skipped

Last scanned: Jun 11, 2026, 2:33 AM (scan took 324.5 s; 24 files scanned)

SKILL.md

name: motherduck-build-data-pipeline
description: Design an end-to-end MotherDuck data pipeline. Use for ETL/ELT workflows -- choosing raw, staging, and analytics boundaries, bulk ingestion paths, transformation sequencing, dlt/dbt integration, publication targets, or whether DuckLake is actually required.
argument-hint: [pipeline-goal]
license: MIT


Related Skills

motherduckdb/motherduck-create-flight (development; 30 stars; updated Jun 11, 2026) -- Create, schedule, run, and debug MotherDuck Flights: Python jobs that run on MotherDuck compute. Use whenever someone wants to create a flight, schedule a Python script or recurring job on MotherDuck, set up scheduled ingestion from Postgres, dlt sources, S3, BigQuery, Snowflake, or APIs, refresh aggregates or transformations on a cron, or operate flights with get_flight_guide, create_flight, run_flight, flight logs, secrets, schedules, and versions.

motherduckdb/motherduck-share-data (data-ai; 30 stars; updated May 7, 2026) -- Create and manage MotherDuck data shares for zero-copy, read-only data distribution. Use whenever someone wants to share a database with team members, another organization, or the public. Covers CREATE SHARE, access/visibility/update modes, GRANT READ ON SHARE, attaching share URLs, UPDATE SHARE, and REFRESH DATABASE.

motherduckdb/motherduck-security-governance (development; 30 stars; updated May 7, 2026) -- Explain MotherDuck security, governance, and access-control patterns. Use for any question about SOC 2, GDPR, compliance, data residency, regions, SSO, service accounts, token handling, tenant isolation, sharing boundaries, snapshots and recovery, or governance posture, including when a security_compliance_owner, technical_owner, or application_builder is evaluating MotherDuck.


Install manually


# Clone the repo
git clone https://github.com/motherduckdb/agent-skills.git

# Copy into Claude Code skills folder (global)
cp -r agent-skills/plugins/motherduck-skills-claude/skills/motherduck-build-data-pipeline ~/.claude/skills/


Repository

motherduckdb/agent-skills


Compatible with: Claude Code, OpenAI Codex CLI, ChatGPT