plugins/motherduck-skills-claude/skills/motherduck-build-data-pipeline/SKILL.md
Design an end-to-end MotherDuck data pipeline. Use for ETL/ELT workflows -- choosing raw, staging, and analytics boundaries, bulk ingestion paths, transformation sequencing, dlt/dbt integration, publication targets, or whether DuckLake is actually required.
npx skillsauth add motherduckdb/agent-skills motherduck-build-data-pipeline
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when the user needs an ingestion-to-serving workflow, not just a single load step.
This is a use-case skill. It orchestrates motherduck-connect, motherduck-load-data, motherduck-model-data, motherduck-query, motherduck-share-data, and motherduck-ducklake.
Always determine this first.
Use that discovery to decide whether the pipeline is:
If no server is active, ask for source shape and target shape before drafting the pipeline.
When this skill produces a native DuckDB (md:) connection, watermark it with custom_user_agent=agent-skills/2.2.2(harness-<harness>;llm-<llm>). If metadata is missing, fall back to harness-unknown and llm-unknown.
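A minimal sketch of the watermark rule above, assuming the harness and LLM names arrive as optional strings. The function name `build_user_agent` and the example values `claude-code` and `example-llm` are hypothetical. Passing the value through DuckDB's `config` dictionary is one way to set `custom_user_agent`; how the skill itself wires it in is not shown in this document.

```python
from typing import Dict, Optional

SKILLS_VERSION = "2.2.2"  # version string taken from the watermark format above


def build_user_agent(harness: Optional[str] = None, llm: Optional[str] = None) -> str:
    """Build the watermark, falling back to 'unknown' when metadata is missing."""
    harness = harness or "unknown"
    llm = llm or "unknown"
    return f"agent-skills/{SKILLS_VERSION}(harness-{harness};llm-{llm})"


def connection_config(harness: Optional[str] = None, llm: Optional[str] = None) -> Dict[str, str]:
    """Return a config mapping suitable for duckdb.connect(..., config=...)."""
    return {"custom_user_agent": build_user_agent(harness, llm)}


# Hypothetical usage (requires the duckdb package and a MotherDuck token):
#   import duckdb
#   con = duckdb.connect("md:my_db", config=connection_config("claude-code", "example-llm"))
print(build_user_agent())
print(build_user_agent("claude-code", "example-llm"))
```

Empty strings are treated the same as missing metadata, so a blank harness or LLM name also falls back to `unknown`.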
The output of this skill should be:
If the caller explicitly asks for structured JSON, return raw JSON only with no Markdown fences or prose before/after it. This is mainly for automated tests, regression checks, or downstream tooling that needs a stable machine-readable shape. Normal human-facing use of the skill can stay in prose unless JSON is explicitly requested.
Use this exact top-level shape when JSON is requested:
{
  "summary": {},
  "assumptions": [],
  "implementation_plan": [],
  "validation_plan": [],
  "risks": []
}
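For automated tests, a check along the following lines could confirm that a response is raw JSON with exactly the top-level keys shown above. This is a sketch, not part of the skill: the helper name `check_top_level_shape` and the wording of its error messages are hypothetical, and it inspects only the top level, not the contents of each field.

```python
import json
from typing import List

# Expected top-level keys and their container types, taken from the shape above.
EXPECTED_SHAPE = {
    "summary": dict,
    "assumptions": list,
    "implementation_plan": list,
    "validation_plan": list,
    "risks": list,
}


def check_top_level_shape(payload: str) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Markdown fences or prose around the JSON end up here.
        return ["payload is not raw JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    for key, expected_type in EXPECTED_SHAPE.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"wrong type for key: {key}")
    for key in sorted(set(data) - set(EXPECTED_SHAPE)):
        problems.append(f"unexpected key: {key}")
    return problems


empty_response = json.dumps({key: kind() for key, kind in EXPECTED_SHAPE.items()})
print(check_top_level_shape(empty_response))
```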
references/dlt-dbt-motherduck-project/ -- fully runnable MotherDuck reference project using dlt, dbt-duckdb, and validation queries
references/PIPELINE_IMPLEMENTATION_GUIDE.md -- preserved detailed pipeline guidance that used to live in this skill
../motherduck-load-data/references/INGESTION_PATTERNS.md -- lower-level ingestion patterns

artifacts/pipeline_stage_example.py -- MotherDuck-backed Python example that stages a Parquet extract, lands it into raw, deduplicates it, and publishes analytics output across raw/staging/analytics databases
artifacts/pipeline_stage_example.ts -- TypeScript companion artifact with the same stage layout and output contract
references/dlt-dbt-motherduck-project/ -- end-to-end MotherDuck example that bootstraps the target database, lands raw data with dlt, builds staging and analytics models with dbt, and validates the final mart

Run the Python artifact with:
uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py
Run the same stage pattern against temporary MotherDuck databases:
MOTHERDUCK_ARTIFACT_USE_MOTHERDUCK=1 \
uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py
Validate the TypeScript companion artifact:
uv run scripts/test_typescript_artifacts.py
For the full MotherDuck project:
cd skills/motherduck-build-data-pipeline/references/dlt-dbt-motherduck-project
export MOTHERDUCK_TOKEN=...
export MOTHERDUCK_PIPELINE_DB=md_skills_pipeline_demo
uv sync --python 3.12
uv run python pipeline/run_all.py
uv run python pipeline/cleanup.py
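The commands above export `MOTHERDUCK_TOKEN` and `MOTHERDUCK_PIPELINE_DB` before running the project. A small preflight check like the one below can fail fast when either is missing. It is a sketch under assumptions: the function name `resolve_pipeline_target` is hypothetical, and whether the reference project performs a check like this is not stated here.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("MOTHERDUCK_TOKEN", "MOTHERDUCK_PIPELINE_DB")


def resolve_pipeline_target(env: Mapping[str, str] = os.environ) -> str:
    """Return the md: connection string for the pipeline database.

    Raises RuntimeError when a required variable is unset or empty.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return f"md:{env['MOTHERDUCK_PIPELINE_DB']}"


# Example with an explicit mapping instead of the real environment.
example_env = {
    "MOTHERDUCK_TOKEN": "placeholder-token",
    "MOTHERDUCK_PIPELINE_DB": "md_skills_pipeline_demo",
}
print(resolve_pipeline_target(example_env))
```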
Bootstrap the target database before running dlt. The motherduck destination does not create the database for you.
Use Python 3.12. The dbt-duckdb path here was not reliable on Python 3.14.
For exact raw, staging, and analytics schema names in dbt, override generate_schema_name.
When a dbt subprocess builds models, run post-build validation in a fresh process or refresh database state before reading new relations.

motherduck-connect -- choose the right connection path
motherduck-load-data -- ingestion mechanics
motherduck-model-data -- shape the analytics layer
motherduck-query -- write transformations and validations
motherduck-share-data -- publish curated outputs
motherduck-ducklake -- only when open-table-format storage is a real requirement
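The advice above to run post-build validation in a fresh process can be sketched with the standard library alone. The helper name `run_validation_in_fresh_process` is hypothetical, and the validation code passed in here is a trivial placeholder. In a real pipeline it would open its own MotherDuck connection and query the relations that dbt just built.

```python
import subprocess
import sys


def run_validation_in_fresh_process(validation_code: str) -> str:
    """Run validation code in a new Python interpreter and return its stdout.

    A new interpreter starts with no cached connection or catalog state,
    so it sees the relations as they exist after the build finished.
    """
    result = subprocess.run(
        [sys.executable, "-c", validation_code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the validation exits non-zero
    )
    return result.stdout.strip()


# Placeholder validation; a real check would query the final mart instead.
print(run_validation_in_fresh_process("print(1 + 1)"))
```

With `check=True`, a validation script that exits non-zero raises `subprocess.CalledProcessError`, which lets the calling pipeline stop before publishing.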