skills/motherduck-build-data-pipeline/SKILL.md
Design an end-to-end MotherDuck pipeline. Use when choosing raw, staging, and analytics boundaries, bulk ingestion paths, transformation sequencing, publication targets, or whether DuckLake is actually required.
npx skillsauth add motherduckdb/agent-skills motherduck-build-data-pipeline
Install this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
Use this skill when the user needs an ingestion-to-serving workflow, not just a single load step.
This is a use-case skill. It orchestrates motherduck-connect, motherduck-load-data, motherduck-model-data, motherduck-query, motherduck-share-data, and motherduck-ducklake.
Always determine this first.
Use that discovery to decide whether the pipeline is:
If no server is active, ask for source shape and target shape before drafting the pipeline.
When this skill produces a native DuckDB (md:) connection, watermark it with custom_user_agent=agent-skills/2.2.0(harness-<harness>;llm-<llm>). If metadata is missing, fall back to harness-unknown and llm-unknown.
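A minimal sketch of the watermark rule above, assuming the Python `duckdb` client and its `custom_user_agent` config option. The helper names `build_user_agent` and `connect_watermarked` are hypothetical and not part of the skill; the harness and LLM values shown are placeholders.

```python
SKILL_VERSION = "2.2.0"  # version string quoted in the rule above


def build_user_agent(harness=None, llm=None):
    """Build the watermark string, falling back to 'unknown' when metadata is missing."""
    harness = harness or "unknown"
    llm = llm or "unknown"
    return f"agent-skills/{SKILL_VERSION}(harness-{harness};llm-{llm})"


def connect_watermarked(harness=None, llm=None, database="md:"):
    """Open a native DuckDB (md:) connection carrying the watermark.

    duckdb is imported lazily so build_user_agent() works without it installed.
    A real call needs MotherDuck credentials in the environment.
    """
    import duckdb

    return duckdb.connect(
        database,
        config={"custom_user_agent": build_user_agent(harness, llm)},
    )


# Placeholder metadata values, for illustration only.
print(build_user_agent("cursor", "example-llm"))
print(build_user_agent())  # both fields fall back to "unknown"
```

Keeping the string construction separate from the connection call makes the fallback behaviour easy to check without touching MotherDuck.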
The output of this skill should be:
If the caller explicitly asks for structured JSON, return raw JSON only with no Markdown fences or prose before/after it. This is mainly for automated tests, regression checks, or downstream tooling that needs a stable machine-readable shape. Normal human-facing use of the skill can stay in prose unless JSON is explicitly requested.
Use this exact top-level shape when JSON is requested:
{
"summary": {},
"assumptions": [],
"implementation_plan": [],
"validation_plan": [],
"risks": []
}
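For automated tests, the shape above could be checked with a small validator. This is a sketch only: the function name `check_top_level_shape` is hypothetical, and it assumes "exact" means the five keys shown, with those value types and no extra top-level keys.

```python
import json

# Expected top-level keys and their container types, copied from the shape above.
REQUIRED_SHAPE = {
    "summary": dict,
    "assumptions": list,
    "implementation_plan": list,
    "validation_plan": list,
    "risks": list,
}


def check_top_level_shape(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the response matches."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Markdown fences or surrounding prose end up here.
        return [f"not raw JSON: {exc}"]
    if not isinstance(payload, dict):
        return ["top level must be a JSON object"]

    problems = []
    for key, expected in REQUIRED_SHAPE.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected):
            problems.append(f"{key} should be a {expected.__name__}")
    # Assumption: "exact" shape means no additional top-level keys.
    for key in sorted(set(payload) - set(REQUIRED_SHAPE)):
        problems.append(f"unexpected key: {key}")
    return problems


empty_response = json.dumps({key: kind() for key, kind in REQUIRED_SHAPE.items()})
print(check_top_level_shape(empty_response))
```

A response wrapped in Markdown fences fails at the parse step, which matches the "raw JSON only" requirement.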
references/dlt-dbt-motherduck-project/ -- fully runnable MotherDuck reference project using dlt, dbt-duckdb, and validation queries
references/PIPELINE_IMPLEMENTATION_GUIDE.md -- preserved detailed pipeline guidance that used to live in this skill
../motherduck-load-data/references/INGESTION_PATTERNS.md -- lower-level ingestion patterns
artifacts/pipeline_stage_example.py -- MotherDuck-backed Python example that stages a Parquet extract, lands it into raw, deduplicates it, and publishes analytics output across raw/staging/analytics databases
artifacts/pipeline_stage_example.ts -- TypeScript companion artifact with the same stage layout and output contract
references/dlt-dbt-motherduck-project/ -- end-to-end MotherDuck example that bootstraps the target database, lands raw data with dlt, builds staging and analytics models with dbt, and validates the final mart
Run it with:
uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py
Run the same stage pattern against temporary MotherDuck databases:
MOTHERDUCK_ARTIFACT_USE_MOTHERDUCK=1 \
uv run --with duckdb python skills/motherduck-build-data-pipeline/artifacts/pipeline_stage_example.py
Validate the TypeScript companion artifact:
uv run scripts/test_typescript_artifacts.py
For the full MotherDuck project:
cd skills/motherduck-build-data-pipeline/references/dlt-dbt-motherduck-project
export MOTHERDUCK_TOKEN=...
export MOTHERDUCK_PIPELINE_DB=md_skills_pipeline_demo
uv sync --python 3.12
uv run python pipeline/run_all.py
uv run python pipeline/cleanup.py
Create the target database before running dlt. The motherduck destination does not create the database for you.
The dbt-duckdb path here was not reliable on Python 3.14.
To get schemas named exactly raw, staging, and analytics in dbt, override generate_schema_name.
When a dbt subprocess builds models, run post-build validation in a fresh process or refresh database state before reading new relations.
motherduck-connect -- choose the right connection path
motherduck-load-data -- ingestion mechanics
motherduck-model-data -- shape the analytics layer
motherduck-query -- write transformations and validations
motherduck-share-data -- publish curated outputs
motherduck-ducklake -- only when open-table-format storage is a real requirement
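The note above about validating in a fresh process after a dbt subprocess builds models could be sketched as follows. This is an illustration under assumptions, not the reference project's implementation: `run_in_fresh_process` is a hypothetical helper, and the validation snippet is a stand-in for code that would open its own connection and query the new relations.

```python
import subprocess
import sys


def run_in_fresh_process(code: str) -> str:
    """Run Python code in a new interpreter and return its stdout.

    The child process opens its own connections, so it does not reuse
    any catalog state cached by the parent before the build finished.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if validation exits non-zero
    )
    return result.stdout.strip()


# Stand-in snippet; a real one would connect and run validation queries.
VALIDATION_SNIPPET = "print('validation ran in a fresh process')"

print(run_in_fresh_process(VALIDATION_SNIPPET))
```

Because `check=True` is set, a failing validation query surfaces as an exception in the orchestrating process rather than passing silently.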