skills/data-analytics/ecommerce-data-warehouse/SKILL.md
Build a commerce data warehouse with star-schema tables, ETL pipelines, and dbt models for BigQuery, Snowflake, or Redshift analytics
npx skillsauth add finsilabs/awesome-ecommerce-skills ecommerce-data-warehouseInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
An ecommerce data warehouse centralizes data from your store, ad platforms, shipping carriers, and accounting system into a single analytics layer — enabling reports that no single platform can produce on its own. Most merchants start needing a warehouse when they outgrow their platform's built-in analytics (typically around $1M+ ARR or when managing multiple channels), want to combine ad spend with order data for true ROAS, or need SQL-level access to run custom cohort and attribution analyses.
This skill guides you through extracting data from your platform, loading it into BigQuery (recommended for cost), and transforming it with dbt so your team can build dashboards in Looker Studio, Metabase, or Tableau.
For most ecommerce merchants, BigQuery is the best starting point:
Alternatives:
Set up BigQuery:
ecommerce_raw) for raw ingested data and a second dataset (ecommerce_analytics) for transformed tablesThe easiest way to get platform data into BigQuery is a managed connector. Avoid building custom extractors unless you have a specific requirement.
Option A: Fivetran (recommended, $$$)
dbt-fivetran/shopify — provides staging and mart models out of the boxOption B: Stitch (mid-price)
Option C: Shopify's native data export (free, manual)
Option D: Polar Analytics (simplest, all-in-one)
Option A: Stitch or Airbyte
Option B: Direct MySQL replication WooCommerce stores all data in MySQL. For a technical team:
Option C: Metorik export
Option A: Fivetran BigCommerce connector
Option B: BigCommerce Data Solutions
Option C: Stitch BigCommerce connector
dbt (data build tool) transforms raw ingested data into clean analytics-ready tables using SQL. It handles dependency management, testing, and documentation.
Install dbt:
pip install dbt-bigquery # for BigQuery
# or: pip install dbt-snowflake / dbt-redshift
Initialize a project:
dbt init ecommerce_analytics
cd ecommerce_analytics
Configure your profile in ~/.dbt/profiles.yml (points to your BigQuery project).
If using Fivetran + Shopify, install the pre-built Shopify dbt package:
# packages.yml
packages:
- package: fivetran/shopify
version: [">=0.10.0", "<0.11.0"]
Run dbt deps to install, then dbt run to build all models.
For other platforms or custom schemas, build models in this three-layer pattern:
Staging layer — clean raw data, no business logic:
-- models/staging/stg_orders.sql
with source as (select * from {{ source('shopify_raw', 'order') }}),
renamed as (
select
id::varchar as order_id,
name as order_number,
customer_id::varchar as customer_id,
email,
financial_status,
fulfillment_status,
total_price::numeric(12,2) as total_price,
subtotal_price::numeric(12,2) as subtotal_price,
total_discounts::numeric(10,2) as total_discounts,
currency,
source_name as channel,
created_at as ordered_at
from source
where _fivetran_deleted = false
)
select * from renamed
Mart layer — business-ready tables for dashboards:
-- models/marts/fct_orders.sql
with orders as (select * from {{ ref('stg_orders') }}),
customers as (select * from {{ ref('stg_customers') }}),
first_orders as (
select customer_id, min(ordered_at) as first_order_at
from orders group by 1
)
select
o.order_id,
o.order_number,
o.customer_id,
o.ordered_at,
o.total_price,
o.subtotal_price,
o.total_discounts,
o.channel,
o.ordered_at = fo.first_order_at as is_first_order,
date_trunc(o.ordered_at, month) as order_month
from orders o
left join first_orders fo on o.customer_id = fo.customer_id
where o.financial_status not in ('voided', 'pending')
Add tests in your schema YAML files so broken pipelines are caught before they reach dashboards:
# models/marts/schema.yml
models:
- name: fct_orders
columns:
- name: order_id
tests: [unique, not_null]
- name: total_price
tests:
- not_null
- dbt_utils.accepted_range:
min_value: 0
Run tests: dbt test
Once data is in BigQuery, connect a visualization layer:
| Problem | Solution |
|---------|----------|
| Revenue in warehouse does not match platform analytics | Verify currency handling (cents vs. dollars), discount application order, and tax inclusion/exclusion; build a daily reconciliation check that compares warehouse totals to platform API totals |
| Historical product prices not captured | Implement SCD Type 2 on the product dimension and join fact tables to the product row that was current at the time of the order |
| ETL fails mid-run leaving partial data | Use dbt's on_schema_change: sync_all_columns and atomic table swaps; for incremental models, configure a unique_key to handle re-runs idempotently |
| Dashboard queries are too slow | Pre-aggregate in daily summary fact tables; partition large tables by date in BigQuery; avoid running heavy SQL on load |
| Customer identity not resolved across channels | Build a customer identity resolution table that maps email, phone, and platform-specific IDs to a single canonical customer key |
tools
Let shoppers save products to a wishlist, share it with friends, and get notified when saved items come back in stock or drop in price
development
Build a themeable storefront with design tokens and CSS custom properties that supports white-labeling, multi-brand variants, and dark mode
development
Speed up product discovery with instant search suggestions, fuzzy typo matching, and category-aware results powered by Algolia or Elasticsearch
development
Build a mobile-first storefront with thumb-friendly navigation, sticky add-to-cart buttons, and touch-optimized components for high mobile conversion