dbt-analytics-engineer
dbt Core/Cloud data transformations, testing, documentation, and CI/CD. Activate on: dbt, data transformation, analytics engineering, ref, source, staging model, mart, dbt test. NOT for: orchestration/scheduling (use airflow-dag-orchestrator), data warehouse tuning (use data-warehouse-optimizer).
Build, test, and document data transformations using dbt Core/Cloud with modern analytics engineering practices.
Model Size & Query Pattern → Materialization Choice
├── < 1M rows, rarely queried
│ └── VIEW (ephemeral if only intermediate)
├── 1M-10M rows, daily queries
│ └── TABLE (full refresh nightly)
├── > 10M rows, frequent queries
│ ├── Append-only data → INCREMENTAL (append strategy)
│ ├── Updates/deletes → INCREMENTAL (merge strategy)
│ └── Complex joins/aggregations → TABLE with incremental source prep
└── Dev/staging environment
└── Always VIEW (cost optimization)
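The last branch of this tree can live in the model config itself, so one file builds a cheap view in development and a table in production. This is a minimal sketch. It assumes the production target in `profiles.yml` is named `prod` (your target names may differ) and uses a hypothetical daily rollup model, `fct_daily_orders`, built on the staging model from the Shopify scenario in this skill.

```sql
-- models/marts/finance/fct_daily_orders.sql (hypothetical model)
-- Assumes the production target is named 'prod'; every other target
-- (dev, CI, staging) gets a view, per the "Dev/staging → VIEW" branch.
{{
    config(
        materialized=('table' if target.name == 'prod' else 'view')
    )
}}

SELECT
    order_date,
    count(*)          AS order_count,
    sum(total_amount) AS gross_amount
FROM {{ ref('stg_shopify__orders') }}
GROUP BY order_date
```

The same expression can also be set per folder in `dbt_project.yml`. Keeping it in the model makes the choice visible where the model is defined.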
Data Characteristics → Layer Placement
├── Raw source mapping (1:1)
│ └── staging/ (stg_ prefix, light cleaning only)
├── Business logic, joins, calculations
│ └── intermediate/ (int_ prefix, reusable components)
├── Final consumption ready
│ ├── Analytics/BI → marts/ (fct_, dim_ prefixes)
│ └── ML features → features/ (fea_ prefix)
└── One-off analysis
└── analysis/ (not materialized)
Model Criticality & Data Patterns → Test Coverage
├── Core business metrics (revenue, customers)
│ └── COMPREHENSIVE: unique, not_null, relationships, custom business rules
├── Supporting dimensions
│ └── STANDARD: unique, not_null, accepted_values
├── Intermediate models
│ └── MINIMAL: not_null on join keys, row count > 0
└── Development models
└── BASIC: not_null on primary key only
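The MINIMAL tier's "row count > 0" check has no built-in dbt test. Here is a minimal sketch of one as a custom generic test, using a hypothetical name, `has_rows`, and saved under `tests/generic/`. dbt treats any returned row as a failure, so the query returns a row only when the tested model is empty.

```sql
-- tests/generic/has_rows.sql (hypothetical custom generic test)
{% test has_rows(model) %}

-- A dbt test fails when its query returns rows, so return one row
-- (holding the count) only when the tested model is empty.
SELECT row_count
FROM (
    SELECT count(*) AS row_count
    FROM {{ model }}
) AS counts
WHERE row_count = 0

{% endtest %}
```

It attaches at the model level like any other generic test, listed as `has_rows` under the model's `tests:` key (`data_tests:` in dbt 1.8 and later).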
Detection Rule: dbt compile (or any command that builds the DAG) fails with a "Found a cycle" error
Symptoms: Model A refs Model B, which refs Model C, which refs Model A
Fix: Break the cycle by moving the shared logic into a new intermediate model that the affected models reference instead of each other
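A minimal sketch of that fix. It assumes two hypothetical marts (say `dim_customers` and `fct_customer_orders`) had been ref-ing each other to share per-customer order stats. The shared logic moves into one intermediate model that depends only on staging, so no edge points back into the marts.

```sql
-- models/intermediate/int_customer_order_stats.sql (hypothetical model)
-- Shared per-customer logic pulled out of the cycle. It depends only on
-- staging, never on the marts that consume it, so the DAG stays acyclic.
SELECT
    customer_id,
    count(*)        AS order_count,
    min(order_date) AS first_order_date,
    max(order_date) AS last_order_date
FROM {{ ref('stg_shopify__orders') }}
GROUP BY customer_id

-- Each former cycle member now selects from ref('int_customer_order_stats')
-- instead of ref()-ing the other mart directly.
```

The trailing note deliberately has no Jinja braces. dbt renders Jinja even inside SQL comments, so a commented-out `ref()` in braces would still register a dependency.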
Detection Rule: dbt run succeeds but row counts decrease unexpectedly on incremental models
Symptoms: Late-arriving data missed, duplicates created, or filter logic excludes existing records
Fix: Correct the filter conditions and unique_key configuration, then rebuild the affected models with dbt run --full-refresh
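A singular test can catch late-arriving rows that an incremental filter skipped by comparing the incremental model with its upstream. This sketch uses the `fct_revenue` and `stg_shopify__orders` models from the Shopify scenario in this skill. The test file name is hypothetical, and the test assumes it runs right after the model is built, as `dbt build` orders it.

```sql
-- tests/assert_fct_revenue_has_all_staged_orders.sql (hypothetical name)
-- Singular test: returns every staged order that never reached the
-- incremental model. Any returned row fails the test.
SELECT s.order_id
FROM {{ ref('stg_shopify__orders') }} AS s
LEFT JOIN {{ ref('fct_revenue') }} AS f
    ON s.order_id = f.order_id
WHERE f.order_id IS NULL
```

Rows loaded into the source between the model build and the test run will also show up as failures, so schedule the test immediately after the build.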
Detection Rule: dbt test takes more than 30 minutes or times out on the warehouse
Symptoms: Tests scan entire fact tables without limits, or run complex join tests on unindexed columns
Fix: Restrict expensive tests to recent data with the where config, cap returned failures with limit: 100000, and run only error-severity tests in CI with dbt test --select config.severity:error
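Here is a sketch of a bounded singular test against `fct_revenue` from the Shopify scenario. The seven-day window and the 1000-row limit are arbitrary choices, and the file name is hypothetical. The date filter limits the scan, `limit` caps how many failing rows are returned, and `severity='error'` keeps the test blocking.

```sql
-- tests/assert_recent_revenue_not_negative.sql (hypothetical name)
{{ config(severity='error', limit=1000) }}

-- Scan only the last 7 days (arbitrary window) instead of the whole
-- fact table; dbt.dateadd is dbt's cross-database date macro.
SELECT order_id, total_amount
FROM {{ ref('fct_revenue') }}
WHERE order_date >= {{ dbt.dateadd('day', -7, 'current_date') }}
  AND total_amount < 0
```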
Detection Rule: >50% of models/columns lack descriptions in dbt docs generate output
Symptoms: New team members can't understand model purpose, business users avoid self-service
Fix: Require descriptions in CI checks, generate model YAML from a template, and hold quarterly doc reviews
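One way to require descriptions in CI is a macro run with `dbt run-operation`. This minimal sketch uses a hypothetical macro name, `check_model_descriptions`. It walks dbt's `graph` context variable and fails if any model in the current project has an empty description. Column-level checks would need an extra loop over `node.columns`.

```sql
-- macros/check_model_descriptions.sql (hypothetical macro)
{% macro check_model_descriptions() %}
    {% set missing = [] %}
    {# Only this project's models; installed packages are skipped #}
    {% for node in graph.nodes.values()
        if node.resource_type == 'model' and node.package_name == project_name %}
        {% if not node.description %}
            {% do missing.append(node.name) %}
        {% endif %}
    {% endfor %}

    {% if missing | length > 0 %}
        {{ exceptions.raise_compiler_error(
            "Models missing descriptions: " ~ (missing | join(", "))
        ) }}
    {% endif %}

    {{ log("All models have descriptions.", info=True) }}
{% endmacro %}
```

In CI this would run as `dbt run-operation check_model_descriptions`. The raised compiler error gives a non-zero exit code, which fails the job.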
Detection Rule: Macros calling other macros >3 levels deep, single macro >100 lines
Symptoms: Impossible to debug Jinja errors, changes break unexpected downstream models
Fix: Flatten macro hierarchies, split complex macros, add macro unit tests with dbt-unit-testing
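Here is a sketch of the "split complex macros" advice. Each macro does one job and calls no other macros, so a Jinja error points at one short definition. The macro name and the `amount_cents` column in the usage note are hypothetical.

```sql
-- macros/cents_to_dollars.sql (hypothetical helper)
-- One responsibility, no nested macro calls; the whitespace-control
-- dashes keep the compiled SQL on a single line.
{% macro cents_to_dollars(column_name, scale=2) -%}
    round(cast({{ column_name }} AS numeric) / 100, {{ scale }})
{%- endmacro %}
```

In a model, `{{ cents_to_dollars('amount_cents') }} AS amount` should compile to `round(cast(amount_cents AS numeric) / 100, 2) AS amount`. That is short enough to check by eye in the compiled output.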
Scenario: E-commerce orders data from Shopify API to revenue analytics
Step 1: Source Configuration
```yaml
# models/staging/shopify/_sources.yml
sources:
  - name: shopify_raw
    loaded_at_field: _loaded_at  # freshness checks need a load timestamp column
    freshness:
      warn_after: {count: 6, period: hour}
    tables:
      - name: orders
        description: "Raw orders from Shopify API"
```
Step 2: Staging Model (Decision: VIEW for <1M rows)
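```sql
-- models/staging/shopify/stg_shopify__orders.sql
{{ config(materialized='view') }}

SELECT
    order_id::varchar as order_id,
    customer_id::varchar as customer_id,
    order_date::date as order_date,
    total_amount::decimal(10,2) as total_amount,
    status::varchar as status,
    _loaded_at::timestamp as _loaded_at
FROM {{ source('shopify_raw', 'orders') }}
-- No business filters in staging (light cleaning only). Cancelled orders stay,
-- so a later status change to 'cancelled' still reaches the incremental merge
-- downstream; filter on status in the marts or BI layer instead.
```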
Step 3: Mart Model (Decision: INCREMENTAL for >10M rows, daily queries)
```sql
-- models/marts/finance/fct_revenue.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}
{# merge on unique_key updates orders that change after their first load #}

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    _loaded_at
FROM {{ ref('stg_shopify__orders') }}
{% if is_incremental() %}
-- Only process recent data; >= re-reads rows at the boundary timestamp,
-- and the merge on order_id makes re-reading them harmless
WHERE _loaded_at >= (SELECT max(_loaded_at) FROM {{ this }})
{% endif %}
```
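A max-only filter still skips rows that land with a `_loaded_at` older than the current max, for example when parallel loads commit out of order. Separately, if one batch holds the same `order_id` twice, the merge can fail or insert duplicates, depending on the warehouse. This sketch handles both by re-reading a trailing window and keeping the latest row per key. It assumes a hypothetical project var, `revenue_lookback_hours` (default 3, an arbitrary choice), and dbt's cross-database `dbt.dateadd` macro.

```sql
-- models/marts/finance/fct_revenue.sql (variant with lookback and dedup)
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

{#- Hypothetical project var: hours of already-processed data to re-read -#}
{% set lookback_hours = var('revenue_lookback_hours', 3) %}

WITH new_rows AS (

    SELECT *
    FROM {{ ref('stg_shopify__orders') }}
    {% if is_incremental() %}
    -- Re-read a trailing window below the current max, not just newer rows
    WHERE _loaded_at >= (
        SELECT {{ dbt.dateadd('hour', -lookback_hours, 'max(_loaded_at)') }}
        FROM {{ this }}
    )
    {% endif %}

),

deduped AS (

    -- Keep the latest version of each order so the merge sees one row per key
    SELECT
        *,
        row_number() OVER (
            PARTITION BY order_id
            ORDER BY _loaded_at DESC
        ) AS row_num
    FROM new_rows

)

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    _loaded_at
FROM deduped
WHERE row_num = 1
```

Rows re-read inside the window are merged on `order_id`, so the overlap costs some extra scanning but creates no duplicates. Widen the window if your loader can lag by more than a few hours.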
Trade-off Analysis Made:
What a novice misses: Using the append strategy for updatable data, and forgetting to account for late-arriving data in incremental filters
- stg_, int_, fct_, dim_ prefixes
- unique and not_null tests
- unique_key and appropriate strategy (merge/append)
- {{ ref() }} and {{ source() }} calls resolve without hardcoded table names
- dbt test passes with zero failures on all error-level tests
- dbt docs generate produces a complete lineage graph with no broken references
- dbt build --select state:modified+ and completes in <15 minutes

Do NOT use this skill for work covered by these skills; use them instead:
- airflow-dag-orchestrator (orchestration/scheduling)
- streaming-pipeline-architect
- data-warehouse-optimizer (data warehouse tuning)
- data-governance-steward
- realtime-analytics-architect

Delegation Rules:
- data-warehouse-optimizer
- data-quality-guardian
- dimensional-modeler