skills/dbt/documenting-dbt-models/SKILL.md
Documents dbt models and columns in schema.yml. Use when working with dbt documentation for: (1) Adding model descriptions or column definitions to schema.yml (2) Task mentions "document", "describe", "description", "dbt docs", or "schema.yml" (3) Explaining business context, grain, meaning of data, or business rules (4) Preparing dbt docs generate or improving model discoverability Matches existing project documentation style and conventions before writing.
npx skillsauth add altimateai/data-engineering-skills documenting-dbt-modelsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Document the WHY, not just the WHAT. Include grain, business rules, and caveats.
CRITICAL: Match the project's documentation style before adding new docs.
# Find all schema.yml files with documentation
find . -name "schema.yml" | head -5
# Read well-documented models to learn patterns
cat models/marts/schema.yml | head -150
cat models/staging/schema.yml | head -150
Extract from existing documentation:
cat models/<path>/<model_name>.sql
Understand: transformations, business logic, joins, filters.
# Find existing schema.yml
find . -name "schema.yml" -exec grep -l "<model_name>" {} \;
# Read existing docs
cat models/<path>/schema.yml | grep -A 100 "<model_name>"
For each model, document:
For each column, consider:
Match the style discovered in step 1. Example format (adapt to project):
version: 2
models:
- name: orders
description: |
Order transactions at the order line item grain.
Each row represents one product in one order.
**Business Rules:**
- Revenue recognized on ship_date, not order_date
- Cancelled orders excluded (status != 'cancelled')
- Returns processed as negative line items
**Grain:** One row per order_id + product_id combination
columns:
- name: order_id
description: |
Unique identifier for the order.
Source: orders.id from Stripe webhook
- name: customer_id
description: |
Foreign key to customers table.
NULL for guest checkouts (pre-2023 only)
- name: revenue
description: |
Net revenue for this line item in USD.
Calculation: unit_price * quantity - discount_amount
Excludes tax and shipping
- name: order_status
description: |
Current status of the order.
Values: pending, processing, shipped, delivered, cancelled, returned
dbt docs generate
dbt docs serve # Optional: preview locally
Note: These are default templates. Always adapt to match project's existing style.
description: |
[One sentence: what this model contains]
**Grain:** [What does one row represent?]
**Business Rules:**
- [Key rule 1]
- [Key rule 2]
**Caveats:**
- [Important limitation or edge case]
| Column Type | Documentation Focus | |-------------|---------------------| | Primary key | Source system, uniqueness guarantee | | Foreign key | What it joins to, NULL handling | | Metric | Calculation formula, units, exclusions | | Date | Timezone, what event it represents | | Status/Category | All possible values, business meaning | | Boolean/Flag | What true/false means in business terms |
- name: gross_margin
description: |
Gross margin percentage.
Calculation: (revenue - cogs) / revenue * 100
NULL when revenue = 0 to avoid division by zero
tools
Delegates data engineering tasks to altimate-code, a specialized CLI agent with 100+ purpose-built data tools — SQL analysis, column-level lineage, dbt build/test/run, warehouse profiling, FinOps, and connectivity to Snowflake, BigQuery, Redshift, Databricks, Postgres, MySQL, DuckDB. Use this skill when the task needs live warehouse access, column lineage, multi-step data exploration, dbt builds against a real warehouse, or when the user explicitly invokes "altimate", "altimate-code", or "the data agent".
testing
Optimizes Snowflake SQL query performance from provided query text. Use when optimizing Snowflake SQL for: (1) User provides or pastes a SQL query and asks to optimize, tune, or improve it (2) Task mentions "slow query", "make faster", "improve performance", "optimize SQL", or "query tuning" (3) Reviewing SQL for performance anti-patterns (function on filter column, implicit joins, etc.) (4) User asks why a query is slow or how to speed it up
development
Optimizes Snowflake query performance using query ID from history. Use when optimizing Snowflake queries for: (1) User provides a Snowflake query_id (UUID format) to analyze or optimize (2) Task mentions "slow query", "optimize", "query history", or "query profile" with a query ID (3) Analyzing query performance metrics - bytes scanned, spillage, partition pruning (4) User references a previously run query that needs optimization Fetches query profile, identifies bottlenecks, returns optimized SQL with expected improvements.
data-ai
Finds and ranks expensive Snowflake queries by cost, time, or data scanned. Use when: (1) User asks to find slow, expensive, or problematic queries (2) Task mentions "query history", "top queries", "most expensive", or "slowest queries" (3) Analyzing warehouse costs or identifying optimization candidates (4) Finding queries that scan the most data or have the most spillage Returns ranked list of queries with metrics and optimization recommendations.