skills/dbt/refactoring-dbt-models/SKILL.md
Safely refactors dbt models with downstream impact analysis. Use when restructuring dbt models for: (1) Task mentions "refactor", "restructure", "extract", "split", "break into", or "reorganize" (2) Extracting CTEs to intermediate models or creating macros (3) Modifying model logic that has downstream consumers (4) Renaming columns, changing types, or reorganizing model dependencies Analyzes all downstream dependencies BEFORE making changes.
npx skillsauth add altimateai/data-engineering-skills refactoring-dbt-modelsInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Find ALL downstream dependencies before changing. Refactor in small steps. Verify output after each change.
cat models/<path>/<model_name>.sql
Identify refactoring opportunities:
CRITICAL: Never refactor without knowing impact.
# Get full dependency tree (model and all its children)
dbt ls --select model_name+ --output list
# Find all models referencing this one
grep -r "ref('model_name')" models/ --include="*.sql"
Report to user: "Found X downstream models: [list]. These will be affected by changes."
BEFORE changing any columns, check what downstream models reference:
# For each downstream model, check what columns it uses
cat models/<path>/<downstream_model>.sql | grep -E "model_name\.\w+|alias\.\w+"
If downstream models reference specific columns, you MUST ensure those columns remain available after refactoring.
| Opportunity | Strategy |
|-------------|----------|
| Long CTE | Extract to intermediate model |
| Repeated logic | Create macro in macros/ |
| Complex join | Split into intermediate models |
| Multiple concerns | Separate into focused models |
Before:
-- orders.sql (200 lines)
with customer_metrics as (
-- 50 lines of complex logic
),
order_enriched as (
select ...
from orders
join customer_metrics on ...
)
select * from order_enriched
After:
-- customer_metrics.sql (new file)
select
customer_id,
-- complex logic here
from {{ ref('customers') }}
-- orders.sql (simplified)
with order_enriched as (
select ...
from {{ ref('raw_orders') }} orders
join {{ ref('customer_metrics') }} cm on ...
)
select * from order_enriched
Before (repeated in multiple models):
case
when amount < 0 then 'refund'
when amount = 0 then 'zero'
else 'positive'
end as amount_category
After:
-- macros/categorize_amount.sql
{% macro categorize_amount(column_name) %}
case
when {{ column_name }} < 0 then 'refund'
when {{ column_name }} = 0 then 'zero'
else 'positive'
end
{% endmacro %}
-- In models:
{{ categorize_amount('amount') }} as amount_category
# Compile to check syntax
dbt compile --select +model_name+
# Build entire lineage
dbt build --select +model_name+
# Check row counts (manual)
# Before: Record expected counts
# After: Verify counts match
CRITICAL: Refactoring should not change output.
# Compare row counts before and after
dbt show --inline "select count(*) from {{ ref('model_name') }}"
# Spot check key values
dbt show --select <model_name> --limit 10
If changing output columns:
| Symptom | Refactoring | |---------|-------------| | Model > 200 lines | Extract CTEs to models | | Same logic in 3+ models | Extract to macro | | 5+ joins in one model | Create intermediate models | | Hard to understand | Add CTEs with clear names | | Slow performance | Split to allow parallelization |
tools
Delegates data engineering tasks to altimate-code, a specialized CLI agent with 100+ purpose-built data tools — SQL analysis, column-level lineage, dbt build/test/run, warehouse profiling, FinOps, and connectivity to Snowflake, BigQuery, Redshift, Databricks, Postgres, MySQL, DuckDB. Use this skill when the task needs live warehouse access, column lineage, multi-step data exploration, dbt builds against a real warehouse, or when the user explicitly invokes "altimate", "altimate-code", or "the data agent".
testing
Optimizes Snowflake SQL query performance from provided query text. Use when optimizing Snowflake SQL for: (1) User provides or pastes a SQL query and asks to optimize, tune, or improve it (2) Task mentions "slow query", "make faster", "improve performance", "optimize SQL", or "query tuning" (3) Reviewing SQL for performance anti-patterns (function on filter column, implicit joins, etc.) (4) User asks why a query is slow or how to speed it up
development
Optimizes Snowflake query performance using query ID from history. Use when optimizing Snowflake queries for: (1) User provides a Snowflake query_id (UUID format) to analyze or optimize (2) Task mentions "slow query", "optimize", "query history", or "query profile" with a query ID (3) Analyzing query performance metrics - bytes scanned, spillage, partition pruning (4) User references a previously run query that needs optimization Fetches query profile, identifies bottlenecks, returns optimized SQL with expected improvements.
data-ai
Finds and ranks expensive Snowflake queries by cost, time, or data scanned. Use when: (1) User asks to find slow, expensive, or problematic queries (2) Task mentions "query history", "top queries", "most expensive", or "slowest queries" (3) Analyzing warehouse costs or identifying optimization candidates (4) Finding queries that scan the most data or have the most spillage Returns ranked list of queries with metrics and optimization recommendations.