skills/snowflake/optimizing-query-text/SKILL.md
Optimizes Snowflake SQL query performance from provided query text. Use when optimizing Snowflake SQL for: (1) User provides or pastes a SQL query and asks to optimize, tune, or improve it (2) Task mentions "slow query", "make faster", "improve performance", "optimize SQL", or "query tuning" (3) Reviewing SQL for performance anti-patterns (function on filter column, implicit joins, etc.) (4) User asks why a query is slow or how to speed it up
npx skillsauth add altimateai/data-engineering-skills optimizing-query-textInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.
The optimized query MUST return IDENTICAL results to the original.
Before returning ANY optimization, verify:
ORDER BY exactly as writtenLIMIT N, keep LIMIT N. If no LIMIT, do NOT add one.If you cannot guarantee identical results, return the original query unchanged.
Problem: Functions on columns in WHERE clause prevent partition pruning and index usage.
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| WHERE DATE(ts) = '2024-01-01' | WHERE ts >= '2024-01-01' AND ts < '2024-01-02' | Equivalent range |
| WHERE YEAR(dt) = 2024 | WHERE dt >= '2024-01-01' AND dt < '2025-01-01' | Equivalent range |
| WHERE MONTH(dt) = 3 AND YEAR(dt) = 2024 | WHERE dt >= '2024-03-01' AND dt < '2024-04-01' | Equivalent range |
| WHERE DATE(ts) >= '2024-01-01' AND DATE(ts) < '2024-02-01' | WHERE ts >= '2024-01-01' AND ts < '2024-02-01' | Same boundaries |
| WHERE YEAR(dt) BETWEEN 1995 AND 1996 | WHERE dt >= '1995-01-01' AND dt < '1997-01-01' | Equivalent range |
| Pattern | Why Not |
|---------|---------|
| WHERE YEAR(dt) IN (SELECT year FROM ...) | Dynamic values, cannot precompute range |
| WHERE DATE(ts) = DATE(other_col) | Comparing two columns, both need function |
| WHERE EXTRACT(DOW FROM dt) = 1 | Day-of-week has no contiguous range |
| WHERE DATE_TRUNC('month', dt) = '2024-01-01' in GROUP BY | Needed for grouping logic |
| SELECT YEAR(dt) AS yr ... GROUP BY YEAR(dt) | Function in SELECT/GROUP BY is fine, only filter matters |
Problem: Functions on JOIN columns prevent hash joins, forcing slower nested loop joins.
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| ON CAST(a.id AS VARCHAR) = CAST(b.id AS VARCHAR) | ON a.id = b.id | If both are same type (e.g., INTEGER) |
| ON UPPER(a.code) = UPPER(b.code) | ON a.code = b.code | If data is already consistently cased |
| ON TRIM(a.name) = TRIM(b.name) | ON a.name = b.name | If data has no leading/trailing spaces |
| Pattern | Why Not |
|---------|---------|
| ON CAST(a.id AS VARCHAR) = b.string_id | Types genuinely differ, CAST required |
| ON DATE(a.timestamp) = b.date_col | Different granularity, DATE() required |
| ON UPPER(a.code) = b.code | If b.code might have different case |
| ON a.id = b.id + 1 | Arithmetic transformation, cannot remove |
Problem: NOT IN has poor performance and unexpected NULL behavior.
| Original | Optimized | Why Safe |
|----------|-----------|----------|
| WHERE id NOT IN (SELECT id FROM t WHERE ...) | WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id AND ...) | Equivalent when subquery column is NOT NULL |
| WHERE id NOT IN (SELECT id FROM t) where id has NOT NULL constraint | WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id) | NOT NULL guarantees equivalence |
| Pattern | Why Not |
|---------|---------|
| WHERE id NOT IN (SELECT nullable_col FROM t) | If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
| WHERE (a, b) NOT IN (SELECT x, y FROM t) | Multi-column NOT IN has complex NULL semantics |
Key Rule: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL.
Problem: Same subquery executed multiple times causes redundant scans.
| Original | Optimized | |----------|-----------| | Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times | | Same aggregation used in multiple places | Compute once in CTE |
| Pattern | Why Not | |---------|---------| | Correlated subquery (references outer table) | Each execution is different, cannot cache | | Subqueries with different filters | Not actually the same subquery | | Subquery in SELECT that depends on current row | Correlation prevents extraction |
Problem: Comma-separated tables in FROM clause are harder to read and optimize.
Convert FROM a, b, c WHERE a.id = b.id AND b.id = c.id to explicit JOIN syntax.
This is always safe - just restructuring, no semantic change.
SUM(SUM(x)) OVER(...) or similar nested aggregatestools
Delegates data engineering tasks to altimate-code, a specialized CLI agent with 100+ purpose-built data tools — SQL analysis, column-level lineage, dbt build/test/run, warehouse profiling, FinOps, and connectivity to Snowflake, BigQuery, Redshift, Databricks, Postgres, MySQL, DuckDB. Use this skill when the task needs live warehouse access, column lineage, multi-step data exploration, dbt builds against a real warehouse, or when the user explicitly invokes "altimate", "altimate-code", or "the data agent".
development
Optimizes Snowflake query performance using query ID from history. Use when optimizing Snowflake queries for: (1) User provides a Snowflake query_id (UUID format) to analyze or optimize (2) Task mentions "slow query", "optimize", "query history", or "query profile" with a query ID (3) Analyzing query performance metrics - bytes scanned, spillage, partition pruning (4) User references a previously run query that needs optimization Fetches query profile, identifies bottlenecks, returns optimized SQL with expected improvements.
data-ai
Finds and ranks expensive Snowflake queries by cost, time, or data scanned. Use when: (1) User asks to find slow, expensive, or problematic queries (2) Task mentions "query history", "top queries", "most expensive", or "slowest queries" (3) Analyzing warehouse costs or identifying optimization candidates (4) Finding queries that scan the most data or have the most spillage Returns ranked list of queries with metrics and optimization recommendations.
development
Adds schema tests and data quality validation to dbt models. Use when working with dbt tests for: (1) Adding or modifying tests in schema.yml files (2) Task mentions "test", "validate", "data quality", "unique", "not_null", or "accepted_values" (3) Ensuring data integrity - primary keys, foreign keys, relationships (4) Debugging test failures or understanding why dbt test failed Matches existing project test patterns and YAML style before adding new tests.