skills/data-warehouse-optimizer/SKILL.md
Snowflake, BigQuery, clustering, partitioning, and materialized views for warehouse performance. Activate on: Snowflake, BigQuery, Redshift, query optimization, clustering, partitioning, materialized view, warehouse cost, query profile. NOT for: dbt model structure (use dbt-analytics-engineer), data modeling (use dimensional-modeler).
npx skillsauth add curiositech/windags-skills data-warehouse-optimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Optimize query performance and resource utilization in Snowflake, BigQuery, and Redshift through clustering, partitioning, materialized views, and query profiling.
Activate on: "Snowflake optimization", "BigQuery performance", "Redshift tuning", "query optimization", "clustering key", "partitioning", "materialized view", "warehouse sizing", "query profile", "slow query"
NOT for: dbt project structure → dbt-analytics-engineer | Dimensional modeling → dimensional-modeler | Cost optimization beyond warehouse → data-cost-optimizer
| Domain | Technologies | |--------|-------------| | Snowflake | Micro-partitions, clustering keys, search optimization, warehouses | | BigQuery | Partitioning, clustering, BI Engine, materialized views | | Redshift | Sort keys, dist keys, VACUUM, WLM, Redshift Serverless | | General | Query plans, statistics, result caching, spill-to-disk analysis | | Monitoring | Snowflake Account Usage, BigQuery INFORMATION_SCHEMA, CloudWatch |
-- Cluster a large fact table by commonly filtered columns
ALTER TABLE fct_events
CLUSTER BY (event_date, customer_id);
-- Verify clustering depth (lower = better, target < 2.0)
SELECT SYSTEM$CLUSTERING_INFORMATION('fct_events', '(event_date, customer_id)');
-- Search optimization for point lookups on high-cardinality columns
ALTER TABLE fct_events ADD SEARCH OPTIMIZATION
ON EQUALITY(order_id), EQUALITY(email);
-- Result: range scans use clustering, point lookups use search optimization
-- Partition by date, cluster by high-cardinality filter columns
CREATE TABLE `project.dataset.fct_events`
PARTITION BY DATE(event_timestamp)
CLUSTER BY customer_id, event_type
AS
SELECT * FROM `project.dataset.raw_events`;
-- Query benefits: partition pruning + cluster pruning
-- Only scans partitions matching WHERE clause
SELECT customer_id, COUNT(*)
FROM `project.dataset.fct_events`
WHERE event_timestamp BETWEEN '2026-01-01' AND '2026-01-31'
AND event_type = 'purchase'
GROUP BY customer_id;
-- Check bytes scanned reduction
-- Target: 90%+ reduction vs unpartitioned table
Workload Type Recommended Size Auto-Suspend Concurrency
───────────── ──────────────── ──────────── ───────────
Dashboard queries X-Small/Small 60s Auto-scale (max 3)
Analyst ad-hoc Medium 300s 1 cluster
dbt daily build Large Immediate 1 cluster
Data science / ML X-Large+ Immediate 1 cluster
Key: separate workloads into different warehouses
to prevent resource contention and enable per-workload billing
tools
Building resilient distributed systems with circuit breakers, retries with full-jitter exponential backoff, retry budgets (per-request 3-attempt + per-client 10% ratio per Google SRE), deadline propagation, and the cascading-failure math (4 layers × 3 retries = 64x amplification). Grounded in Resilience4j, Microsoft Cloud Patterns, AWS Architecture Blog (Marc Brooker), and Google SRE Book.
testing
Designing HTTP cache headers that work correctly across browsers, CDNs, and shared proxies — `Cache-Control` directives per RFC 9111, `stale-while-revalidate` and `stale-if-error` per RFC 5861, the Vary header for varying responses, and surrogate keys for tag-based purging. Grounded in IETF RFCs and Cloudflare/Fastly docs.
development
Use when designing or fixing a Content Security Policy on a real site, choosing between nonce-based and hash-based CSP, adding strict-dynamic, debugging "Refused to execute inline script" errors, deploying CSP in report-only mode first, configuring report-to / report-uri, or auditing an existing policy for unsafe-inline / unsafe-eval / wildcards. Triggers: "CSP blocks legitimate inline script", strict-dynamic, nonce-{RANDOM}, sha256-{HASH}, object-src none, base-uri none, frame-ancestors, Trusted Types, X-Content-Security-Policy obsolete, report-only vs enforced. NOT for general HTTP security headers (HSTS, COOP/COEP), Trusted Types deep dive, CORS configuration, or building a WAF.
tools
Choosing and operating an HTTP API versioning strategy that doesn't break clients — Stripe's date-based pinned versions, the Deprecation/Sunset header pair (RFC 9745 + RFC 8594), URI vs header vs media-type approaches, and the version-transformer pattern. Grounded in Stripe's published architecture and IETF RFCs.