library/specializations/data-engineering-analytics/skills/sql-query-optimizer/SKILL.md
Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.
npx skillsauth add a5c-ai/babysitter sql-query-optimizerInstall this skill globally with one command. Works with Claude Code, Cursor, and Windsurf.
3 of 9 scanners reported clean
Some scanners were skipped, did not run, or reported a non-clean status. Review each row below.
Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.
This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.
{
"query": {
"type": "string",
"description": "The SQL query to analyze",
"required": true
},
"platform": {
"type": "string",
"enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
"required": true,
"description": "Target data warehouse platform"
},
"tableStatistics": {
"type": "object",
"description": "Table statistics including row counts, column cardinality",
"properties": {
"tables": {
"type": "array",
"items": {
"name": "string",
"rowCount": "number",
"sizeGB": "number",
"columns": "array"
}
}
}
},
"executionPlan": {
"type": "object",
"description": "Query execution plan (EXPLAIN output)"
},
"queryHistory": {
"type": "object",
"description": "Historical query performance metrics"
},
"optimizationGoals": {
"type": "array",
"items": {
"type": "string",
"enum": ["latency", "cost", "throughput", "scan_reduction"]
},
"default": ["latency", "cost"]
}
}
{
"optimizedQuery": {
"type": "string",
"description": "The optimized SQL query"
},
"improvements": {
"type": "array",
"items": {
"type": {
"type": "string",
"enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
},
"description": "string",
"impact": "high|medium|low",
"lineNumber": "number",
"originalCode": "string",
"optimizedCode": "string"
}
},
"indexRecommendations": {
"type": "array",
"items": {
"table": "string",
"type": "clustering|sort|partition|index",
"columns": "array",
"rationale": "string",
"ddl": "string"
}
},
"estimatedImprovement": {
"scanReduction": {
"type": "number",
"description": "Percentage reduction in data scanned"
},
"timeReduction": {
"type": "number",
"description": "Percentage reduction in execution time"
},
"costReduction": {
"type": "number",
"description": "Percentage reduction in query cost"
}
},
"antiPatterns": {
"type": "array",
"items": {
"pattern": "string",
"severity": "high|medium|low",
"location": "string",
"suggestion": "string"
}
},
"platformSpecificNotes": {
"type": "array",
"items": "string"
}
}
{
"query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
"platform": "snowflake"
}
{
"query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
"platform": "bigquery",
"executionPlan": {
"stages": [...],
"totalBytesProcessed": 1073741824
},
"optimizationGoals": ["cost", "scan_reduction"]
}
{
"query": "SELECT ... complex query ...",
"platform": "redshift",
"tableStatistics": {
"tables": [
{
"name": "orders",
"rowCount": 10000000,
"sizeGB": 50,
"columns": [
{"name": "order_id", "cardinality": 10000000},
{"name": "customer_id", "cardinality": 500000}
]
}
]
}
}
| Optimization | Description | |--------------|-------------| | Clustering keys | Recommend micro-partition clustering | | Result cache | Identify queries benefiting from caching | | Query acceleration | Suggest QUERY_ACCELERATION_MAX_SCALE_FACTOR | | Warehouse sizing | Right-size warehouse recommendations |
| Optimization | Description | |--------------|-------------| | Partitioning | DATE/TIMESTAMP partitioning recommendations | | Clustering | Up to 4 clustering columns | | BI Engine | Identify BI Engine-eligible queries | | Slots | Estimate slot usage optimization |
| Optimization | Description | |--------------|-------------| | Sort keys | COMPOUND vs INTERLEAVED recommendations | | Distribution | KEY, EVEN, ALL distribution strategies | | Compression | Column encoding recommendations | | Vacuum | VACUUM and ANALYZE recommendations |
| Optimization | Description | |--------------|-------------| | Z-ordering | Multi-column Z-order recommendations | | Delta cache | Caching strategy recommendations | | Photon | Photon-eligible query patterns | | Adaptive execution | AQE configuration suggestions |
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | SELECT * | High | Specify columns explicitly | | Correlated subqueries | High | Convert to JOIN or CTE | | DISTINCT on large datasets | Medium | Use GROUP BY or window functions | | Non-SARGable predicates | High | Rewrite for index usage |
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | Cartesian products | Critical | Add join conditions | | Implicit joins | Medium | Use explicit JOIN syntax | | Wrong join order | High | Reorder by selectivity | | Missing indexes on join keys | High | Add clustering/sort keys |
| Anti-Pattern | Impact | Fix | |--------------|--------|-----| | GROUP BY ordinal | Low | Use column names | | Aggregating before filter | High | Filter first, then aggregate | | Over-grouping | Medium | Reduce GROUP BY columns |
query-optimization.js)data-warehouse-setup.js)bi-dashboard.js)obt-creation.js)development
Model documentation skill for generating model cards following Google's model card framework.
development
MLflow integration skill for experiment tracking, model registry, and artifact management. Enables LLMs to log experiments, compare runs, manage model lifecycle, and retrieve artifacts through the MLflow API.
data-ai
LIME-based local explanation skill for individual predictions across tabular, text, and image data.
devops
Kubeflow Pipelines skill for ML workflow orchestration, component management, and Kubernetes-native ML.